[2024-03-29 12:00:56,219][00126] Saving configuration to /workspace/metta/train_dir/b.a20.20x20_40x40.norm/config.json... [2024-03-29 12:00:56,351][00126] Rollout worker 0 uses device cpu [2024-03-29 12:00:56,352][00126] Rollout worker 1 uses device cpu [2024-03-29 12:00:56,352][00126] Rollout worker 2 uses device cpu [2024-03-29 12:00:56,352][00126] Rollout worker 3 uses device cpu [2024-03-29 12:00:56,352][00126] Rollout worker 4 uses device cpu [2024-03-29 12:00:56,352][00126] Rollout worker 5 uses device cpu [2024-03-29 12:00:56,353][00126] Rollout worker 6 uses device cpu [2024-03-29 12:00:56,353][00126] Rollout worker 7 uses device cpu [2024-03-29 12:00:56,353][00126] Rollout worker 8 uses device cpu [2024-03-29 12:00:56,353][00126] Rollout worker 9 uses device cpu [2024-03-29 12:00:56,354][00126] Rollout worker 10 uses device cpu [2024-03-29 12:00:56,354][00126] Rollout worker 11 uses device cpu [2024-03-29 12:00:56,354][00126] Rollout worker 12 uses device cpu [2024-03-29 12:00:56,354][00126] Rollout worker 13 uses device cpu [2024-03-29 12:00:56,354][00126] Rollout worker 14 uses device cpu [2024-03-29 12:00:56,355][00126] Rollout worker 15 uses device cpu [2024-03-29 12:00:56,355][00126] Rollout worker 16 uses device cpu [2024-03-29 12:00:56,355][00126] Rollout worker 17 uses device cpu [2024-03-29 12:00:56,355][00126] Rollout worker 18 uses device cpu [2024-03-29 12:00:56,355][00126] Rollout worker 19 uses device cpu [2024-03-29 12:00:56,355][00126] Rollout worker 20 uses device cpu [2024-03-29 12:00:56,356][00126] Rollout worker 21 uses device cpu [2024-03-29 12:00:56,356][00126] Rollout worker 22 uses device cpu [2024-03-29 12:00:56,356][00126] Rollout worker 23 uses device cpu [2024-03-29 12:00:56,356][00126] Rollout worker 24 uses device cpu [2024-03-29 12:00:56,356][00126] Rollout worker 25 uses device cpu [2024-03-29 12:00:56,357][00126] Rollout worker 26 uses device cpu [2024-03-29 12:00:56,357][00126] Rollout worker 27 uses device cpu [2024-03-29 12:00:56,357][00126] Rollout worker 28 uses device cpu [2024-03-29 12:00:56,357][00126] Rollout worker 29 uses device cpu [2024-03-29 12:00:56,357][00126] Rollout worker 30 uses device cpu [2024-03-29 12:00:56,357][00126] Rollout worker 31 uses device cpu [2024-03-29 12:00:56,358][00126] Rollout worker 32 uses device cpu [2024-03-29 12:00:56,358][00126] Rollout worker 33 uses device cpu [2024-03-29 12:00:56,358][00126] Rollout worker 34 uses device cpu [2024-03-29 12:00:56,358][00126] Rollout worker 35 uses device cpu [2024-03-29 12:00:56,358][00126] Rollout worker 36 uses device cpu [2024-03-29 12:00:56,358][00126] Rollout worker 37 uses device cpu [2024-03-29 12:00:56,359][00126] Rollout worker 38 uses device cpu [2024-03-29 12:00:56,359][00126] Rollout worker 39 uses device cpu [2024-03-29 12:00:56,359][00126] Rollout worker 40 uses device cpu [2024-03-29 12:00:56,359][00126] Rollout worker 41 uses device cpu [2024-03-29 12:00:56,359][00126] Rollout worker 42 uses device cpu [2024-03-29 12:00:56,360][00126] Rollout worker 43 uses device cpu [2024-03-29 12:00:56,360][00126] Rollout worker 44 uses device cpu [2024-03-29 12:00:56,360][00126] Rollout worker 45 uses device cpu [2024-03-29 12:00:56,360][00126] Rollout worker 46 uses device cpu [2024-03-29 12:00:56,360][00126] Rollout worker 47 uses device cpu [2024-03-29 12:00:56,360][00126] Rollout worker 48 uses device cpu [2024-03-29 12:00:56,361][00126] Rollout worker 49 uses device cpu [2024-03-29 12:00:56,361][00126] Rollout worker 50 uses device cpu [2024-03-29 
12:00:56,361][00126] Rollout worker 51 uses device cpu [2024-03-29 12:00:56,361][00126] Rollout worker 52 uses device cpu [2024-03-29 12:00:56,361][00126] Rollout worker 53 uses device cpu [2024-03-29 12:00:56,361][00126] Rollout worker 54 uses device cpu [2024-03-29 12:00:56,362][00126] Rollout worker 55 uses device cpu [2024-03-29 12:00:56,362][00126] Rollout worker 56 uses device cpu [2024-03-29 12:00:56,362][00126] Rollout worker 57 uses device cpu [2024-03-29 12:00:56,362][00126] Rollout worker 58 uses device cpu [2024-03-29 12:00:56,362][00126] Rollout worker 59 uses device cpu [2024-03-29 12:00:56,363][00126] Rollout worker 60 uses device cpu [2024-03-29 12:00:56,363][00126] Rollout worker 61 uses device cpu [2024-03-29 12:00:56,363][00126] Rollout worker 62 uses device cpu [2024-03-29 12:00:56,363][00126] Rollout worker 63 uses device cpu [2024-03-29 12:00:58,093][00126] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:00:58,094][00126] InferenceWorker_p0-w0: min num requests: 21 [2024-03-29 12:00:58,197][00126] Starting all processes... [2024-03-29 12:00:58,197][00126] Starting process learner_proc0 [2024-03-29 12:00:58,403][00126] Starting all processes... [2024-03-29 12:00:58,410][00126] Starting process inference_proc0-0 [2024-03-29 12:00:58,410][00126] Starting process rollout_proc1 [2024-03-29 12:00:58,410][00126] Starting process rollout_proc3 [2024-03-29 12:00:58,411][00126] Starting process rollout_proc5 [2024-03-29 12:00:58,413][00126] Starting process rollout_proc7 [2024-03-29 12:00:58,413][00126] Starting process rollout_proc9 [2024-03-29 12:00:58,419][00126] Starting process rollout_proc11 [2024-03-29 12:00:58,421][00126] Starting process rollout_proc0 [2024-03-29 12:00:58,429][00126] Starting process rollout_proc2 [2024-03-29 12:00:58,429][00126] Starting process rollout_proc4 [2024-03-29 12:00:58,429][00126] Starting process rollout_proc13 [2024-03-29 12:00:58,430][00126] Starting process rollout_proc15 [2024-03-29 12:00:58,430][00126] Starting process rollout_proc17 [2024-03-29 12:00:58,430][00126] Starting process rollout_proc19 [2024-03-29 12:00:58,432][00126] Starting process rollout_proc21 [2024-03-29 12:00:58,438][00126] Starting process rollout_proc23 [2024-03-29 12:00:58,451][00126] Starting process rollout_proc6 [2024-03-29 12:00:58,452][00126] Starting process rollout_proc10 [2024-03-29 12:00:58,454][00126] Starting process rollout_proc25 [2024-03-29 12:00:58,454][00126] Starting process rollout_proc27 [2024-03-29 12:00:58,454][00126] Starting process rollout_proc8 [2024-03-29 12:00:58,454][00126] Starting process rollout_proc29 [2024-03-29 12:00:58,498][00126] Starting process rollout_proc31 [2024-03-29 12:00:58,535][00126] Starting process rollout_proc33 [2024-03-29 12:00:58,549][00126] Starting process rollout_proc35 [2024-03-29 12:00:58,549][00126] Starting process rollout_proc12 [2024-03-29 12:00:58,550][00126] Starting process rollout_proc14 [2024-03-29 12:00:58,556][00126] Starting process rollout_proc16 [2024-03-29 12:00:58,564][00126] Starting process rollout_proc18 [2024-03-29 12:00:58,577][00126] Starting process rollout_proc20 [2024-03-29 12:00:58,584][00126] Starting process rollout_proc22 [2024-03-29 12:00:58,591][00126] Starting process rollout_proc24 [2024-03-29 12:00:58,599][00126] Starting process rollout_proc37 [2024-03-29 12:00:58,617][00126] Starting process rollout_proc39 [2024-03-29 12:00:58,617][00126] Starting process rollout_proc26 [2024-03-29 12:00:58,672][00126] Starting process rollout_proc41 [2024-03-29 
12:00:58,693][00126] Starting process rollout_proc28 [2024-03-29 12:00:58,693][00126] Starting process rollout_proc30 [2024-03-29 12:00:58,704][00126] Starting process rollout_proc32 [2024-03-29 12:00:58,735][00126] Starting process rollout_proc34 [2024-03-29 12:00:58,730][00126] Starting process rollout_proc43 [2024-03-29 12:00:58,730][00126] Starting process rollout_proc36 [2024-03-29 12:00:58,807][00126] Starting process rollout_proc45 [2024-03-29 12:00:58,807][00126] Starting process rollout_proc47 [2024-03-29 12:00:58,809][00126] Starting process rollout_proc49 [2024-03-29 12:00:58,873][00126] Starting process rollout_proc46 [2024-03-29 12:00:58,885][00126] Starting process rollout_proc51 [2024-03-29 12:00:58,897][00126] Starting process rollout_proc38 [2024-03-29 12:00:58,900][00126] Starting process rollout_proc40 [2024-03-29 12:00:58,906][00126] Starting process rollout_proc53 [2024-03-29 12:00:59,072][00126] Starting process rollout_proc55 [2024-03-29 12:00:59,072][00126] Starting process rollout_proc57 [2024-03-29 12:00:59,072][00126] Starting process rollout_proc48 [2024-03-29 12:00:59,072][00126] Starting process rollout_proc54 [2024-03-29 12:00:59,072][00126] Starting process rollout_proc42 [2024-03-29 12:00:59,097][00126] Starting process rollout_proc59 [2024-03-29 12:00:59,214][00126] Starting process rollout_proc52 [2024-03-29 12:00:59,214][00126] Starting process rollout_proc50 [2024-03-29 12:00:59,214][00126] Starting process rollout_proc61 [2024-03-29 12:00:59,214][00126] Starting process rollout_proc63 [2024-03-29 12:00:59,257][00126] Starting process rollout_proc58 [2024-03-29 12:00:59,257][00126] Starting process rollout_proc56 [2024-03-29 12:00:59,262][00126] Starting process rollout_proc44 [2024-03-29 12:00:59,344][00126] Starting process rollout_proc60 [2024-03-29 12:00:59,379][00126] Starting process rollout_proc62 [2024-03-29 12:01:03,344][00503] Worker 3 uses CPU cores [3] [2024-03-29 12:01:03,364][00481] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:01:03,364][00481] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-03-29 12:01:03,389][00481] Num visible devices: 1 [2024-03-29 12:01:03,441][00502] Worker 1 uses CPU cores [1] [2024-03-29 12:01:03,448][00481] Starting seed is not provided [2024-03-29 12:01:03,452][00481] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:01:03,452][00481] Initializing actor-critic model on device cuda:0 [2024-03-29 12:01:03,453][00481] RunningMeanStd input shape: (20,) [2024-03-29 12:01:03,459][00481] RunningMeanStd input shape: (23, 11, 11) [2024-03-29 12:01:03,459][00481] RunningMeanStd input shape: (1, 11, 11) [2024-03-29 12:01:03,459][00481] RunningMeanStd input shape: (2,) [2024-03-29 12:01:03,460][00481] RunningMeanStd input shape: (1,) [2024-03-29 12:01:03,460][00481] RunningMeanStd input shape: (1,) [2024-03-29 12:01:03,460][00501] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:01:03,461][00501] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-03-29 12:01:03,461][00505] Worker 9 uses CPU cores [9] [2024-03-29 12:01:03,478][00501] Num visible devices: 1 [2024-03-29 12:01:03,497][00888] Worker 4 uses CPU cores [4] [2024-03-29 12:01:03,511][00889] Worker 13 uses CPU cores [13] [2024-03-29 12:01:03,531][00891] Worker 17 uses CPU cores [17] [2024-03-29 12:01:03,531][01024] Worker 27 uses CPU cores [27] [2024-03-29 12:01:03,541][00887] Worker 2 uses CPU cores [2] 
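The "RunningMeanStd input shape" entries logged above correspond to streaming per-input normalizers (global variables, grid observation, kinship, last action, last reward). A minimal sketch of such a running mean/variance tracker is shown below, using the standard parallel-update formula; the class and its defaults are illustrative, not the project's actual implementation.

import numpy as np

class RunningMeanStd:
    # Streaming per-element mean/variance, e.g. for an input of shape (23, 11, 11).
    def __init__(self, shape, eps=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps  # avoids division by zero before the first update

    def update(self, batch):
        # batch has shape (N, *shape); merge its statistics into the running ones
        b_mean, b_var, b_count = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        self.mean = self.mean + delta * b_count / total
        m2 = self.var * self.count + b_var * b_count + delta ** 2 * self.count * b_count / total
        self.var, self.count = m2 / total, total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)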
[2024-03-29 12:01:03,585][00569] Worker 7 uses CPU cores [7] [2024-03-29 12:01:03,609][00589] Worker 11 uses CPU cores [11] [2024-03-29 12:01:03,609][01471] Worker 12 uses CPU cores [12] [2024-03-29 12:01:03,609][00890] Worker 15 uses CPU cores [15] [2024-03-29 12:01:03,624][00997] Worker 23 uses CPU cores [23] [2024-03-29 12:01:03,639][01004] Worker 25 uses CPU cores [25] [2024-03-29 12:01:03,641][01025] Worker 8 uses CPU cores [8] [2024-03-29 12:01:03,669][01023] Worker 10 uses CPU cores [10] [2024-03-29 12:01:03,677][00892] Worker 19 uses CPU cores [19] [2024-03-29 12:01:03,677][00996] Worker 21 uses CPU cores [21] [2024-03-29 12:01:03,681][01856] Worker 39 uses CPU cores [39] [2024-03-29 12:01:03,681][01003] Worker 6 uses CPU cores [6] [2024-03-29 12:01:03,693][01279] Worker 31 uses CPU cores [31] [2024-03-29 12:01:03,697][02370] Worker 51 uses CPU cores [51] [2024-03-29 12:01:03,711][01790] Worker 45 uses CPU cores [45] [2024-03-29 12:01:03,711][01662] Worker 16 uses CPU cores [16] [2024-03-29 12:01:03,713][01855] Worker 37 uses CPU cores [37] [2024-03-29 12:01:03,713][02358] Worker 28 uses CPU cores [28] [2024-03-29 12:01:03,766][01535] Worker 14 uses CPU cores [14] [2024-03-29 12:01:03,769][02356] Worker 49 uses CPU cores [49] [2024-03-29 12:01:03,776][03302] Worker 59 uses CPU cores [59] [2024-03-29 12:01:03,785][02239] Worker 46 uses CPU cores [46] [2024-03-29 12:01:03,786][00504] Worker 5 uses CPU cores [5] [2024-03-29 12:01:03,789][02053] Worker 41 uses CPU cores [41] [2024-03-29 12:01:03,796][02371] Worker 47 uses CPU cores [47] [2024-03-29 12:01:03,802][03373] Worker 52 uses CPU cores [52] [2024-03-29 12:01:03,803][02500] Worker 55 uses CPU cores [55] [2024-03-29 12:01:03,813][03572] Worker 61 uses CPU cores [61] [2024-03-29 12:01:03,813][01854] Worker 18 uses CPU cores [18] [2024-03-29 12:01:03,821][02357] Worker 22 uses CPU cores [22] [2024-03-29 12:01:03,830][01789] Worker 20 uses CPU cores [20] [2024-03-29 12:01:03,830][02111] Worker 53 uses CPU cores [53] [2024-03-29 12:01:03,830][03070] Worker 54 uses CPU cores [54] [2024-03-29 12:01:03,836][01920] Worker 26 uses CPU cores [26] [2024-03-29 12:01:03,837][02369] Worker 36 uses CPU cores [36] [2024-03-29 12:01:03,851][01294] Worker 35 uses CPU cores [35] [2024-03-29 12:01:03,851][02754] Worker 40 uses CPU cores [40] [2024-03-29 12:01:03,859][02884] Worker 43 uses CPU cores [43] [2024-03-29 12:01:03,861][03578] Worker 63 uses CPU cores [63] [2024-03-29 12:01:03,865][02436] Worker 38 uses CPU cores [38] [2024-03-29 12:01:03,865][02372] Worker 30 uses CPU cores [30] [2024-03-29 12:01:03,879][02883] Worker 48 uses CPU cores [48] [2024-03-29 12:01:03,902][02819] Worker 57 uses CPU cores [57] [2024-03-29 12:01:03,905][01259] Worker 29 uses CPU cores [29] [2024-03-29 12:01:03,926][00570] Worker 0 uses CPU cores [0] [2024-03-29 12:01:03,948][03258] Worker 42 uses CPU cores [42] [2024-03-29 12:01:03,968][02818] Worker 32 uses CPU cores [32] [2024-03-29 12:01:03,968][03713] Worker 44 uses CPU cores [44] [2024-03-29 12:01:03,993][03585] Worker 58 uses CPU cores [58] [2024-03-29 12:01:03,993][03445] Worker 50 uses CPU cores [50] [2024-03-29 12:01:04,001][02188] Worker 24 uses CPU cores [24] [2024-03-29 12:01:04,037][01280] Worker 33 uses CPU cores [33] [2024-03-29 12:01:04,068][02690] Worker 34 uses CPU cores [34] [2024-03-29 12:01:04,068][03725] Worker 60 uses CPU cores [60] [2024-03-29 12:01:04,091][03841] Worker 62 uses CPU cores [62] [2024-03-29 12:01:04,110][03586] Worker 56 uses CPU cores [56] [2024-03-29 12:01:04,204][00481] 
Created Actor Critic model with architecture:
[2024-03-29 12:01:04,204][00481] PredictingActorCritic(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (global_vars): RunningMeanStdInPlace()
        (griddly_obs): RunningMeanStdInPlace()
        (kinship): RunningMeanStdInPlace()
        (last_action): RunningMeanStdInPlace()
        (last_reward): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): ObjectEmeddingAgentEncoder(
    (object_embedding): Sequential(
      (0): Linear(in_features=52, out_features=64, bias=True)
      (1): ELU(alpha=1.0)
      (2): Sequential(
        (0): Linear(in_features=64, out_features=64, bias=True)
        (1): ELU(alpha=1.0)
      )
      (3): Sequential(
        (0): Linear(in_features=64, out_features=64, bias=True)
        (1): ELU(alpha=1.0)
      )
      (4): Sequential(
        (0): Linear(in_features=64, out_features=64, bias=True)
        (1): ELU(alpha=1.0)
      )
    )
    (encoder_head): Sequential(
      (0): Linear(in_features=7767, out_features=512, bias=True)
      (1): ELU(alpha=1.0)
      (2): Sequential(
        (0): Linear(in_features=512, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
      )
      (3): Sequential(
        (0): Linear(in_features=512, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
      )
      (4): Sequential(
        (0): Linear(in_features=512, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): ObjectEmeddingAgentDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=17, bias=True)
  )
)
[2024-03-29 12:01:05,002][00481] Using optimizer
[2024-03-29 12:01:05,527][00481] No checkpoints found
[2024-03-29 12:01:05,527][00481] Did not load from checkpoint, starting from scratch!
[2024-03-29 12:01:05,527][00481] Initialized policy 0 weights for model version 0
[2024-03-29 12:01:05,529][00481] LearnerWorker_p0 finished initialization!
[2024-03-29 12:01:05,530][00481] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-03-29 12:01:05,716][00501] RunningMeanStd input shape: (20,)
[2024-03-29 12:01:05,717][00501] RunningMeanStd input shape: (23, 11, 11)
[2024-03-29 12:01:05,717][00501] RunningMeanStd input shape: (1, 11, 11)
[2024-03-29 12:01:05,717][00501] RunningMeanStd input shape: (2,)
[2024-03-29 12:01:05,717][00501] RunningMeanStd input shape: (1,)
[2024-03-29 12:01:05,717][00501] RunningMeanStd input shape: (1,)
[2024-03-29 12:01:06,339][00126] Inference worker 0-0 is ready!
[2024-03-29 12:01:06,340][00126] All inference workers are ready! Signal rollout workers to start!
[2024-03-29 12:01:06,686][00126] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-03-29 12:01:07,274][01535] Decorrelating experience for 0 frames...
[2024-03-29 12:01:07,275][00891] Decorrelating experience for 0 frames...
[2024-03-29 12:01:07,279][02436] Decorrelating experience for 0 frames...
[2024-03-29 12:01:07,285][00589] Decorrelating experience for 0 frames...
[2024-03-29 12:01:07,285][02754] Decorrelating experience for 0 frames...
[2024-03-29 12:01:07,294][00505] Decorrelating experience for 0 frames...
[2024-03-29 12:01:07,298][00888] Decorrelating experience for 0 frames...
[2024-03-29 12:01:07,299][02372] Decorrelating experience for 0 frames...
[2024-03-29 12:01:07,301][02370] Decorrelating experience for 0 frames...
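Read top to bottom, the module tree printed above is: per-input running normalizers, a per-object embedding MLP (52 -> 64 with three further 64 -> 64 ELU blocks), an encoder head that takes the 7767-dim flattened embedded observation down to a 512-dim representation, a 512-unit GRU core, and linear value and action heads. A rough PyTorch sketch of that stack, reconstructed from the printed repr rather than taken from the project's code, is:

import torch.nn as nn

def elu_block(in_features, out_features):
    return nn.Sequential(nn.Linear(in_features, out_features), nn.ELU(alpha=1.0))

# Encoder implied by the printed architecture (layer sizes copied from the log).
object_embedding = nn.Sequential(
    nn.Linear(52, 64), nn.ELU(alpha=1.0),
    elu_block(64, 64), elu_block(64, 64), elu_block(64, 64),
)
encoder_head = nn.Sequential(
    nn.Linear(7767, 512), nn.ELU(alpha=1.0),
    elu_block(512, 512), elu_block(512, 512), elu_block(512, 512),
)
core = nn.GRU(512, 512)                    # recurrent core, as in ModelCoreRNN
critic_linear = nn.Linear(512, 1)          # value head
distribution_linear = nn.Linear(512, 17)   # action distribution parameters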
[2024-03-29 12:01:07,303][01280] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,311][03585] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,314][02884] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,314][02690] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,314][03578] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,327][00502] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,329][00997] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,330][03445] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,331][00892] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,332][01279] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,332][00889] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,333][02369] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,335][02371] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,340][00504] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,350][00503] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,351][03586] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,353][01023] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,357][02111] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,359][02883] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,362][01855] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,365][00890] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,368][02239] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,369][00996] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,370][01471] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,371][00569] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,373][02500] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,376][03070] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,378][02819] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,379][02357] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,394][02356] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,394][01920] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,394][00570] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,395][01790] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,396][01789] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,400][02818] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,402][01294] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,402][01004] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,402][03258] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,407][03713] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,408][03302] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,419][01856] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,420][03373] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,421][03725] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,422][01025] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,423][01024] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,423][02053] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,426][03841] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,428][01003] Decorrelating experience for 0 frames... 
[2024-03-29 12:01:07,431][00887] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,434][03572] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,436][02188] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,437][02358] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,447][01662] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,448][01854] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,449][01259] Decorrelating experience for 0 frames... [2024-03-29 12:01:08,203][01535] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,209][03445] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,211][00888] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,216][01280] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,222][00892] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,228][02754] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,231][02690] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,231][01279] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,233][00891] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,234][00997] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,235][02370] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,238][00502] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,238][02371] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,242][02369] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,243][02372] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,248][00503] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,255][00589] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,256][01023] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,262][00505] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,263][00889] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,266][00504] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,267][02436] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,267][00890] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,274][00569] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,276][02884] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,278][03585] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,279][02500] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,281][02357] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,290][03586] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,290][03578] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,296][01003] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,302][02819] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,306][02356] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,314][01920] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,316][02883] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,323][02053] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,325][02239] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,332][03070] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,336][01855] Decorrelating experience for 256 frames... 
[2024-03-29 12:01:08,337][01856] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,338][01790] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,340][03258] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,344][03373] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,344][00996] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,353][02111] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,354][00570] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,359][03725] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,359][02818] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,365][03841] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,369][01789] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,371][01854] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,372][01294] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,377][01025] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,384][01004] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,385][00887] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,385][01259] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,387][02188] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,387][03572] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,390][01024] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,391][01662] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,398][03713] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,400][01471] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,400][03302] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,447][02358] Decorrelating experience for 256 frames... [2024-03-29 12:01:11,685][00126] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:01:16,685][00126] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 34048.4. Samples: 340480. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:01:18,090][00126] Heartbeat connected on Batcher_0 [2024-03-29 12:01:18,091][00126] Heartbeat connected on LearnerWorker_p0 [2024-03-29 12:01:18,098][00126] Heartbeat connected on RolloutWorker_w1 [2024-03-29 12:01:18,099][00126] Heartbeat connected on RolloutWorker_w2 [2024-03-29 12:01:18,100][00126] Heartbeat connected on RolloutWorker_w0 [2024-03-29 12:01:18,101][00126] Heartbeat connected on RolloutWorker_w3 [2024-03-29 12:01:18,102][00126] Heartbeat connected on RolloutWorker_w4 [2024-03-29 12:01:18,110][00126] Heartbeat connected on RolloutWorker_w9 [2024-03-29 12:01:18,111][00126] Heartbeat connected on RolloutWorker_w5 [2024-03-29 12:01:18,112][00126] Heartbeat connected on RolloutWorker_w10 [2024-03-29 12:01:18,112][00126] Heartbeat connected on RolloutWorker_w7 [2024-03-29 12:01:18,113][00126] Heartbeat connected on RolloutWorker_w6 [2024-03-29 12:01:18,113][00126] Heartbeat connected on RolloutWorker_w11 [2024-03-29 12:01:18,115][00126] Heartbeat connected on RolloutWorker_w12 [2024-03-29 12:01:18,118][00126] Heartbeat connected on RolloutWorker_w13 [2024-03-29 12:01:18,118][00126] Heartbeat connected on RolloutWorker_w14 [2024-03-29 12:01:18,120][00126] Heartbeat connected on RolloutWorker_w15 [2024-03-29 12:01:18,121][00126] Heartbeat connected on RolloutWorker_w8 [2024-03-29 12:01:18,121][00126] Heartbeat connected on RolloutWorker_w16 [2024-03-29 12:01:18,123][00126] Heartbeat connected on RolloutWorker_w17 [2024-03-29 12:01:18,124][00126] Heartbeat connected on RolloutWorker_w18 [2024-03-29 12:01:18,125][00126] Heartbeat connected on InferenceWorker_p0-w0 [2024-03-29 12:01:18,127][00126] Heartbeat connected on RolloutWorker_w20 [2024-03-29 12:01:18,130][00126] Heartbeat connected on RolloutWorker_w22 [2024-03-29 12:01:18,132][00126] Heartbeat connected on RolloutWorker_w23 [2024-03-29 12:01:18,132][00126] Heartbeat connected on RolloutWorker_w19 [2024-03-29 12:01:18,133][00126] Heartbeat connected on RolloutWorker_w24 [2024-03-29 12:01:18,135][00126] Heartbeat connected on RolloutWorker_w25 [2024-03-29 12:01:18,136][00126] Heartbeat connected on RolloutWorker_w26 [2024-03-29 12:01:18,137][00126] Heartbeat connected on RolloutWorker_w21 [2024-03-29 12:01:18,138][00126] Heartbeat connected on RolloutWorker_w27 [2024-03-29 12:01:18,140][00126] Heartbeat connected on RolloutWorker_w28 [2024-03-29 12:01:18,142][00126] Heartbeat connected on RolloutWorker_w29 [2024-03-29 12:01:18,143][00126] Heartbeat connected on RolloutWorker_w30 [2024-03-29 12:01:18,145][00126] Heartbeat connected on RolloutWorker_w31 [2024-03-29 12:01:18,146][00126] Heartbeat connected on RolloutWorker_w32 [2024-03-29 12:01:18,153][00126] Heartbeat connected on RolloutWorker_w36 [2024-03-29 12:01:18,153][00126] Heartbeat connected on RolloutWorker_w35 [2024-03-29 12:01:18,155][00126] Heartbeat connected on RolloutWorker_w37 [2024-03-29 12:01:18,155][00126] Heartbeat connected on RolloutWorker_w34 [2024-03-29 12:01:18,155][00126] Heartbeat connected on RolloutWorker_w38 [2024-03-29 12:01:18,156][00126] Heartbeat connected on RolloutWorker_w33 [2024-03-29 12:01:18,157][00126] Heartbeat connected on RolloutWorker_w39 [2024-03-29 12:01:18,160][00126] Heartbeat connected on RolloutWorker_w41 [2024-03-29 12:01:18,162][00126] Heartbeat connected on RolloutWorker_w42 [2024-03-29 12:01:18,162][00126] Heartbeat connected on RolloutWorker_w40 [2024-03-29 12:01:18,163][00126] Heartbeat connected on RolloutWorker_w43 [2024-03-29 12:01:18,167][00126] Heartbeat 
connected on RolloutWorker_w45 [2024-03-29 12:01:18,172][00126] Heartbeat connected on RolloutWorker_w48 [2024-03-29 12:01:18,172][00126] Heartbeat connected on RolloutWorker_w46 [2024-03-29 12:01:18,173][00126] Heartbeat connected on RolloutWorker_w47 [2024-03-29 12:01:18,173][00126] Heartbeat connected on RolloutWorker_w49 [2024-03-29 12:01:18,173][00126] Heartbeat connected on RolloutWorker_w44 [2024-03-29 12:01:18,174][00126] Heartbeat connected on RolloutWorker_w50 [2024-03-29 12:01:18,176][00126] Heartbeat connected on RolloutWorker_w51 [2024-03-29 12:01:18,179][00126] Heartbeat connected on RolloutWorker_w53 [2024-03-29 12:01:18,179][00126] Heartbeat connected on RolloutWorker_w52 [2024-03-29 12:01:18,180][00126] Heartbeat connected on RolloutWorker_w54 [2024-03-29 12:01:18,182][00126] Heartbeat connected on RolloutWorker_w55 [2024-03-29 12:01:18,183][00126] Heartbeat connected on RolloutWorker_w56 [2024-03-29 12:01:18,185][00126] Heartbeat connected on RolloutWorker_w57 [2024-03-29 12:01:18,186][00126] Heartbeat connected on RolloutWorker_w58 [2024-03-29 12:01:18,194][00126] Heartbeat connected on RolloutWorker_w60 [2024-03-29 12:01:18,195][00126] Heartbeat connected on RolloutWorker_w59 [2024-03-29 12:01:18,195][00126] Heartbeat connected on RolloutWorker_w62 [2024-03-29 12:01:18,195][00126] Heartbeat connected on RolloutWorker_w61 [2024-03-29 12:01:18,196][00126] Heartbeat connected on RolloutWorker_w63 [2024-03-29 12:01:20,329][03258] Worker 42, sleep for 98.438 sec to decorrelate experience collection [2024-03-29 12:01:20,330][02883] Worker 48, sleep for 112.500 sec to decorrelate experience collection [2024-03-29 12:01:20,330][00504] Worker 5, sleep for 11.719 sec to decorrelate experience collection [2024-03-29 12:01:20,353][03302] Worker 59, sleep for 138.281 sec to decorrelate experience collection [2024-03-29 12:01:20,353][00997] Worker 23, sleep for 53.906 sec to decorrelate experience collection [2024-03-29 12:01:20,354][00888] Worker 4, sleep for 9.375 sec to decorrelate experience collection [2024-03-29 12:01:20,355][02370] Worker 51, sleep for 119.531 sec to decorrelate experience collection [2024-03-29 12:01:20,355][03445] Worker 50, sleep for 117.188 sec to decorrelate experience collection [2024-03-29 12:01:20,356][01535] Worker 14, sleep for 32.812 sec to decorrelate experience collection [2024-03-29 12:01:20,376][01024] Worker 27, sleep for 63.281 sec to decorrelate experience collection [2024-03-29 12:01:20,377][02369] Worker 36, sleep for 84.375 sec to decorrelate experience collection [2024-03-29 12:01:20,378][03586] Worker 56, sleep for 131.250 sec to decorrelate experience collection [2024-03-29 12:01:20,379][02371] Worker 47, sleep for 110.156 sec to decorrelate experience collection [2024-03-29 12:01:20,380][02884] Worker 43, sleep for 100.781 sec to decorrelate experience collection [2024-03-29 12:01:20,381][02818] Worker 32, sleep for 75.000 sec to decorrelate experience collection [2024-03-29 12:01:20,386][01023] Worker 10, sleep for 23.438 sec to decorrelate experience collection [2024-03-29 12:01:20,386][01279] Worker 31, sleep for 72.656 sec to decorrelate experience collection [2024-03-29 12:01:20,392][01855] Worker 37, sleep for 86.719 sec to decorrelate experience collection [2024-03-29 12:01:20,395][00505] Worker 9, sleep for 21.094 sec to decorrelate experience collection [2024-03-29 12:01:20,395][01920] Worker 26, sleep for 60.938 sec to decorrelate experience collection [2024-03-29 12:01:20,395][02053] Worker 41, sleep for 96.094 sec to 
decorrelate experience collection [2024-03-29 12:01:20,396][02690] Worker 34, sleep for 79.688 sec to decorrelate experience collection [2024-03-29 12:01:20,397][02357] Worker 22, sleep for 51.562 sec to decorrelate experience collection [2024-03-29 12:01:20,410][03725] Worker 60, sleep for 140.625 sec to decorrelate experience collection [2024-03-29 12:01:20,423][00503] Worker 3, sleep for 7.031 sec to decorrelate experience collection [2024-03-29 12:01:20,428][01790] Worker 45, sleep for 105.469 sec to decorrelate experience collection [2024-03-29 12:01:20,433][02372] Worker 30, sleep for 70.312 sec to decorrelate experience collection [2024-03-29 12:01:20,467][00996] Worker 21, sleep for 49.219 sec to decorrelate experience collection [2024-03-29 12:01:20,467][03373] Worker 52, sleep for 121.875 sec to decorrelate experience collection [2024-03-29 12:01:20,467][00891] Worker 17, sleep for 39.844 sec to decorrelate experience collection [2024-03-29 12:01:20,469][01025] Worker 8, sleep for 18.750 sec to decorrelate experience collection [2024-03-29 12:01:20,475][00890] Worker 15, sleep for 35.156 sec to decorrelate experience collection [2024-03-29 12:01:20,476][02356] Worker 49, sleep for 114.844 sec to decorrelate experience collection [2024-03-29 12:01:20,478][00569] Worker 7, sleep for 16.406 sec to decorrelate experience collection [2024-03-29 12:01:20,479][01003] Worker 6, sleep for 14.062 sec to decorrelate experience collection [2024-03-29 12:01:20,495][01294] Worker 35, sleep for 82.031 sec to decorrelate experience collection [2024-03-29 12:01:20,497][03070] Worker 54, sleep for 126.562 sec to decorrelate experience collection [2024-03-29 12:01:20,497][01662] Worker 16, sleep for 37.500 sec to decorrelate experience collection [2024-03-29 12:01:20,510][02500] Worker 55, sleep for 128.906 sec to decorrelate experience collection [2024-03-29 12:01:20,511][00889] Worker 13, sleep for 30.469 sec to decorrelate experience collection [2024-03-29 12:01:20,511][02239] Worker 46, sleep for 107.812 sec to decorrelate experience collection [2024-03-29 12:01:20,512][03572] Worker 61, sleep for 142.969 sec to decorrelate experience collection [2024-03-29 12:01:20,512][02754] Worker 40, sleep for 93.750 sec to decorrelate experience collection [2024-03-29 12:01:20,513][03585] Worker 58, sleep for 135.938 sec to decorrelate experience collection [2024-03-29 12:01:20,515][01004] Worker 25, sleep for 58.594 sec to decorrelate experience collection [2024-03-29 12:01:20,519][01856] Worker 39, sleep for 91.406 sec to decorrelate experience collection [2024-03-29 12:01:20,520][00887] Worker 2, sleep for 4.688 sec to decorrelate experience collection [2024-03-29 12:01:20,521][00481] Signal inference workers to stop experience collection... 
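The "Signal inference workers to stop/resume experience collection" pairs that recur below act as back-pressure from the learner: when more experience has been collected than the learner has trained on, collection is paused so the data does not grow stale. A hedged sketch of that kind of check (the threshold, its value, and the function name are assumptions, not the framework's actual API):

def should_pause_collection(frames_collected: int, frames_trained: int,
                            max_outstanding_frames: int = 32768) -> bool:
    # Pause rollout/inference once the backlog of untrained frames exceeds a budget.
    return frames_collected - frames_trained > max_outstanding_frames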
[2024-03-29 12:01:20,530][02188] Worker 24, sleep for 56.250 sec to decorrelate experience collection [2024-03-29 12:01:20,531][02819] Worker 57, sleep for 133.594 sec to decorrelate experience collection [2024-03-29 12:01:20,532][00589] Worker 11, sleep for 25.781 sec to decorrelate experience collection [2024-03-29 12:01:20,536][02436] Worker 38, sleep for 89.062 sec to decorrelate experience collection [2024-03-29 12:01:20,536][01789] Worker 20, sleep for 46.875 sec to decorrelate experience collection [2024-03-29 12:01:20,538][00501] InferenceWorker_p0-w0: stopping experience collection [2024-03-29 12:01:20,543][00892] Worker 19, sleep for 44.531 sec to decorrelate experience collection [2024-03-29 12:01:20,544][02111] Worker 53, sleep for 124.219 sec to decorrelate experience collection [2024-03-29 12:01:20,565][03841] Worker 62, sleep for 145.312 sec to decorrelate experience collection [2024-03-29 12:01:20,569][01471] Worker 12, sleep for 28.125 sec to decorrelate experience collection [2024-03-29 12:01:20,570][02358] Worker 28, sleep for 65.625 sec to decorrelate experience collection [2024-03-29 12:01:21,685][00126] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 43623.2. Samples: 654340. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:01:22,273][00481] Signal inference workers to resume experience collection... [2024-03-29 12:01:22,273][00501] InferenceWorker_p0-w0: resuming experience collection [2024-03-29 12:01:22,346][03578] Worker 63, sleep for 147.656 sec to decorrelate experience collection [2024-03-29 12:01:22,348][03713] Worker 44, sleep for 103.125 sec to decorrelate experience collection [2024-03-29 12:01:22,381][01259] Worker 29, sleep for 67.969 sec to decorrelate experience collection [2024-03-29 12:01:22,854][01280] Worker 33, sleep for 77.344 sec to decorrelate experience collection [2024-03-29 12:01:22,886][01854] Worker 18, sleep for 42.188 sec to decorrelate experience collection [2024-03-29 12:01:22,886][00502] Worker 1, sleep for 2.344 sec to decorrelate experience collection [2024-03-29 12:01:24,647][00501] Updated weights for policy 0, policy_version 10 (0.0013) [2024-03-29 12:01:25,232][00887] Worker 2 awakens! [2024-03-29 12:01:25,232][00502] Worker 1 awakens! [2024-03-29 12:01:26,685][00126] Fps is (10 sec: 31130.0, 60 sec: 15565.0, 300 sec: 15565.0). Total num frames: 311296. Throughput: 0: 32818.4. Samples: 656360. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2024-03-29 12:01:26,915][00501] Updated weights for policy 0, policy_version 20 (0.0013) [2024-03-29 12:01:27,490][00503] Worker 3 awakens! [2024-03-29 12:01:29,776][00888] Worker 4 awakens! [2024-03-29 12:01:31,685][00126] Fps is (10 sec: 34406.2, 60 sec: 13762.6, 300 sec: 13762.6). Total num frames: 344064. Throughput: 0: 27087.4. Samples: 677180. Policy #0 lag: (min: 16.0, avg: 19.0, max: 19.0) [2024-03-29 12:01:32,067][00504] Worker 5 awakens! [2024-03-29 12:01:34,571][01003] Worker 6 awakens! [2024-03-29 12:01:36,685][00126] Fps is (10 sec: 8192.0, 60 sec: 13107.3, 300 sec: 13107.3). Total num frames: 393216. Throughput: 0: 24046.8. Samples: 721400. Policy #0 lag: (min: 0.0, avg: 18.7, max: 21.0) [2024-03-29 12:01:36,896][00569] Worker 7 awakens! [2024-03-29 12:01:39,313][01025] Worker 8 awakens! [2024-03-29 12:01:41,552][00505] Worker 9 awakens! [2024-03-29 12:01:41,685][00126] Fps is (10 sec: 8192.0, 60 sec: 12171.0, 300 sec: 12171.0). Total num frames: 425984. Throughput: 0: 21410.4. Samples: 749360. 
Policy #0 lag: (min: 0.0, avg: 9.5, max: 25.0) [2024-03-29 12:01:41,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:01:41,687][00481] Saving new best policy, reward=0.000! [2024-03-29 12:01:43,924][01023] Worker 10 awakens! [2024-03-29 12:01:44,897][00501] Updated weights for policy 0, policy_version 30 (0.0012) [2024-03-29 12:01:46,418][00589] Worker 11 awakens! [2024-03-29 12:01:46,685][00126] Fps is (10 sec: 11468.6, 60 sec: 12697.6, 300 sec: 12697.6). Total num frames: 507904. Throughput: 0: 20674.5. Samples: 826980. Policy #0 lag: (min: 0.0, avg: 3.7, max: 6.0) [2024-03-29 12:01:46,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:01:48,794][01471] Worker 12 awakens! [2024-03-29 12:01:51,080][00889] Worker 13 awakens! [2024-03-29 12:01:51,685][00126] Fps is (10 sec: 19660.8, 60 sec: 13835.4, 300 sec: 13835.4). Total num frames: 622592. Throughput: 0: 21130.3. Samples: 950860. Policy #0 lag: (min: 0.0, avg: 14.7, max: 37.0) [2024-03-29 12:01:51,687][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:01:52,866][00501] Updated weights for policy 0, policy_version 40 (0.0015) [2024-03-29 12:01:53,269][01535] Worker 14 awakens! [2024-03-29 12:01:55,731][00890] Worker 15 awakens! [2024-03-29 12:01:56,685][00126] Fps is (10 sec: 22937.7, 60 sec: 14745.6, 300 sec: 14745.6). Total num frames: 737280. Throughput: 0: 22794.6. Samples: 1025760. Policy #0 lag: (min: 0.0, avg: 17.1, max: 44.0) [2024-03-29 12:01:56,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:01:58,025][01662] Worker 16 awakens! [2024-03-29 12:01:59,304][00501] Updated weights for policy 0, policy_version 50 (0.0017) [2024-03-29 12:02:00,412][00891] Worker 17 awakens! [2024-03-29 12:02:01,685][00126] Fps is (10 sec: 24576.2, 60 sec: 15788.3, 300 sec: 15788.3). Total num frames: 868352. Throughput: 0: 18732.0. Samples: 1183420. Policy #0 lag: (min: 0.0, avg: 19.8, max: 52.0) [2024-03-29 12:02:01,688][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:05,173][01854] Worker 18 awakens! [2024-03-29 12:02:05,175][00892] Worker 19 awakens! [2024-03-29 12:02:05,244][00501] Updated weights for policy 0, policy_version 60 (0.0018) [2024-03-29 12:02:06,685][00126] Fps is (10 sec: 29491.6, 60 sec: 17203.3, 300 sec: 17203.3). Total num frames: 1032192. Throughput: 0: 15262.2. Samples: 1341140. Policy #0 lag: (min: 2.0, avg: 26.3, max: 60.0) [2024-03-29 12:02:06,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:07,448][01789] Worker 20 awakens! [2024-03-29 12:02:09,698][00996] Worker 21 awakens! [2024-03-29 12:02:11,065][00501] Updated weights for policy 0, policy_version 70 (0.0019) [2024-03-29 12:02:11,685][00126] Fps is (10 sec: 29491.3, 60 sec: 19387.7, 300 sec: 17896.4). Total num frames: 1163264. Throughput: 0: 17453.3. Samples: 1441760. Policy #0 lag: (min: 1.0, avg: 5.2, max: 13.0) [2024-03-29 12:02:11,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:02:11,703][00481] Saving new best policy, reward=0.001! [2024-03-29 12:02:11,962][02357] Worker 22 awakens! [2024-03-29 12:02:14,298][00997] Worker 23 awakens! [2024-03-29 12:02:14,983][00501] Updated weights for policy 0, policy_version 80 (0.0015) [2024-03-29 12:02:16,685][00126] Fps is (10 sec: 31129.3, 60 sec: 22391.5, 300 sec: 19192.7). Total num frames: 1343488. Throughput: 0: 21643.5. Samples: 1651140. Policy #0 lag: (min: 1.0, avg: 9.0, max: 15.0) [2024-03-29 12:02:16,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:02:16,880][02188] Worker 24 awakens! 
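The per-worker sleep times above follow a fixed schedule: worker w sleeps roughly w * 2.344 s (worker 1: 2.344 s, worker 5: 11.719 s, worker 63: 147.656 s), i.e. a total decorrelation budget of about 150 s spread evenly over the 64 rollout workers so that their episodes end out of phase. A sketch of that schedule, with the budget and worker count treated as values inferred from the log rather than from configuration:

def decorrelation_sleep_seconds(worker_idx: int, num_workers: int = 64,
                                max_decorrelation_seconds: float = 150.0) -> float:
    # Stagger each rollout worker's start so collected experience is decorrelated.
    return worker_idx * max_decorrelation_seconds / num_workers

for w in (1, 5, 42, 63):
    print(w, round(decorrelation_sleep_seconds(w), 3))  # 2.344, 11.719, 98.438, 147.656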
[2024-03-29 12:02:19,056][00501] Updated weights for policy 0, policy_version 90 (0.0019) [2024-03-29 12:02:19,209][01004] Worker 25 awakens! [2024-03-29 12:02:21,433][01920] Worker 26 awakens! [2024-03-29 12:02:21,685][00126] Fps is (10 sec: 40959.2, 60 sec: 26214.3, 300 sec: 20971.5). Total num frames: 1572864. Throughput: 0: 25924.3. Samples: 1888000. Policy #0 lag: (min: 0.0, avg: 35.6, max: 92.0) [2024-03-29 12:02:21,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:02:23,618][00501] Updated weights for policy 0, policy_version 100 (0.0015) [2024-03-29 12:02:23,761][01024] Worker 27 awakens! [2024-03-29 12:02:26,160][00481] Signal inference workers to stop experience collection... (50 times) [2024-03-29 12:02:26,206][00501] InferenceWorker_p0-w0: stopping experience collection (50 times) [2024-03-29 12:02:26,239][00481] Signal inference workers to resume experience collection... (50 times) [2024-03-29 12:02:26,239][00501] InferenceWorker_p0-w0: resuming experience collection (50 times) [2024-03-29 12:02:26,297][02358] Worker 28 awakens! [2024-03-29 12:02:26,685][00126] Fps is (10 sec: 44237.3, 60 sec: 24576.0, 300 sec: 22323.3). Total num frames: 1785856. Throughput: 0: 28437.0. Samples: 2029020. Policy #0 lag: (min: 1.0, avg: 8.4, max: 18.0) [2024-03-29 12:02:26,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:26,848][00501] Updated weights for policy 0, policy_version 110 (0.0020) [2024-03-29 12:02:30,398][01259] Worker 29 awakens! [2024-03-29 12:02:30,798][02372] Worker 30 awakens! [2024-03-29 12:02:31,685][00126] Fps is (10 sec: 37683.4, 60 sec: 26760.5, 300 sec: 22937.6). Total num frames: 1949696. Throughput: 0: 31918.2. Samples: 2263300. Policy #0 lag: (min: 0.0, avg: 7.2, max: 18.0) [2024-03-29 12:02:31,688][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:31,801][00501] Updated weights for policy 0, policy_version 120 (0.0031) [2024-03-29 12:02:33,145][01279] Worker 31 awakens! [2024-03-29 12:02:35,480][02818] Worker 32 awakens! [2024-03-29 12:02:35,482][00501] Updated weights for policy 0, policy_version 130 (0.0025) [2024-03-29 12:02:36,685][00126] Fps is (10 sec: 37682.7, 60 sec: 29491.1, 300 sec: 24029.9). Total num frames: 2162688. Throughput: 0: 33888.9. Samples: 2475860. Policy #0 lag: (min: 0.0, avg: 12.5, max: 21.0) [2024-03-29 12:02:36,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:02:40,094][02690] Worker 34 awakens! [2024-03-29 12:02:40,235][01280] Worker 33 awakens! [2024-03-29 12:02:40,243][00501] Updated weights for policy 0, policy_version 140 (0.0021) [2024-03-29 12:02:40,635][00481] self.policy_id=0 batch has 62.50% of invalid samples [2024-03-29 12:02:41,685][00126] Fps is (10 sec: 39322.0, 60 sec: 31948.8, 300 sec: 24662.3). Total num frames: 2342912. Throughput: 0: 35361.4. Samples: 2617020. Policy #0 lag: (min: 0.0, avg: 91.4, max: 140.0) [2024-03-29 12:02:41,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:42,626][01294] Worker 35 awakens! [2024-03-29 12:02:43,555][00501] Updated weights for policy 0, policy_version 150 (0.0020) [2024-03-29 12:02:44,852][02369] Worker 36 awakens! [2024-03-29 12:02:46,685][00126] Fps is (10 sec: 37683.0, 60 sec: 33860.3, 300 sec: 25395.2). Total num frames: 2539520. Throughput: 0: 36487.5. Samples: 2825360. Policy #0 lag: (min: 0.0, avg: 13.9, max: 24.0) [2024-03-29 12:02:46,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:02:47,147][01855] Worker 37 awakens! 
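The "Policy #0 lag" triplets report how many policy versions behind the learner the sampled experience is (min/avg/max over a batch), and the "62.50% of invalid samples" warning above flags that most of one training batch was marked unusable. A small sketch of the lag statistic (names illustrative):

def lag_stats(learner_version: int, sample_versions: list[int]) -> tuple[int, float, int]:
    # Lag of each sample = learner's current policy version minus the version that generated it.
    lags = [learner_version - v for v in sample_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

The checkpoint files saved a few lines below, checkpoint_000000169_2768896.pth and later checkpoint_000000473_7749632.pth, encode the policy version and the total environment frames at save time; in this run the frame count is exactly version * 16384, which suggests 16384 frames are consumed per policy update. A sketch of that naming convention (the per-version frame count is inferred from the log, not from configuration):

def checkpoint_name(policy_version: int, env_frames: int) -> str:
    return f"checkpoint_{policy_version:09d}_{env_frames}.pth"

print(checkpoint_name(169, 169 * 16384))  # checkpoint_000000169_2768896.pth
print(checkpoint_name(473, 473 * 16384))  # checkpoint_000000473_7749632.pth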
[2024-03-29 12:02:49,074][00501] Updated weights for policy 0, policy_version 160 (0.0020) [2024-03-29 12:02:49,606][02436] Worker 38 awakens! [2024-03-29 12:02:51,686][00126] Fps is (10 sec: 40959.2, 60 sec: 35498.6, 300 sec: 26214.4). Total num frames: 2752512. Throughput: 0: 38502.9. Samples: 3073780. Policy #0 lag: (min: 0.0, avg: 20.8, max: 164.0) [2024-03-29 12:02:51,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:51,885][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000000169_2768896.pth... [2024-03-29 12:02:52,013][01856] Worker 39 awakens! [2024-03-29 12:02:52,556][00501] Updated weights for policy 0, policy_version 170 (0.0019) [2024-03-29 12:02:54,280][02754] Worker 40 awakens! [2024-03-29 12:02:56,513][02053] Worker 41 awakens! [2024-03-29 12:02:56,686][00126] Fps is (10 sec: 39321.5, 60 sec: 36590.9, 300 sec: 26661.2). Total num frames: 2932736. Throughput: 0: 38587.4. Samples: 3178200. Policy #0 lag: (min: 0.0, avg: 15.6, max: 26.0) [2024-03-29 12:02:56,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:56,829][00501] Updated weights for policy 0, policy_version 180 (0.0018) [2024-03-29 12:02:58,867][03258] Worker 42 awakens! [2024-03-29 12:03:01,094][00501] Updated weights for policy 0, policy_version 190 (0.0020) [2024-03-29 12:03:01,261][02884] Worker 43 awakens! [2024-03-29 12:03:01,685][00126] Fps is (10 sec: 39322.6, 60 sec: 37956.3, 300 sec: 27354.2). Total num frames: 3145728. Throughput: 0: 39771.6. Samples: 3440860. Policy #0 lag: (min: 0.0, avg: 13.4, max: 28.0) [2024-03-29 12:03:01,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:03:02,764][00481] Signal inference workers to stop experience collection... (100 times) [2024-03-29 12:03:02,781][00501] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-03-29 12:03:02,972][00481] Signal inference workers to resume experience collection... (100 times) [2024-03-29 12:03:02,972][00501] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-03-29 12:03:03,793][00501] Updated weights for policy 0, policy_version 200 (0.0022) [2024-03-29 12:03:05,477][03713] Worker 44 awakens! [2024-03-29 12:03:06,001][01790] Worker 45 awakens! [2024-03-29 12:03:06,685][00126] Fps is (10 sec: 44237.2, 60 sec: 39048.4, 300 sec: 28125.9). Total num frames: 3375104. Throughput: 0: 39729.4. Samples: 3675820. Policy #0 lag: (min: 1.0, avg: 28.5, max: 204.0) [2024-03-29 12:03:06,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:03:08,354][02239] Worker 46 awakens! [2024-03-29 12:03:09,617][00501] Updated weights for policy 0, policy_version 210 (0.0018) [2024-03-29 12:03:10,572][02371] Worker 47 awakens! [2024-03-29 12:03:11,685][00126] Fps is (10 sec: 40960.0, 60 sec: 39867.7, 300 sec: 28442.7). Total num frames: 3555328. Throughput: 0: 40158.7. Samples: 3836160. Policy #0 lag: (min: 0.0, avg: 13.6, max: 30.0) [2024-03-29 12:03:11,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:12,851][00501] Updated weights for policy 0, policy_version 220 (0.0020) [2024-03-29 12:03:12,930][02883] Worker 48 awakens! [2024-03-29 12:03:15,348][02356] Worker 49 awakens! [2024-03-29 12:03:16,357][00501] Updated weights for policy 0, policy_version 230 (0.0020) [2024-03-29 12:03:16,686][00126] Fps is (10 sec: 40959.6, 60 sec: 40686.8, 300 sec: 29113.1). Total num frames: 3784704. Throughput: 0: 39475.5. Samples: 4039700. 
Policy #0 lag: (min: 0.0, avg: 82.6, max: 229.0) [2024-03-29 12:03:16,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:03:17,548][03445] Worker 50 awakens! [2024-03-29 12:03:19,933][02370] Worker 51 awakens! [2024-03-29 12:03:21,438][00501] Updated weights for policy 0, policy_version 240 (0.0018) [2024-03-29 12:03:21,685][00126] Fps is (10 sec: 37682.9, 60 sec: 39321.7, 300 sec: 29127.1). Total num frames: 3932160. Throughput: 0: 41199.6. Samples: 4329840. Policy #0 lag: (min: 0.0, avg: 58.2, max: 237.0) [2024-03-29 12:03:21,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:22,443][03373] Worker 52 awakens! [2024-03-29 12:03:24,681][00501] Updated weights for policy 0, policy_version 250 (0.0018) [2024-03-29 12:03:24,780][02111] Worker 53 awakens! [2024-03-29 12:03:26,685][00126] Fps is (10 sec: 40960.8, 60 sec: 40140.8, 300 sec: 29959.4). Total num frames: 4194304. Throughput: 0: 40756.0. Samples: 4451040. Policy #0 lag: (min: 2.0, avg: 15.7, max: 33.0) [2024-03-29 12:03:26,688][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:27,091][03070] Worker 54 awakens! [2024-03-29 12:03:27,497][00501] Updated weights for policy 0, policy_version 260 (0.0017) [2024-03-29 12:03:29,521][02500] Worker 55 awakens! [2024-03-29 12:03:31,506][00501] Updated weights for policy 0, policy_version 270 (0.0020) [2024-03-29 12:03:31,633][03586] Worker 56 awakens! [2024-03-29 12:03:31,685][00126] Fps is (10 sec: 49151.7, 60 sec: 41233.1, 300 sec: 30508.2). Total num frames: 4423680. Throughput: 0: 41119.6. Samples: 4675740. Policy #0 lag: (min: 1.0, avg: 19.3, max: 35.0) [2024-03-29 12:03:31,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:34,225][02819] Worker 57 awakens! [2024-03-29 12:03:35,069][00481] Signal inference workers to stop experience collection... (150 times) [2024-03-29 12:03:35,101][00501] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-03-29 12:03:35,275][00481] Signal inference workers to resume experience collection... (150 times) [2024-03-29 12:03:35,276][00501] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-03-29 12:03:36,469][00501] Updated weights for policy 0, policy_version 280 (0.0021) [2024-03-29 12:03:36,553][03585] Worker 58 awakens! [2024-03-29 12:03:36,685][00126] Fps is (10 sec: 39321.5, 60 sec: 40413.9, 300 sec: 30583.5). Total num frames: 4587520. Throughput: 0: 41955.3. Samples: 4961760. Policy #0 lag: (min: 2.0, avg: 15.5, max: 36.0) [2024-03-29 12:03:36,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:38,734][03302] Worker 59 awakens! [2024-03-29 12:03:39,459][00501] Updated weights for policy 0, policy_version 290 (0.0021) [2024-03-29 12:03:41,047][03725] Worker 60 awakens! [2024-03-29 12:03:41,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 31393.9). Total num frames: 4866048. Throughput: 0: 42003.1. Samples: 5068340. Policy #0 lag: (min: 1.0, avg: 21.1, max: 38.0) [2024-03-29 12:03:41,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:43,143][00501] Updated weights for policy 0, policy_version 300 (0.0027) [2024-03-29 12:03:43,497][03572] Worker 61 awakens! [2024-03-29 12:03:45,920][03841] Worker 62 awakens! [2024-03-29 12:03:46,685][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.2, 300 sec: 31539.2). Total num frames: 5046272. Throughput: 0: 41877.7. Samples: 5325360. 
Policy #0 lag: (min: 0.0, avg: 22.5, max: 38.0) [2024-03-29 12:03:46,687][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:48,353][00501] Updated weights for policy 0, policy_version 310 (0.0021) [2024-03-29 12:03:50,041][03578] Worker 63 awakens! [2024-03-29 12:03:51,531][00501] Updated weights for policy 0, policy_version 320 (0.0024) [2024-03-29 12:03:51,685][00126] Fps is (10 sec: 37683.5, 60 sec: 41506.2, 300 sec: 31775.1). Total num frames: 5242880. Throughput: 0: 42967.1. Samples: 5609340. Policy #0 lag: (min: 1.0, avg: 17.8, max: 40.0) [2024-03-29 12:03:51,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:03:54,680][00501] Updated weights for policy 0, policy_version 330 (0.0025) [2024-03-29 12:03:56,685][00126] Fps is (10 sec: 45875.0, 60 sec: 42871.5, 300 sec: 32382.5). Total num frames: 5505024. Throughput: 0: 41939.0. Samples: 5723420. Policy #0 lag: (min: 2.0, avg: 22.7, max: 41.0) [2024-03-29 12:03:56,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:03:58,506][00501] Updated weights for policy 0, policy_version 340 (0.0016) [2024-03-29 12:04:00,939][00481] Signal inference workers to stop experience collection... (200 times) [2024-03-29 12:04:01,016][00501] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-03-29 12:04:01,021][00481] Signal inference workers to resume experience collection... (200 times) [2024-03-29 12:04:01,044][00501] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-03-29 12:04:01,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.3, 300 sec: 32580.8). Total num frames: 5701632. Throughput: 0: 42674.8. Samples: 5960060. Policy #0 lag: (min: 0.0, avg: 23.6, max: 41.0) [2024-03-29 12:04:01,688][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:03,782][00501] Updated weights for policy 0, policy_version 350 (0.0018) [2024-03-29 12:04:06,685][00126] Fps is (10 sec: 36045.1, 60 sec: 41506.2, 300 sec: 32586.0). Total num frames: 5865472. Throughput: 0: 42509.8. Samples: 6242780. Policy #0 lag: (min: 1.0, avg: 18.7, max: 44.0) [2024-03-29 12:04:06,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:07,170][00501] Updated weights for policy 0, policy_version 360 (0.0022) [2024-03-29 12:04:10,268][00501] Updated weights for policy 0, policy_version 370 (0.0023) [2024-03-29 12:04:11,685][00126] Fps is (10 sec: 44236.7, 60 sec: 43144.4, 300 sec: 33210.8). Total num frames: 6144000. Throughput: 0: 42363.9. Samples: 6357420. Policy #0 lag: (min: 2.0, avg: 23.6, max: 42.0) [2024-03-29 12:04:11,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:04:14,169][00501] Updated weights for policy 0, policy_version 380 (0.0025) [2024-03-29 12:04:16,685][00126] Fps is (10 sec: 47513.4, 60 sec: 42598.5, 300 sec: 33371.6). Total num frames: 6340608. Throughput: 0: 42478.3. Samples: 6587260. Policy #0 lag: (min: 0.0, avg: 23.8, max: 43.0) [2024-03-29 12:04:16,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:04:19,190][00501] Updated weights for policy 0, policy_version 390 (0.0017) [2024-03-29 12:04:21,685][00126] Fps is (10 sec: 34406.7, 60 sec: 42598.4, 300 sec: 33272.2). Total num frames: 6488064. Throughput: 0: 42702.7. Samples: 6883380. 
Policy #0 lag: (min: 0.0, avg: 17.7, max: 41.0) [2024-03-29 12:04:21,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:22,681][00501] Updated weights for policy 0, policy_version 400 (0.0021) [2024-03-29 12:04:25,881][00501] Updated weights for policy 0, policy_version 410 (0.0022) [2024-03-29 12:04:26,685][00126] Fps is (10 sec: 42599.0, 60 sec: 42871.5, 300 sec: 33833.0). Total num frames: 6766592. Throughput: 0: 42785.1. Samples: 6993660. Policy #0 lag: (min: 1.0, avg: 19.3, max: 42.0) [2024-03-29 12:04:26,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:04:29,620][00501] Updated weights for policy 0, policy_version 420 (0.0022) [2024-03-29 12:04:30,687][00481] Signal inference workers to stop experience collection... (250 times) [2024-03-29 12:04:30,767][00481] Signal inference workers to resume experience collection... (250 times) [2024-03-29 12:04:30,770][00501] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-03-29 12:04:30,792][00501] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-03-29 12:04:31,685][00126] Fps is (10 sec: 49151.6, 60 sec: 42598.4, 300 sec: 34046.8). Total num frames: 6979584. Throughput: 0: 42120.0. Samples: 7220760. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 12:04:31,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:04:34,822][00501] Updated weights for policy 0, policy_version 430 (0.0029) [2024-03-29 12:04:36,685][00126] Fps is (10 sec: 34406.3, 60 sec: 42052.3, 300 sec: 33860.3). Total num frames: 7110656. Throughput: 0: 42457.0. Samples: 7519900. Policy #0 lag: (min: 0.0, avg: 17.9, max: 41.0) [2024-03-29 12:04:36,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:38,126][00501] Updated weights for policy 0, policy_version 440 (0.0029) [2024-03-29 12:04:41,458][00501] Updated weights for policy 0, policy_version 450 (0.0033) [2024-03-29 12:04:41,685][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.3, 300 sec: 34292.1). Total num frames: 7372800. Throughput: 0: 42334.7. Samples: 7628480. Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 12:04:41,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:04:45,099][00501] Updated weights for policy 0, policy_version 460 (0.0018) [2024-03-29 12:04:46,685][00126] Fps is (10 sec: 50790.2, 60 sec: 42871.5, 300 sec: 34629.8). Total num frames: 7618560. Throughput: 0: 42247.2. Samples: 7861180. Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 12:04:46,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:50,301][00501] Updated weights for policy 0, policy_version 470 (0.0023) [2024-03-29 12:04:51,685][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.2, 300 sec: 34442.8). Total num frames: 7749632. Throughput: 0: 42456.0. Samples: 8153300. Policy #0 lag: (min: 0.0, avg: 17.8, max: 41.0) [2024-03-29 12:04:51,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:51,705][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000000473_7749632.pth... [2024-03-29 12:04:53,815][00501] Updated weights for policy 0, policy_version 480 (0.0020) [2024-03-29 12:04:56,685][00126] Fps is (10 sec: 37683.0, 60 sec: 41506.2, 300 sec: 34762.6). Total num frames: 7995392. Throughput: 0: 42196.0. Samples: 8256240. 
Policy #0 lag: (min: 1.0, avg: 18.5, max: 41.0) [2024-03-29 12:04:56,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:57,134][00501] Updated weights for policy 0, policy_version 490 (0.0024) [2024-03-29 12:05:00,776][00501] Updated weights for policy 0, policy_version 500 (0.0023) [2024-03-29 12:05:01,686][00126] Fps is (10 sec: 49151.1, 60 sec: 42325.2, 300 sec: 35068.7). Total num frames: 8241152. Throughput: 0: 42280.7. Samples: 8489900. Policy #0 lag: (min: 1.0, avg: 22.6, max: 41.0) [2024-03-29 12:05:01,687][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:05:05,834][00501] Updated weights for policy 0, policy_version 510 (0.0018) [2024-03-29 12:05:06,685][00126] Fps is (10 sec: 37683.0, 60 sec: 41779.1, 300 sec: 34884.3). Total num frames: 8372224. Throughput: 0: 41964.3. Samples: 8771780. Policy #0 lag: (min: 0.0, avg: 17.5, max: 41.0) [2024-03-29 12:05:06,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:05:06,898][00481] Signal inference workers to stop experience collection... (300 times) [2024-03-29 12:05:06,898][00481] Signal inference workers to resume experience collection... (300 times) [2024-03-29 12:05:06,921][00501] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-03-29 12:05:06,922][00501] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-03-29 12:05:09,395][00501] Updated weights for policy 0, policy_version 520 (0.0025) [2024-03-29 12:05:11,685][00126] Fps is (10 sec: 37684.1, 60 sec: 41233.1, 300 sec: 35175.5). Total num frames: 8617984. Throughput: 0: 42168.8. Samples: 8891260. Policy #0 lag: (min: 1.0, avg: 18.8, max: 45.0) [2024-03-29 12:05:11,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:12,702][00501] Updated weights for policy 0, policy_version 530 (0.0020) [2024-03-29 12:05:16,174][00501] Updated weights for policy 0, policy_version 540 (0.0024) [2024-03-29 12:05:16,685][00126] Fps is (10 sec: 49152.6, 60 sec: 42052.3, 300 sec: 35455.0). Total num frames: 8863744. Throughput: 0: 42168.1. Samples: 9118320. Policy #0 lag: (min: 0.0, avg: 24.3, max: 44.0) [2024-03-29 12:05:16,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:05:17,025][00481] Saving new best policy, reward=0.002! [2024-03-29 12:05:21,192][00501] Updated weights for policy 0, policy_version 550 (0.0025) [2024-03-29 12:05:21,685][00126] Fps is (10 sec: 39321.3, 60 sec: 42052.2, 300 sec: 35338.1). Total num frames: 9011200. Throughput: 0: 41825.7. Samples: 9402060. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 12:05:21,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:24,983][00501] Updated weights for policy 0, policy_version 560 (0.0029) [2024-03-29 12:05:26,685][00126] Fps is (10 sec: 37683.1, 60 sec: 41233.0, 300 sec: 35540.7). Total num frames: 9240576. Throughput: 0: 42178.7. Samples: 9526520. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 12:05:26,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:28,200][00501] Updated weights for policy 0, policy_version 570 (0.0019) [2024-03-29 12:05:31,518][00501] Updated weights for policy 0, policy_version 580 (0.0017) [2024-03-29 12:05:31,685][00126] Fps is (10 sec: 49151.8, 60 sec: 42052.2, 300 sec: 35859.3). Total num frames: 9502720. Throughput: 0: 42003.5. Samples: 9751340. 
Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 12:05:31,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:36,602][00501] Updated weights for policy 0, policy_version 590 (0.0021) [2024-03-29 12:05:36,685][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 35802.1). Total num frames: 9666560. Throughput: 0: 42000.1. Samples: 10043300. Policy #0 lag: (min: 0.0, avg: 24.2, max: 42.0) [2024-03-29 12:05:36,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:38,779][00481] Signal inference workers to stop experience collection... (350 times) [2024-03-29 12:05:38,824][00501] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-03-29 12:05:38,997][00481] Signal inference workers to resume experience collection... (350 times) [2024-03-29 12:05:38,998][00501] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-03-29 12:05:40,342][00501] Updated weights for policy 0, policy_version 600 (0.0037) [2024-03-29 12:05:41,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.2, 300 sec: 35925.7). Total num frames: 9879552. Throughput: 0: 42539.6. Samples: 10170520. Policy #0 lag: (min: 1.0, avg: 19.1, max: 45.0) [2024-03-29 12:05:41,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:43,518][00501] Updated weights for policy 0, policy_version 610 (0.0027) [2024-03-29 12:05:46,685][00126] Fps is (10 sec: 47513.3, 60 sec: 42052.3, 300 sec: 36220.4). Total num frames: 10141696. Throughput: 0: 42202.0. Samples: 10388980. Policy #0 lag: (min: 1.0, avg: 23.8, max: 43.0) [2024-03-29 12:05:46,687][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:46,880][00501] Updated weights for policy 0, policy_version 620 (0.0017) [2024-03-29 12:05:51,685][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 36159.8). Total num frames: 10305536. Throughput: 0: 42158.2. Samples: 10668900. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 12:05:51,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:52,112][00501] Updated weights for policy 0, policy_version 630 (0.0026) [2024-03-29 12:05:55,898][00501] Updated weights for policy 0, policy_version 640 (0.0026) [2024-03-29 12:05:56,685][00126] Fps is (10 sec: 36044.5, 60 sec: 41779.2, 300 sec: 36214.3). Total num frames: 10502144. Throughput: 0: 42739.5. Samples: 10814540. Policy #0 lag: (min: 0.0, avg: 17.0, max: 42.0) [2024-03-29 12:05:56,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:59,181][00501] Updated weights for policy 0, policy_version 650 (0.0021) [2024-03-29 12:06:01,685][00126] Fps is (10 sec: 45875.4, 60 sec: 42052.4, 300 sec: 36489.1). Total num frames: 10764288. Throughput: 0: 42245.2. Samples: 11019360. Policy #0 lag: (min: 0.0, avg: 22.8, max: 44.0) [2024-03-29 12:06:01,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:02,449][00501] Updated weights for policy 0, policy_version 660 (0.0023) [2024-03-29 12:06:06,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42871.5, 300 sec: 37100.0). Total num frames: 10944512. Throughput: 0: 41879.1. Samples: 11286620. Policy #0 lag: (min: 0.0, avg: 24.0, max: 41.0) [2024-03-29 12:06:06,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:06:07,521][00501] Updated weights for policy 0, policy_version 670 (0.0018) [2024-03-29 12:06:08,748][00481] Signal inference workers to stop experience collection... 
(400 times) [2024-03-29 12:06:08,787][00501] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-03-29 12:06:08,964][00481] Signal inference workers to resume experience collection... (400 times) [2024-03-29 12:06:08,964][00501] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-03-29 12:06:11,685][00126] Fps is (10 sec: 36045.0, 60 sec: 41779.2, 300 sec: 37711.0). Total num frames: 11124736. Throughput: 0: 42420.4. Samples: 11435440. Policy #0 lag: (min: 0.0, avg: 16.5, max: 41.0) [2024-03-29 12:06:11,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:11,719][00501] Updated weights for policy 0, policy_version 680 (0.0029) [2024-03-29 12:06:14,567][00501] Updated weights for policy 0, policy_version 690 (0.0023) [2024-03-29 12:06:16,685][00126] Fps is (10 sec: 45875.3, 60 sec: 42325.3, 300 sec: 38655.1). Total num frames: 11403264. Throughput: 0: 42454.3. Samples: 11661780. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 12:06:16,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:06:17,837][00501] Updated weights for policy 0, policy_version 700 (0.0020) [2024-03-29 12:06:21,685][00126] Fps is (10 sec: 47513.5, 60 sec: 43144.5, 300 sec: 38266.3). Total num frames: 11599872. Throughput: 0: 41607.9. Samples: 11915660. Policy #0 lag: (min: 0.0, avg: 24.2, max: 42.0) [2024-03-29 12:06:21,687][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:22,754][00501] Updated weights for policy 0, policy_version 710 (0.0027) [2024-03-29 12:06:26,686][00126] Fps is (10 sec: 36044.5, 60 sec: 42052.2, 300 sec: 38710.7). Total num frames: 11763712. Throughput: 0: 42335.4. Samples: 12075620. Policy #0 lag: (min: 0.0, avg: 16.7, max: 43.0) [2024-03-29 12:06:26,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:27,087][00501] Updated weights for policy 0, policy_version 720 (0.0023) [2024-03-29 12:06:30,005][00501] Updated weights for policy 0, policy_version 730 (0.0030) [2024-03-29 12:06:31,685][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.4, 300 sec: 39488.2). Total num frames: 12042240. Throughput: 0: 42464.0. Samples: 12299860. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 12:06:31,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:32,694][00481] Signal inference workers to stop experience collection... (450 times) [2024-03-29 12:06:32,731][00501] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-03-29 12:06:32,918][00481] Signal inference workers to resume experience collection... (450 times) [2024-03-29 12:06:32,918][00501] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-03-29 12:06:33,217][00501] Updated weights for policy 0, policy_version 740 (0.0030) [2024-03-29 12:06:36,685][00126] Fps is (10 sec: 49152.2, 60 sec: 43144.4, 300 sec: 40099.1). Total num frames: 12255232. Throughput: 0: 41403.6. Samples: 12532060. Policy #0 lag: (min: 1.0, avg: 24.9, max: 42.0) [2024-03-29 12:06:36,687][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:38,221][00501] Updated weights for policy 0, policy_version 750 (0.0020) [2024-03-29 12:06:41,685][00126] Fps is (10 sec: 34406.3, 60 sec: 41779.2, 300 sec: 40265.8). Total num frames: 12386304. Throughput: 0: 41894.3. Samples: 12699780. 
Policy #0 lag: (min: 0.0, avg: 16.8, max: 40.0) [2024-03-29 12:06:41,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:42,660][00501] Updated weights for policy 0, policy_version 760 (0.0026) [2024-03-29 12:06:45,709][00501] Updated weights for policy 0, policy_version 770 (0.0018) [2024-03-29 12:06:46,685][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.2, 300 sec: 40765.6). Total num frames: 12648448. Throughput: 0: 42450.2. Samples: 12929620. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 12:06:46,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:06:48,808][00501] Updated weights for policy 0, policy_version 780 (0.0022) [2024-03-29 12:06:51,686][00126] Fps is (10 sec: 50789.5, 60 sec: 43144.5, 300 sec: 41209.9). Total num frames: 12894208. Throughput: 0: 41664.3. Samples: 13161520. Policy #0 lag: (min: 0.0, avg: 23.0, max: 42.0) [2024-03-29 12:06:51,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:51,875][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000000788_12910592.pth... [2024-03-29 12:06:52,279][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000000169_2768896.pth [2024-03-29 12:06:53,875][00501] Updated weights for policy 0, policy_version 790 (0.0024) [2024-03-29 12:06:56,685][00126] Fps is (10 sec: 34406.6, 60 sec: 41506.2, 300 sec: 41098.8). Total num frames: 12992512. Throughput: 0: 41709.8. Samples: 13312380. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 12:06:56,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:06:58,688][00501] Updated weights for policy 0, policy_version 800 (0.0022) [2024-03-29 12:07:01,597][00501] Updated weights for policy 0, policy_version 810 (0.0028) [2024-03-29 12:07:01,685][00126] Fps is (10 sec: 37683.8, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 13271040. Throughput: 0: 42359.1. Samples: 13567940. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 12:07:01,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:07:02,469][00481] Signal inference workers to stop experience collection... (500 times) [2024-03-29 12:07:02,557][00501] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-03-29 12:07:02,635][00481] Signal inference workers to resume experience collection... (500 times) [2024-03-29 12:07:02,635][00501] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-03-29 12:07:04,563][00501] Updated weights for policy 0, policy_version 820 (0.0035) [2024-03-29 12:07:06,685][00126] Fps is (10 sec: 50790.4, 60 sec: 42598.4, 300 sec: 41820.8). Total num frames: 13500416. Throughput: 0: 41679.1. Samples: 13791220. Policy #0 lag: (min: 0.0, avg: 24.7, max: 41.0) [2024-03-29 12:07:06,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:07:09,570][00501] Updated weights for policy 0, policy_version 830 (0.0028) [2024-03-29 12:07:11,685][00126] Fps is (10 sec: 37683.1, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 13647872. Throughput: 0: 41277.8. Samples: 13933120. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 12:07:11,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:07:14,320][00501] Updated weights for policy 0, policy_version 840 (0.0022) [2024-03-29 12:07:16,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 13877248. Throughput: 0: 42169.4. Samples: 14197480. 
Policy #0 lag: (min: 2.0, avg: 19.4, max: 43.0) [2024-03-29 12:07:16,688][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:07:17,307][00501] Updated weights for policy 0, policy_version 850 (0.0018) [2024-03-29 12:07:20,300][00501] Updated weights for policy 0, policy_version 860 (0.0023) [2024-03-29 12:07:21,685][00126] Fps is (10 sec: 49152.2, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 14139392. Throughput: 0: 41894.7. Samples: 14417320. Policy #0 lag: (min: 0.0, avg: 24.5, max: 41.0) [2024-03-29 12:07:21,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:07:25,045][00501] Updated weights for policy 0, policy_version 870 (0.0027) [2024-03-29 12:07:26,685][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 14270464. Throughput: 0: 41276.5. Samples: 14557220. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 12:07:26,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:07:29,932][00501] Updated weights for policy 0, policy_version 880 (0.0022) [2024-03-29 12:07:31,686][00126] Fps is (10 sec: 36044.4, 60 sec: 40959.9, 300 sec: 41820.8). Total num frames: 14499840. Throughput: 0: 42091.0. Samples: 14823720. Policy #0 lag: (min: 0.0, avg: 18.6, max: 43.0) [2024-03-29 12:07:31,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:07:33,111][00501] Updated weights for policy 0, policy_version 890 (0.0019) [2024-03-29 12:07:34,221][00481] Signal inference workers to stop experience collection... (550 times) [2024-03-29 12:07:34,281][00501] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-03-29 12:07:34,313][00481] Signal inference workers to resume experience collection... (550 times) [2024-03-29 12:07:34,318][00501] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-03-29 12:07:35,924][00501] Updated weights for policy 0, policy_version 900 (0.0027) [2024-03-29 12:07:36,685][00126] Fps is (10 sec: 49151.6, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 14761984. Throughput: 0: 41731.2. Samples: 15039420. Policy #0 lag: (min: 2.0, avg: 21.9, max: 42.0) [2024-03-29 12:07:36,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:07:40,809][00501] Updated weights for policy 0, policy_version 910 (0.0027) [2024-03-29 12:07:41,685][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 14909440. Throughput: 0: 41303.0. Samples: 15171020. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 12:07:41,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:07:45,507][00501] Updated weights for policy 0, policy_version 920 (0.0018) [2024-03-29 12:07:46,685][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.0, 300 sec: 41931.9). Total num frames: 15122432. Throughput: 0: 41982.6. Samples: 15457160. Policy #0 lag: (min: 0.0, avg: 18.5, max: 43.0) [2024-03-29 12:07:46,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:07:48,597][00501] Updated weights for policy 0, policy_version 930 (0.0022) [2024-03-29 12:07:51,465][00501] Updated weights for policy 0, policy_version 940 (0.0021) [2024-03-29 12:07:51,685][00126] Fps is (10 sec: 49152.5, 60 sec: 41779.4, 300 sec: 42265.2). Total num frames: 15400960. Throughput: 0: 41796.0. Samples: 15672040. 
Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 12:07:51,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:07:56,417][00501] Updated weights for policy 0, policy_version 950 (0.0028) [2024-03-29 12:07:56,686][00126] Fps is (10 sec: 44236.6, 60 sec: 42871.4, 300 sec: 42098.5). Total num frames: 15564800. Throughput: 0: 41438.6. Samples: 15797860. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 12:07:56,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:08:01,224][00501] Updated weights for policy 0, policy_version 960 (0.0019) [2024-03-29 12:08:01,685][00126] Fps is (10 sec: 34406.1, 60 sec: 41233.0, 300 sec: 41931.9). Total num frames: 15745024. Throughput: 0: 42014.1. Samples: 16088120. Policy #0 lag: (min: 0.0, avg: 16.8, max: 41.0) [2024-03-29 12:08:01,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:08:04,100][00501] Updated weights for policy 0, policy_version 970 (0.0022) [2024-03-29 12:08:04,565][00481] Signal inference workers to stop experience collection... (600 times) [2024-03-29 12:08:04,588][00501] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-03-29 12:08:04,768][00481] Signal inference workers to resume experience collection... (600 times) [2024-03-29 12:08:04,769][00501] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-03-29 12:08:06,685][00126] Fps is (10 sec: 45875.7, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 16023552. Throughput: 0: 41968.0. Samples: 16305880. Policy #0 lag: (min: 2.0, avg: 21.8, max: 42.0) [2024-03-29 12:08:06,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:08:07,031][00501] Updated weights for policy 0, policy_version 980 (0.0030) [2024-03-29 12:08:11,685][00126] Fps is (10 sec: 44237.5, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 16187392. Throughput: 0: 41706.3. Samples: 16434000. Policy #0 lag: (min: 0.0, avg: 24.7, max: 44.0) [2024-03-29 12:08:11,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:08:12,020][00501] Updated weights for policy 0, policy_version 990 (0.0020) [2024-03-29 12:08:16,685][00126] Fps is (10 sec: 34406.5, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 16367616. Throughput: 0: 42193.9. Samples: 16722440. Policy #0 lag: (min: 0.0, avg: 16.9, max: 41.0) [2024-03-29 12:08:16,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:08:16,768][00501] Updated weights for policy 0, policy_version 1000 (0.0021) [2024-03-29 12:08:19,787][00501] Updated weights for policy 0, policy_version 1010 (0.0022) [2024-03-29 12:08:21,685][00126] Fps is (10 sec: 45874.8, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 16646144. Throughput: 0: 42239.2. Samples: 16940180. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 12:08:21,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:08:22,625][00501] Updated weights for policy 0, policy_version 1020 (0.0024) [2024-03-29 12:08:26,685][00126] Fps is (10 sec: 47513.4, 60 sec: 42871.4, 300 sec: 42098.6). Total num frames: 16842752. Throughput: 0: 42173.0. Samples: 17068800. Policy #0 lag: (min: 0.0, avg: 24.8, max: 44.0) [2024-03-29 12:08:26,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:08:26,686][00481] Saving new best policy, reward=0.004! [2024-03-29 12:08:27,664][00501] Updated weights for policy 0, policy_version 1030 (0.0023) [2024-03-29 12:08:31,685][00126] Fps is (10 sec: 36044.8, 60 sec: 41779.3, 300 sec: 42098.5). Total num frames: 17006592. Throughput: 0: 42012.5. Samples: 17347720. 
Policy #0 lag: (min: 0.0, avg: 17.4, max: 41.0) [2024-03-29 12:08:31,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:08:32,323][00501] Updated weights for policy 0, policy_version 1040 (0.0021) [2024-03-29 12:08:35,526][00501] Updated weights for policy 0, policy_version 1050 (0.0022) [2024-03-29 12:08:36,685][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 17268736. Throughput: 0: 42253.7. Samples: 17573460. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 12:08:36,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:08:37,250][00481] Signal inference workers to stop experience collection... (650 times) [2024-03-29 12:08:37,286][00501] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-03-29 12:08:37,464][00481] Signal inference workers to resume experience collection... (650 times) [2024-03-29 12:08:37,464][00501] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-03-29 12:08:38,298][00501] Updated weights for policy 0, policy_version 1060 (0.0018) [2024-03-29 12:08:41,685][00126] Fps is (10 sec: 49151.9, 60 sec: 43144.6, 300 sec: 42209.6). Total num frames: 17498112. Throughput: 0: 42057.9. Samples: 17690460. Policy #0 lag: (min: 0.0, avg: 24.8, max: 43.0) [2024-03-29 12:08:41,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:08:43,350][00501] Updated weights for policy 0, policy_version 1070 (0.0017) [2024-03-29 12:08:46,685][00126] Fps is (10 sec: 36044.8, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 17629184. Throughput: 0: 41967.6. Samples: 17976660. Policy #0 lag: (min: 1.0, avg: 17.4, max: 41.0) [2024-03-29 12:08:46,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:08:47,869][00501] Updated weights for policy 0, policy_version 1080 (0.0019) [2024-03-29 12:08:51,090][00501] Updated weights for policy 0, policy_version 1090 (0.0026) [2024-03-29 12:08:51,685][00126] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 17891328. Throughput: 0: 42289.0. Samples: 18208880. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 12:08:51,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:08:51,947][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000001093_17907712.pth... [2024-03-29 12:08:52,312][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000000473_7749632.pth [2024-03-29 12:08:54,268][00501] Updated weights for policy 0, policy_version 1100 (0.0025) [2024-03-29 12:08:56,686][00126] Fps is (10 sec: 49151.8, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 18120704. Throughput: 0: 41824.3. Samples: 18316100. Policy #0 lag: (min: 1.0, avg: 24.5, max: 42.0) [2024-03-29 12:08:56,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:08:59,171][00501] Updated weights for policy 0, policy_version 1110 (0.0022) [2024-03-29 12:09:01,685][00126] Fps is (10 sec: 36044.4, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 18251776. Throughput: 0: 41635.1. Samples: 18596020. Policy #0 lag: (min: 0.0, avg: 21.9, max: 44.0) [2024-03-29 12:09:01,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:09:03,690][00501] Updated weights for policy 0, policy_version 1120 (0.0026) [2024-03-29 12:09:06,686][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 18497536. Throughput: 0: 42195.9. Samples: 18839000. 
Policy #0 lag: (min: 1.0, avg: 18.5, max: 42.0) [2024-03-29 12:09:06,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:09:06,864][00501] Updated weights for policy 0, policy_version 1130 (0.0025) [2024-03-29 12:09:07,683][00481] Signal inference workers to stop experience collection... (700 times) [2024-03-29 12:09:07,726][00501] InferenceWorker_p0-w0: stopping experience collection (700 times) [2024-03-29 12:09:07,767][00481] Signal inference workers to resume experience collection... (700 times) [2024-03-29 12:09:07,769][00501] InferenceWorker_p0-w0: resuming experience collection (700 times) [2024-03-29 12:09:09,830][00501] Updated weights for policy 0, policy_version 1140 (0.0023) [2024-03-29 12:09:11,685][00126] Fps is (10 sec: 50789.9, 60 sec: 42871.3, 300 sec: 42098.5). Total num frames: 18759680. Throughput: 0: 41765.2. Samples: 18948240. Policy #0 lag: (min: 0.0, avg: 23.8, max: 42.0) [2024-03-29 12:09:11,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:09:14,483][00501] Updated weights for policy 0, policy_version 1150 (0.0024) [2024-03-29 12:09:16,685][00126] Fps is (10 sec: 40960.7, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 18907136. Throughput: 0: 41907.1. Samples: 19233540. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 12:09:16,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:09:19,018][00501] Updated weights for policy 0, policy_version 1160 (0.0023) [2024-03-29 12:09:21,685][00126] Fps is (10 sec: 36045.1, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 19120128. Throughput: 0: 42290.3. Samples: 19476520. Policy #0 lag: (min: 2.0, avg: 18.9, max: 42.0) [2024-03-29 12:09:21,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:09:22,425][00501] Updated weights for policy 0, policy_version 1170 (0.0026) [2024-03-29 12:09:25,342][00501] Updated weights for policy 0, policy_version 1180 (0.0023) [2024-03-29 12:09:26,685][00126] Fps is (10 sec: 47513.1, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 19382272. Throughput: 0: 42243.9. Samples: 19591440. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 12:09:26,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:09:29,897][00501] Updated weights for policy 0, policy_version 1190 (0.0028) [2024-03-29 12:09:31,685][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 19529728. Throughput: 0: 41756.5. Samples: 19855700. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 12:09:31,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:09:34,579][00501] Updated weights for policy 0, policy_version 1200 (0.0027) [2024-03-29 12:09:36,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 19759104. Throughput: 0: 42155.9. Samples: 20105900. Policy #0 lag: (min: 1.0, avg: 20.0, max: 43.0) [2024-03-29 12:09:36,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:09:37,194][00481] Signal inference workers to stop experience collection... (750 times) [2024-03-29 12:09:37,229][00501] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-03-29 12:09:37,405][00481] Signal inference workers to resume experience collection... 
(750 times) [2024-03-29 12:09:37,406][00501] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-03-29 12:09:37,951][00501] Updated weights for policy 0, policy_version 1210 (0.0023) [2024-03-29 12:09:40,964][00501] Updated weights for policy 0, policy_version 1220 (0.0025) [2024-03-29 12:09:41,686][00126] Fps is (10 sec: 47513.0, 60 sec: 41779.1, 300 sec: 41987.4). Total num frames: 20004864. Throughput: 0: 42221.3. Samples: 20216060. Policy #0 lag: (min: 1.0, avg: 22.3, max: 42.0) [2024-03-29 12:09:41,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:09:45,711][00501] Updated weights for policy 0, policy_version 1230 (0.0018) [2024-03-29 12:09:46,685][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 20152320. Throughput: 0: 41661.3. Samples: 20470780. Policy #0 lag: (min: 0.0, avg: 23.3, max: 42.0) [2024-03-29 12:09:46,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:09:50,274][00501] Updated weights for policy 0, policy_version 1240 (0.0028) [2024-03-29 12:09:51,685][00126] Fps is (10 sec: 37683.6, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 20381696. Throughput: 0: 42074.8. Samples: 20732360. Policy #0 lag: (min: 2.0, avg: 17.0, max: 42.0) [2024-03-29 12:09:51,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:09:53,482][00501] Updated weights for policy 0, policy_version 1250 (0.0021) [2024-03-29 12:09:56,685][00126] Fps is (10 sec: 47513.8, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 20627456. Throughput: 0: 42277.9. Samples: 20850740. Policy #0 lag: (min: 1.0, avg: 22.1, max: 41.0) [2024-03-29 12:09:56,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:09:56,718][00501] Updated weights for policy 0, policy_version 1260 (0.0021) [2024-03-29 12:10:01,133][00501] Updated weights for policy 0, policy_version 1270 (0.0018) [2024-03-29 12:10:01,685][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 20807680. Throughput: 0: 41619.5. Samples: 21106420. Policy #0 lag: (min: 1.0, avg: 23.9, max: 41.0) [2024-03-29 12:10:01,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:10:05,674][00501] Updated weights for policy 0, policy_version 1280 (0.0019) [2024-03-29 12:10:06,685][00126] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 21020672. Throughput: 0: 42099.6. Samples: 21371000. Policy #0 lag: (min: 2.0, avg: 18.0, max: 43.0) [2024-03-29 12:10:06,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:10:08,992][00501] Updated weights for policy 0, policy_version 1290 (0.0029) [2024-03-29 12:10:10,359][00481] Signal inference workers to stop experience collection... (800 times) [2024-03-29 12:10:10,441][00481] Signal inference workers to resume experience collection... (800 times) [2024-03-29 12:10:10,447][00501] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-03-29 12:10:10,466][00501] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-03-29 12:10:11,685][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 21266432. Throughput: 0: 42121.4. Samples: 21486900. 
Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 12:10:11,688][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:10:12,191][00501] Updated weights for policy 0, policy_version 1300 (0.0017) [2024-03-29 12:10:16,571][00501] Updated weights for policy 0, policy_version 1310 (0.0025) [2024-03-29 12:10:16,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 21463040. Throughput: 0: 41902.7. Samples: 21741320. Policy #0 lag: (min: 0.0, avg: 24.6, max: 42.0) [2024-03-29 12:10:16,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:10:21,034][00501] Updated weights for policy 0, policy_version 1320 (0.0029) [2024-03-29 12:10:21,685][00126] Fps is (10 sec: 37683.0, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 21643264. Throughput: 0: 42469.3. Samples: 22017020. Policy #0 lag: (min: 2.0, avg: 18.1, max: 43.0) [2024-03-29 12:10:21,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:10:24,511][00501] Updated weights for policy 0, policy_version 1330 (0.0023) [2024-03-29 12:10:26,685][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 21889024. Throughput: 0: 42492.1. Samples: 22128200. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 12:10:26,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:10:27,072][00481] Saving new best policy, reward=0.006! [2024-03-29 12:10:27,957][00501] Updated weights for policy 0, policy_version 1340 (0.0022) [2024-03-29 12:10:31,685][00126] Fps is (10 sec: 44237.1, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 22085632. Throughput: 0: 41922.7. Samples: 22357300. Policy #0 lag: (min: 1.0, avg: 23.6, max: 41.0) [2024-03-29 12:10:31,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:10:32,124][00501] Updated weights for policy 0, policy_version 1350 (0.0018) [2024-03-29 12:10:36,649][00501] Updated weights for policy 0, policy_version 1360 (0.0032) [2024-03-29 12:10:36,685][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 22282240. Throughput: 0: 42157.8. Samples: 22629460. Policy #0 lag: (min: 2.0, avg: 18.4, max: 43.0) [2024-03-29 12:10:36,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:10:40,329][00501] Updated weights for policy 0, policy_version 1370 (0.0025) [2024-03-29 12:10:40,339][00481] Signal inference workers to stop experience collection... (850 times) [2024-03-29 12:10:40,340][00481] Signal inference workers to resume experience collection... (850 times) [2024-03-29 12:10:40,375][00501] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-03-29 12:10:40,375][00501] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-03-29 12:10:41,685][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 22511616. Throughput: 0: 42184.5. Samples: 22749040. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 12:10:41,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:10:43,387][00501] Updated weights for policy 0, policy_version 1380 (0.0019) [2024-03-29 12:10:46,685][00126] Fps is (10 sec: 45874.9, 60 sec: 43144.5, 300 sec: 42154.1). Total num frames: 22740992. Throughput: 0: 41990.6. Samples: 22996000. 
Policy #0 lag: (min: 1.0, avg: 23.1, max: 41.0) [2024-03-29 12:10:46,690][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:10:47,669][00501] Updated weights for policy 0, policy_version 1390 (0.0022) [2024-03-29 12:10:51,686][00126] Fps is (10 sec: 39320.8, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 22904832. Throughput: 0: 42353.7. Samples: 23276920. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 12:10:51,689][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:10:52,164][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000001400_22937600.pth... [2024-03-29 12:10:52,176][00501] Updated weights for policy 0, policy_version 1400 (0.0028) [2024-03-29 12:10:52,521][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000000788_12910592.pth [2024-03-29 12:10:55,895][00501] Updated weights for policy 0, policy_version 1410 (0.0024) [2024-03-29 12:10:56,685][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 23150592. Throughput: 0: 42067.1. Samples: 23379920. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 12:10:56,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:10:59,191][00501] Updated weights for policy 0, policy_version 1420 (0.0029) [2024-03-29 12:11:01,685][00126] Fps is (10 sec: 45875.3, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 23363584. Throughput: 0: 41618.6. Samples: 23614160. Policy #0 lag: (min: 1.0, avg: 23.4, max: 41.0) [2024-03-29 12:11:01,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:11:03,464][00501] Updated weights for policy 0, policy_version 1430 (0.0022) [2024-03-29 12:11:06,686][00126] Fps is (10 sec: 36044.6, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 23511040. Throughput: 0: 41807.1. Samples: 23898340. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 12:11:06,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:11:08,094][00501] Updated weights for policy 0, policy_version 1440 (0.0027) [2024-03-29 12:11:11,652][00501] Updated weights for policy 0, policy_version 1450 (0.0026) [2024-03-29 12:11:11,686][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 23756800. Throughput: 0: 41765.2. Samples: 24007640. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 12:11:11,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:11:12,109][00481] Signal inference workers to stop experience collection... (900 times) [2024-03-29 12:11:12,160][00501] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-03-29 12:11:12,199][00481] Signal inference workers to resume experience collection... (900 times) [2024-03-29 12:11:12,203][00501] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-03-29 12:11:14,864][00501] Updated weights for policy 0, policy_version 1460 (0.0022) [2024-03-29 12:11:16,685][00126] Fps is (10 sec: 49153.0, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 24002560. Throughput: 0: 41873.0. Samples: 24241580. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 12:11:16,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:11:19,152][00501] Updated weights for policy 0, policy_version 1470 (0.0019) [2024-03-29 12:11:21,685][00126] Fps is (10 sec: 37683.7, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 24133632. Throughput: 0: 42094.2. Samples: 24523700. 
Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 12:11:21,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:11:23,772][00501] Updated weights for policy 0, policy_version 1480 (0.0029) [2024-03-29 12:11:26,685][00126] Fps is (10 sec: 37682.5, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 24379392. Throughput: 0: 42233.6. Samples: 24649560. Policy #0 lag: (min: 0.0, avg: 18.3, max: 43.0) [2024-03-29 12:11:26,689][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:11:27,356][00501] Updated weights for policy 0, policy_version 1490 (0.0026) [2024-03-29 12:11:30,615][00501] Updated weights for policy 0, policy_version 1500 (0.0034) [2024-03-29 12:11:31,685][00126] Fps is (10 sec: 47513.9, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 24608768. Throughput: 0: 41595.7. Samples: 24867800. Policy #0 lag: (min: 3.0, avg: 23.2, max: 44.0) [2024-03-29 12:11:31,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:11:34,652][00501] Updated weights for policy 0, policy_version 1510 (0.0019) [2024-03-29 12:11:36,686][00126] Fps is (10 sec: 40959.3, 60 sec: 41779.0, 300 sec: 42043.0). Total num frames: 24788992. Throughput: 0: 41331.0. Samples: 25136820. Policy #0 lag: (min: 2.0, avg: 19.6, max: 42.0) [2024-03-29 12:11:36,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:11:39,478][00501] Updated weights for policy 0, policy_version 1520 (0.0028) [2024-03-29 12:11:41,686][00126] Fps is (10 sec: 39320.8, 60 sec: 41506.0, 300 sec: 41876.4). Total num frames: 25001984. Throughput: 0: 41939.5. Samples: 25267200. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 12:11:41,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:11:42,916][00481] Signal inference workers to stop experience collection... (950 times) [2024-03-29 12:11:42,954][00501] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-03-29 12:11:43,143][00481] Signal inference workers to resume experience collection... (950 times) [2024-03-29 12:11:43,143][00501] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-03-29 12:11:43,146][00501] Updated weights for policy 0, policy_version 1530 (0.0020) [2024-03-29 12:11:46,387][00501] Updated weights for policy 0, policy_version 1540 (0.0023) [2024-03-29 12:11:46,685][00126] Fps is (10 sec: 45875.8, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 25247744. Throughput: 0: 41813.3. Samples: 25495760. Policy #0 lag: (min: 0.0, avg: 23.5, max: 44.0) [2024-03-29 12:11:46,687][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:11:50,487][00501] Updated weights for policy 0, policy_version 1550 (0.0024) [2024-03-29 12:11:51,685][00126] Fps is (10 sec: 40960.6, 60 sec: 41779.3, 300 sec: 42098.5). Total num frames: 25411584. Throughput: 0: 41661.4. Samples: 25773100. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 12:11:51,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:11:51,836][00481] Saving new best policy, reward=0.008! [2024-03-29 12:11:55,105][00501] Updated weights for policy 0, policy_version 1560 (0.0025) [2024-03-29 12:11:56,685][00126] Fps is (10 sec: 39322.5, 60 sec: 41506.2, 300 sec: 41932.0). Total num frames: 25640960. Throughput: 0: 42194.0. Samples: 25906360. 
Policy #0 lag: (min: 1.0, avg: 20.1, max: 43.0) [2024-03-29 12:11:56,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:11:58,821][00501] Updated weights for policy 0, policy_version 1570 (0.0025) [2024-03-29 12:12:01,685][00126] Fps is (10 sec: 44236.7, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 25853952. Throughput: 0: 42123.9. Samples: 26137160. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 12:12:01,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:12:02,136][00501] Updated weights for policy 0, policy_version 1580 (0.0018) [2024-03-29 12:12:06,144][00501] Updated weights for policy 0, policy_version 1590 (0.0021) [2024-03-29 12:12:06,685][00126] Fps is (10 sec: 40959.3, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 26050560. Throughput: 0: 41668.0. Samples: 26398760. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 12:12:06,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:12:10,590][00501] Updated weights for policy 0, policy_version 1600 (0.0024) [2024-03-29 12:12:11,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 26263552. Throughput: 0: 42117.9. Samples: 26544860. Policy #0 lag: (min: 0.0, avg: 17.2, max: 41.0) [2024-03-29 12:12:11,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:12:14,123][00501] Updated weights for policy 0, policy_version 1610 (0.0023) [2024-03-29 12:12:15,724][00481] Signal inference workers to stop experience collection... (1000 times) [2024-03-29 12:12:15,761][00501] InferenceWorker_p0-w0: stopping experience collection (1000 times) [2024-03-29 12:12:15,938][00481] Signal inference workers to resume experience collection... (1000 times) [2024-03-29 12:12:15,939][00501] InferenceWorker_p0-w0: resuming experience collection (1000 times) [2024-03-29 12:12:16,686][00126] Fps is (10 sec: 44236.3, 60 sec: 41506.0, 300 sec: 41876.4). Total num frames: 26492928. Throughput: 0: 42445.6. Samples: 26777860. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 12:12:16,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:12:17,485][00501] Updated weights for policy 0, policy_version 1620 (0.0021) [2024-03-29 12:12:21,685][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 26689536. Throughput: 0: 42006.4. Samples: 27027100. Policy #0 lag: (min: 0.0, avg: 23.3, max: 41.0) [2024-03-29 12:12:21,687][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:12:21,692][00501] Updated weights for policy 0, policy_version 1630 (0.0022) [2024-03-29 12:12:26,085][00501] Updated weights for policy 0, policy_version 1640 (0.0026) [2024-03-29 12:12:26,685][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 26886144. Throughput: 0: 42382.3. Samples: 27174400. Policy #0 lag: (min: 0.0, avg: 18.6, max: 40.0) [2024-03-29 12:12:26,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:12:29,591][00501] Updated weights for policy 0, policy_version 1650 (0.0022) [2024-03-29 12:12:31,686][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.1, 300 sec: 41931.9). Total num frames: 27131904. Throughput: 0: 42589.3. Samples: 27412280. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 12:12:31,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:12:32,816][00501] Updated weights for policy 0, policy_version 1660 (0.0020) [2024-03-29 12:12:36,685][00126] Fps is (10 sec: 45875.0, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 27344896. 
Throughput: 0: 42001.2. Samples: 27663160. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 12:12:36,687][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:12:37,017][00501] Updated weights for policy 0, policy_version 1670 (0.0021) [2024-03-29 12:12:41,470][00501] Updated weights for policy 0, policy_version 1680 (0.0020) [2024-03-29 12:12:41,685][00126] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 27525120. Throughput: 0: 42409.2. Samples: 27814780. Policy #0 lag: (min: 1.0, avg: 17.7, max: 41.0) [2024-03-29 12:12:41,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:12:44,995][00501] Updated weights for policy 0, policy_version 1690 (0.0019) [2024-03-29 12:12:46,685][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 27754496. Throughput: 0: 42348.0. Samples: 28042820. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 12:12:46,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:12:48,390][00501] Updated weights for policy 0, policy_version 1700 (0.0019) [2024-03-29 12:12:51,686][00126] Fps is (10 sec: 45874.7, 60 sec: 42871.4, 300 sec: 42098.5). Total num frames: 27983872. Throughput: 0: 42194.6. Samples: 28297520. Policy #0 lag: (min: 0.0, avg: 23.9, max: 43.0) [2024-03-29 12:12:51,687][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:12:51,708][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000001708_27983872.pth... [2024-03-29 12:12:52,045][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000001093_17907712.pth [2024-03-29 12:12:52,654][00501] Updated weights for policy 0, policy_version 1710 (0.0028) [2024-03-29 12:12:55,154][00481] Signal inference workers to stop experience collection... (1050 times) [2024-03-29 12:12:55,225][00501] InferenceWorker_p0-w0: stopping experience collection (1050 times) [2024-03-29 12:12:55,235][00481] Signal inference workers to resume experience collection... (1050 times) [2024-03-29 12:12:55,256][00501] InferenceWorker_p0-w0: resuming experience collection (1050 times) [2024-03-29 12:12:56,687][00126] Fps is (10 sec: 39315.5, 60 sec: 41778.0, 300 sec: 42042.8). Total num frames: 28147712. Throughput: 0: 42139.9. Samples: 28441220. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 12:12:56,687][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:12:57,098][00501] Updated weights for policy 0, policy_version 1720 (0.0030) [2024-03-29 12:13:00,611][00501] Updated weights for policy 0, policy_version 1730 (0.0028) [2024-03-29 12:13:01,686][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 28393472. Throughput: 0: 42187.2. Samples: 28676280. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 12:13:01,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:13:04,108][00501] Updated weights for policy 0, policy_version 1740 (0.0028) [2024-03-29 12:13:06,685][00126] Fps is (10 sec: 47520.7, 60 sec: 42871.4, 300 sec: 42154.1). Total num frames: 28622848. Throughput: 0: 41909.7. Samples: 28913040. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 12:13:06,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:13:08,551][00501] Updated weights for policy 0, policy_version 1750 (0.0021) [2024-03-29 12:13:11,685][00126] Fps is (10 sec: 36045.2, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 28753920. Throughput: 0: 41964.5. Samples: 29062800. 
Policy #0 lag: (min: 0.0, avg: 16.8, max: 43.0) [2024-03-29 12:13:11,687][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:13:12,965][00501] Updated weights for policy 0, policy_version 1760 (0.0021) [2024-03-29 12:13:16,505][00501] Updated weights for policy 0, policy_version 1770 (0.0020) [2024-03-29 12:13:16,685][00126] Fps is (10 sec: 37683.7, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 28999680. Throughput: 0: 41828.6. Samples: 29294560. Policy #0 lag: (min: 2.0, avg: 20.7, max: 41.0) [2024-03-29 12:13:16,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:13:19,866][00501] Updated weights for policy 0, policy_version 1780 (0.0019) [2024-03-29 12:13:21,685][00126] Fps is (10 sec: 49151.9, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 29245440. Throughput: 0: 41549.8. Samples: 29532900. Policy #0 lag: (min: 1.0, avg: 24.3, max: 44.0) [2024-03-29 12:13:21,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:13:24,190][00501] Updated weights for policy 0, policy_version 1790 (0.0021) [2024-03-29 12:13:26,685][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 29376512. Throughput: 0: 41542.3. Samples: 29684180. Policy #0 lag: (min: 0.0, avg: 20.1, max: 42.0) [2024-03-29 12:13:26,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:13:26,767][00481] Signal inference workers to stop experience collection... (1100 times) [2024-03-29 12:13:26,808][00501] InferenceWorker_p0-w0: stopping experience collection (1100 times) [2024-03-29 12:13:26,948][00481] Signal inference workers to resume experience collection... (1100 times) [2024-03-29 12:13:26,948][00501] InferenceWorker_p0-w0: resuming experience collection (1100 times) [2024-03-29 12:13:28,557][00501] Updated weights for policy 0, policy_version 1800 (0.0020) [2024-03-29 12:13:31,686][00126] Fps is (10 sec: 37682.5, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 29622272. Throughput: 0: 41741.2. Samples: 29921180. Policy #0 lag: (min: 3.0, avg: 18.4, max: 43.0) [2024-03-29 12:13:31,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:13:32,132][00501] Updated weights for policy 0, policy_version 1810 (0.0023) [2024-03-29 12:13:35,455][00501] Updated weights for policy 0, policy_version 1820 (0.0031) [2024-03-29 12:13:36,686][00126] Fps is (10 sec: 50789.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 29884416. Throughput: 0: 41461.4. Samples: 30163280. Policy #0 lag: (min: 1.0, avg: 22.3, max: 42.0) [2024-03-29 12:13:36,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:13:39,866][00501] Updated weights for policy 0, policy_version 1830 (0.0020) [2024-03-29 12:13:41,685][00126] Fps is (10 sec: 40960.8, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 30031872. Throughput: 0: 41624.1. Samples: 30314240. Policy #0 lag: (min: 0.0, avg: 22.5, max: 41.0) [2024-03-29 12:13:41,687][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:13:44,199][00501] Updated weights for policy 0, policy_version 1840 (0.0026) [2024-03-29 12:13:46,685][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 41987.4). Total num frames: 30277632. Throughput: 0: 41846.3. Samples: 30559360. 
Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 12:13:46,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:13:47,473][00501] Updated weights for policy 0, policy_version 1850 (0.0025) [2024-03-29 12:13:51,036][00501] Updated weights for policy 0, policy_version 1860 (0.0025) [2024-03-29 12:13:51,685][00126] Fps is (10 sec: 47513.7, 60 sec: 42052.4, 300 sec: 41987.5). Total num frames: 30507008. Throughput: 0: 42045.4. Samples: 30805080. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 12:13:51,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:13:55,519][00501] Updated weights for policy 0, policy_version 1870 (0.0017) [2024-03-29 12:13:56,685][00126] Fps is (10 sec: 40960.3, 60 sec: 42326.4, 300 sec: 42154.1). Total num frames: 30687232. Throughput: 0: 41704.9. Samples: 30939520. Policy #0 lag: (min: 0.0, avg: 19.8, max: 42.0) [2024-03-29 12:13:56,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:13:58,204][00481] Signal inference workers to stop experience collection... (1150 times) [2024-03-29 12:13:58,246][00501] InferenceWorker_p0-w0: stopping experience collection (1150 times) [2024-03-29 12:13:58,283][00481] Signal inference workers to resume experience collection... (1150 times) [2024-03-29 12:13:58,286][00501] InferenceWorker_p0-w0: resuming experience collection (1150 times) [2024-03-29 12:13:59,966][00501] Updated weights for policy 0, policy_version 1880 (0.0028) [2024-03-29 12:14:01,685][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 30900224. Throughput: 0: 42174.6. Samples: 31192420. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 12:14:01,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:14:01,705][00481] Saving new best policy, reward=0.011! [2024-03-29 12:14:03,269][00501] Updated weights for policy 0, policy_version 1890 (0.0019) [2024-03-29 12:14:06,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 31113216. Throughput: 0: 42218.2. Samples: 31432720. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 12:14:06,688][00126] Avg episode reward: [(0, '0.009')] [2024-03-29 12:14:06,878][00501] Updated weights for policy 0, policy_version 1900 (0.0023) [2024-03-29 12:14:11,564][00501] Updated weights for policy 0, policy_version 1910 (0.0019) [2024-03-29 12:14:11,685][00126] Fps is (10 sec: 39321.7, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 31293440. Throughput: 0: 41447.1. Samples: 31549300. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 12:14:11,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:14:15,857][00501] Updated weights for policy 0, policy_version 1920 (0.0022) [2024-03-29 12:14:16,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 31506432. Throughput: 0: 42206.8. Samples: 31820480. Policy #0 lag: (min: 0.0, avg: 15.2, max: 41.0) [2024-03-29 12:14:16,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:14:19,029][00501] Updated weights for policy 0, policy_version 1930 (0.0023) [2024-03-29 12:14:21,685][00126] Fps is (10 sec: 44237.0, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 31735808. Throughput: 0: 41889.9. Samples: 32048320. 
Policy #0 lag: (min: 1.0, avg: 22.1, max: 43.0) [2024-03-29 12:14:21,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:14:22,572][00501] Updated weights for policy 0, policy_version 1940 (0.0021) [2024-03-29 12:14:26,685][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 31932416. Throughput: 0: 41178.2. Samples: 32167260. Policy #0 lag: (min: 0.0, avg: 23.0, max: 41.0) [2024-03-29 12:14:26,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:14:27,035][00501] Updated weights for policy 0, policy_version 1950 (0.0018) [2024-03-29 12:14:29,677][00481] Signal inference workers to stop experience collection... (1200 times) [2024-03-29 12:14:29,720][00501] InferenceWorker_p0-w0: stopping experience collection (1200 times) [2024-03-29 12:14:29,760][00481] Signal inference workers to resume experience collection... (1200 times) [2024-03-29 12:14:29,761][00501] InferenceWorker_p0-w0: resuming experience collection (1200 times) [2024-03-29 12:14:31,371][00501] Updated weights for policy 0, policy_version 1960 (0.0024) [2024-03-29 12:14:31,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 32129024. Throughput: 0: 42334.3. Samples: 32464400. Policy #0 lag: (min: 0.0, avg: 16.1, max: 41.0) [2024-03-29 12:14:31,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:14:34,618][00501] Updated weights for policy 0, policy_version 1970 (0.0029) [2024-03-29 12:14:36,685][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.2, 300 sec: 41932.0). Total num frames: 32374784. Throughput: 0: 41803.1. Samples: 32686220. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 12:14:36,686][00126] Avg episode reward: [(0, '0.009')] [2024-03-29 12:14:38,167][00501] Updated weights for policy 0, policy_version 1980 (0.0021) [2024-03-29 12:14:41,685][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 32555008. Throughput: 0: 41270.1. Samples: 32796680. Policy #0 lag: (min: 0.0, avg: 22.7, max: 41.0) [2024-03-29 12:14:41,688][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:14:42,723][00501] Updated weights for policy 0, policy_version 1990 (0.0019) [2024-03-29 12:14:46,685][00126] Fps is (10 sec: 36044.9, 60 sec: 40960.1, 300 sec: 41876.4). Total num frames: 32735232. Throughput: 0: 42358.4. Samples: 33098540. Policy #0 lag: (min: 0.0, avg: 16.2, max: 41.0) [2024-03-29 12:14:46,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:14:46,956][00501] Updated weights for policy 0, policy_version 2000 (0.0024) [2024-03-29 12:14:50,227][00501] Updated weights for policy 0, policy_version 2010 (0.0024) [2024-03-29 12:14:51,685][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 32980992. Throughput: 0: 41786.1. Samples: 33313100. Policy #0 lag: (min: 0.0, avg: 22.5, max: 43.0) [2024-03-29 12:14:51,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:14:51,918][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002014_32997376.pth... [2024-03-29 12:14:52,312][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000001400_22937600.pth [2024-03-29 12:14:54,157][00501] Updated weights for policy 0, policy_version 2020 (0.0019) [2024-03-29 12:14:56,685][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 33193984. Throughput: 0: 41580.5. Samples: 33420420. 
Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 12:14:56,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:14:58,846][00501] Updated weights for policy 0, policy_version 2030 (0.0027) [2024-03-29 12:15:01,583][00481] Signal inference workers to stop experience collection... (1250 times) [2024-03-29 12:15:01,584][00481] Signal inference workers to resume experience collection... (1250 times) [2024-03-29 12:15:01,628][00501] InferenceWorker_p0-w0: stopping experience collection (1250 times) [2024-03-29 12:15:01,633][00501] InferenceWorker_p0-w0: resuming experience collection (1250 times) [2024-03-29 12:15:01,685][00126] Fps is (10 sec: 34406.6, 60 sec: 40413.9, 300 sec: 41709.8). Total num frames: 33325056. Throughput: 0: 41577.3. Samples: 33691460. Policy #0 lag: (min: 0.0, avg: 15.6, max: 41.0) [2024-03-29 12:15:01,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:15:03,196][00501] Updated weights for policy 0, policy_version 2040 (0.0026) [2024-03-29 12:15:06,453][00501] Updated weights for policy 0, policy_version 2050 (0.0028) [2024-03-29 12:15:06,685][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 33587200. Throughput: 0: 41612.8. Samples: 33920900. Policy #0 lag: (min: 0.0, avg: 22.4, max: 45.0) [2024-03-29 12:15:06,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:15:09,963][00501] Updated weights for policy 0, policy_version 2060 (0.0023) [2024-03-29 12:15:11,685][00126] Fps is (10 sec: 52428.8, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 33849344. Throughput: 0: 41825.8. Samples: 34049420. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 12:15:11,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:15:14,551][00501] Updated weights for policy 0, policy_version 2070 (0.0021) [2024-03-29 12:15:16,685][00126] Fps is (10 sec: 37683.2, 60 sec: 40960.0, 300 sec: 41765.3). Total num frames: 33964032. Throughput: 0: 40810.6. Samples: 34300880. Policy #0 lag: (min: 0.0, avg: 22.9, max: 43.0) [2024-03-29 12:15:16,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:15:19,167][00501] Updated weights for policy 0, policy_version 2080 (0.0031) [2024-03-29 12:15:21,685][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 34209792. Throughput: 0: 41089.3. Samples: 34535240. Policy #0 lag: (min: 0.0, avg: 22.0, max: 45.0) [2024-03-29 12:15:21,686][00126] Avg episode reward: [(0, '0.014')] [2024-03-29 12:15:21,704][00481] Saving new best policy, reward=0.014! [2024-03-29 12:15:22,606][00501] Updated weights for policy 0, policy_version 2090 (0.0021) [2024-03-29 12:15:25,912][00501] Updated weights for policy 0, policy_version 2100 (0.0027) [2024-03-29 12:15:26,685][00126] Fps is (10 sec: 47514.3, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 34439168. Throughput: 0: 41225.0. Samples: 34651800. Policy #0 lag: (min: 2.0, avg: 22.1, max: 42.0) [2024-03-29 12:15:26,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:15:30,089][00481] Signal inference workers to stop experience collection... (1300 times) [2024-03-29 12:15:30,148][00501] InferenceWorker_p0-w0: stopping experience collection (1300 times) [2024-03-29 12:15:30,247][00481] Signal inference workers to resume experience collection... 
(1300 times) [2024-03-29 12:15:30,248][00501] InferenceWorker_p0-w0: resuming experience collection (1300 times) [2024-03-29 12:15:30,506][00501] Updated weights for policy 0, policy_version 2110 (0.0021) [2024-03-29 12:15:31,686][00126] Fps is (10 sec: 39321.1, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 34603008. Throughput: 0: 40270.5. Samples: 34910720. Policy #0 lag: (min: 0.0, avg: 23.2, max: 43.0) [2024-03-29 12:15:31,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:15:35,128][00501] Updated weights for policy 0, policy_version 2120 (0.0022) [2024-03-29 12:15:36,685][00126] Fps is (10 sec: 37683.2, 60 sec: 40687.0, 300 sec: 41709.8). Total num frames: 34816000. Throughput: 0: 40977.9. Samples: 35157100. Policy #0 lag: (min: 0.0, avg: 16.4, max: 41.0) [2024-03-29 12:15:36,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:15:38,378][00501] Updated weights for policy 0, policy_version 2130 (0.0022) [2024-03-29 12:15:41,685][00126] Fps is (10 sec: 44237.5, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 35045376. Throughput: 0: 41348.4. Samples: 35281100. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 12:15:41,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:15:41,779][00501] Updated weights for policy 0, policy_version 2140 (0.0016) [2024-03-29 12:15:46,248][00501] Updated weights for policy 0, policy_version 2150 (0.0017) [2024-03-29 12:15:46,685][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 35241984. Throughput: 0: 40880.0. Samples: 35531060. Policy #0 lag: (min: 0.0, avg: 22.8, max: 42.0) [2024-03-29 12:15:46,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:15:50,889][00501] Updated weights for policy 0, policy_version 2160 (0.0031) [2024-03-29 12:15:51,685][00126] Fps is (10 sec: 37683.1, 60 sec: 40687.0, 300 sec: 41598.7). Total num frames: 35422208. Throughput: 0: 41535.6. Samples: 35790000. Policy #0 lag: (min: 0.0, avg: 16.4, max: 41.0) [2024-03-29 12:15:51,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:15:54,106][00501] Updated weights for policy 0, policy_version 2170 (0.0027) [2024-03-29 12:15:56,685][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 35667968. Throughput: 0: 41256.9. Samples: 35905980. Policy #0 lag: (min: 1.0, avg: 22.2, max: 45.0) [2024-03-29 12:15:56,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:15:57,752][00501] Updated weights for policy 0, policy_version 2180 (0.0020) [2024-03-29 12:16:01,685][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 35848192. Throughput: 0: 40705.0. Samples: 36132600. Policy #0 lag: (min: 0.0, avg: 22.7, max: 44.0) [2024-03-29 12:16:01,686][00126] Avg episode reward: [(0, '0.009')] [2024-03-29 12:16:02,168][00481] Signal inference workers to stop experience collection... (1350 times) [2024-03-29 12:16:02,219][00501] InferenceWorker_p0-w0: stopping experience collection (1350 times) [2024-03-29 12:16:02,252][00481] Signal inference workers to resume experience collection... (1350 times) [2024-03-29 12:16:02,256][00501] InferenceWorker_p0-w0: resuming experience collection (1350 times) [2024-03-29 12:16:02,258][00501] Updated weights for policy 0, policy_version 2190 (0.0015) [2024-03-29 12:16:06,685][00126] Fps is (10 sec: 36045.1, 60 sec: 40687.1, 300 sec: 41598.7). Total num frames: 36028416. Throughput: 0: 41900.6. Samples: 36420760. 
Policy #0 lag: (min: 0.0, avg: 15.4, max: 41.0) [2024-03-29 12:16:06,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:16:06,971][00501] Updated weights for policy 0, policy_version 2200 (0.0019) [2024-03-29 12:16:09,831][00501] Updated weights for policy 0, policy_version 2210 (0.0023) [2024-03-29 12:16:11,685][00126] Fps is (10 sec: 42598.4, 60 sec: 40413.9, 300 sec: 41598.7). Total num frames: 36274176. Throughput: 0: 41476.9. Samples: 36518260. Policy #0 lag: (min: 0.0, avg: 22.1, max: 44.0) [2024-03-29 12:16:11,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:16:13,707][00501] Updated weights for policy 0, policy_version 2220 (0.0028) [2024-03-29 12:16:16,686][00126] Fps is (10 sec: 44235.6, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 36470784. Throughput: 0: 40548.9. Samples: 36735420. Policy #0 lag: (min: 1.0, avg: 23.5, max: 44.0) [2024-03-29 12:16:16,687][00126] Avg episode reward: [(0, '0.014')] [2024-03-29 12:16:18,198][00501] Updated weights for policy 0, policy_version 2230 (0.0032) [2024-03-29 12:16:21,686][00126] Fps is (10 sec: 34405.8, 60 sec: 40140.7, 300 sec: 41487.6). Total num frames: 36618240. Throughput: 0: 41870.0. Samples: 37041260. Policy #0 lag: (min: 0.0, avg: 15.4, max: 41.0) [2024-03-29 12:16:21,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:16:22,907][00501] Updated weights for policy 0, policy_version 2240 (0.0023) [2024-03-29 12:16:25,763][00501] Updated weights for policy 0, policy_version 2250 (0.0016) [2024-03-29 12:16:26,685][00126] Fps is (10 sec: 42599.1, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 36896768. Throughput: 0: 41264.0. Samples: 37137980. Policy #0 lag: (min: 1.0, avg: 21.4, max: 43.0) [2024-03-29 12:16:26,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:16:29,387][00501] Updated weights for policy 0, policy_version 2260 (0.0027) [2024-03-29 12:16:30,136][00481] Signal inference workers to stop experience collection... (1400 times) [2024-03-29 12:16:30,175][00501] InferenceWorker_p0-w0: stopping experience collection (1400 times) [2024-03-29 12:16:30,358][00481] Signal inference workers to resume experience collection... (1400 times) [2024-03-29 12:16:30,358][00501] InferenceWorker_p0-w0: resuming experience collection (1400 times) [2024-03-29 12:16:31,686][00126] Fps is (10 sec: 50790.4, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 37126144. Throughput: 0: 40653.2. Samples: 37360460. Policy #0 lag: (min: 1.0, avg: 23.5, max: 42.0) [2024-03-29 12:16:31,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:16:33,882][00501] Updated weights for policy 0, policy_version 2270 (0.0033) [2024-03-29 12:16:36,685][00126] Fps is (10 sec: 34406.2, 60 sec: 40413.8, 300 sec: 41487.6). Total num frames: 37240832. Throughput: 0: 41510.6. Samples: 37657980. Policy #0 lag: (min: 0.0, avg: 23.6, max: 45.0) [2024-03-29 12:16:36,686][00126] Avg episode reward: [(0, '0.009')] [2024-03-29 12:16:38,706][00501] Updated weights for policy 0, policy_version 2280 (0.0022) [2024-03-29 12:16:41,664][00501] Updated weights for policy 0, policy_version 2290 (0.0018) [2024-03-29 12:16:41,685][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 37519360. Throughput: 0: 41572.8. Samples: 37776760. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 12:16:41,686][00126] Avg episode reward: [(0, '0.018')] [2024-03-29 12:16:41,714][00481] Saving new best policy, reward=0.018! 
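The entries above cycle through the same few report formats: the 10/60/300-second FPS windows with the total frame count, the policy lag statistics, the average episode reward, and the occasional "Saving new best policy" line when that reward improves. A minimal parsing sketch for turning such output into tabular records for plotting, assuming the usual one-entry-per-line console layout and a hypothetical "train.log" capture of this run; the field names are our own labels, not framework API:

import re
from typing import Dict, Iterable, Iterator

# Patterns derived from the report formats visible in this log.
FPS_RE = re.compile(
    r"Fps is \(10 sec: (?P<fps_10s>[\d.]+), 60 sec: (?P<fps_60s>[\d.]+), "
    r"300 sec: (?P<fps_300s>[\d.]+)\)\. Total num frames: (?P<frames>\d+)"
)
LAG_RE = re.compile(
    r"Policy #0 lag: \(min: (?P<min>[\d.]+), avg: (?P<avg>[\d.]+), max: (?P<max>[\d.]+)\)"
)
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '(?P<reward>[\d.]+)'\)\]")


def parse_training_log(lines: Iterable[str]) -> Iterator[Dict[str, float]]:
    """Yield one record per FPS report, folding in the lag and reward entries around it."""
    record: Dict[str, float] = {}
    for line in lines:
        if m := FPS_RE.search(line):
            if record:
                yield record  # emit the previous report before starting a new one
            record = {k: float(v) for k, v in m.groupdict().items()}
        if m := LAG_RE.search(line):
            record.update({f"lag_{k}": float(v) for k, v in m.groupdict().items()})
        if m := REWARD_RE.search(line):
            record["avg_episode_reward"] = float(m.group("reward"))
    if record:
        yield record


if __name__ == "__main__":
    # "train.log" is a hypothetical capture of this console output, one entry per line.
    with open("train.log") as f:
        for row in parse_training_log(f):
            print(row)

Each yielded record then carries throughput, frame count, lag, and reward for one 5-second report, which makes it straightforward to chart the slow climb of the average episode reward seen in the entries that follow.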
[2024-03-29 12:16:45,653][00501] Updated weights for policy 0, policy_version 2300 (0.0032) [2024-03-29 12:16:46,685][00126] Fps is (10 sec: 50790.7, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 37748736. Throughput: 0: 41302.2. Samples: 37991200. Policy #0 lag: (min: 0.0, avg: 23.3, max: 43.0) [2024-03-29 12:16:46,687][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:16:49,912][00501] Updated weights for policy 0, policy_version 2310 (0.0018) [2024-03-29 12:16:51,685][00126] Fps is (10 sec: 36044.7, 60 sec: 40959.9, 300 sec: 41487.6). Total num frames: 37879808. Throughput: 0: 40847.8. Samples: 38258920. Policy #0 lag: (min: 0.0, avg: 23.8, max: 42.0) [2024-03-29 12:16:51,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:16:51,929][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002313_37896192.pth... [2024-03-29 12:16:52,317][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000001708_27983872.pth [2024-03-29 12:16:54,750][00501] Updated weights for policy 0, policy_version 2320 (0.0034) [2024-03-29 12:16:56,685][00126] Fps is (10 sec: 37682.8, 60 sec: 40959.9, 300 sec: 41598.7). Total num frames: 38125568. Throughput: 0: 41673.2. Samples: 38393560. Policy #0 lag: (min: 2.0, avg: 18.5, max: 42.0) [2024-03-29 12:16:56,686][00126] Avg episode reward: [(0, '0.014')] [2024-03-29 12:16:57,519][00501] Updated weights for policy 0, policy_version 2330 (0.0022) [2024-03-29 12:16:58,720][00481] Signal inference workers to stop experience collection... (1450 times) [2024-03-29 12:16:58,803][00501] InferenceWorker_p0-w0: stopping experience collection (1450 times) [2024-03-29 12:16:58,815][00481] Signal inference workers to resume experience collection... (1450 times) [2024-03-29 12:16:58,829][00501] InferenceWorker_p0-w0: resuming experience collection (1450 times) [2024-03-29 12:17:01,441][00501] Updated weights for policy 0, policy_version 2340 (0.0031) [2024-03-29 12:17:01,686][00126] Fps is (10 sec: 45875.0, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 38338560. Throughput: 0: 41530.2. Samples: 38604280. Policy #0 lag: (min: 2.0, avg: 22.3, max: 43.0) [2024-03-29 12:17:01,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:17:05,627][00501] Updated weights for policy 0, policy_version 2350 (0.0018) [2024-03-29 12:17:06,685][00126] Fps is (10 sec: 39322.1, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 38518784. Throughput: 0: 40543.2. Samples: 38865700. Policy #0 lag: (min: 2.0, avg: 25.0, max: 44.0) [2024-03-29 12:17:06,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:17:10,824][00501] Updated weights for policy 0, policy_version 2360 (0.0030) [2024-03-29 12:17:11,685][00126] Fps is (10 sec: 37683.5, 60 sec: 40686.9, 300 sec: 41432.1). Total num frames: 38715392. Throughput: 0: 41887.5. Samples: 39022920. Policy #0 lag: (min: 2.0, avg: 16.6, max: 40.0) [2024-03-29 12:17:11,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:17:13,796][00501] Updated weights for policy 0, policy_version 2370 (0.0024) [2024-03-29 12:17:16,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 38961152. Throughput: 0: 41190.4. Samples: 39214020. 
Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 12:17:16,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:17:17,358][00501] Updated weights for policy 0, policy_version 2380 (0.0024) [2024-03-29 12:17:21,629][00501] Updated weights for policy 0, policy_version 2390 (0.0021) [2024-03-29 12:17:21,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 41598.7). Total num frames: 39157760. Throughput: 0: 40397.3. Samples: 39475860. Policy #0 lag: (min: 1.0, avg: 23.6, max: 41.0) [2024-03-29 12:17:21,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:17:26,685][00126] Fps is (10 sec: 34406.4, 60 sec: 40140.8, 300 sec: 41265.5). Total num frames: 39305216. Throughput: 0: 41221.0. Samples: 39631700. Policy #0 lag: (min: 1.0, avg: 14.9, max: 41.0) [2024-03-29 12:17:26,686][00126] Avg episode reward: [(0, '0.013')] [2024-03-29 12:17:26,731][00501] Updated weights for policy 0, policy_version 2400 (0.0019) [2024-03-29 12:17:29,948][00501] Updated weights for policy 0, policy_version 2410 (0.0031) [2024-03-29 12:17:29,965][00481] Signal inference workers to stop experience collection... (1500 times) [2024-03-29 12:17:30,005][00501] InferenceWorker_p0-w0: stopping experience collection (1500 times) [2024-03-29 12:17:30,191][00481] Signal inference workers to resume experience collection... (1500 times) [2024-03-29 12:17:30,191][00501] InferenceWorker_p0-w0: resuming experience collection (1500 times) [2024-03-29 12:17:31,686][00126] Fps is (10 sec: 40959.9, 60 sec: 40686.9, 300 sec: 41432.1). Total num frames: 39567360. Throughput: 0: 40898.5. Samples: 39831640. Policy #0 lag: (min: 2.0, avg: 22.2, max: 42.0) [2024-03-29 12:17:31,688][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:17:33,402][00501] Updated weights for policy 0, policy_version 2420 (0.0022) [2024-03-29 12:17:36,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.3, 300 sec: 41487.6). Total num frames: 39763968. Throughput: 0: 40578.3. Samples: 40084940. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 12:17:36,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:17:37,500][00501] Updated weights for policy 0, policy_version 2430 (0.0022) [2024-03-29 12:17:41,685][00126] Fps is (10 sec: 32768.3, 60 sec: 39594.7, 300 sec: 41154.4). Total num frames: 39895040. Throughput: 0: 40681.0. Samples: 40224200. Policy #0 lag: (min: 0.0, avg: 15.2, max: 42.0) [2024-03-29 12:17:41,687][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:17:42,943][00501] Updated weights for policy 0, policy_version 2440 (0.0027) [2024-03-29 12:17:45,846][00501] Updated weights for policy 0, policy_version 2450 (0.0035) [2024-03-29 12:17:46,685][00126] Fps is (10 sec: 40960.3, 60 sec: 40413.9, 300 sec: 41321.0). Total num frames: 40173568. Throughput: 0: 40994.9. Samples: 40449040. Policy #0 lag: (min: 2.0, avg: 20.7, max: 45.0) [2024-03-29 12:17:46,686][00126] Avg episode reward: [(0, '0.009')] [2024-03-29 12:17:49,222][00501] Updated weights for policy 0, policy_version 2460 (0.0018) [2024-03-29 12:17:51,685][00126] Fps is (10 sec: 50790.7, 60 sec: 42052.4, 300 sec: 41543.4). Total num frames: 40402944. Throughput: 0: 40648.9. Samples: 40694900. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 12:17:51,686][00126] Avg episode reward: [(0, '0.017')] [2024-03-29 12:17:53,178][00501] Updated weights for policy 0, policy_version 2470 (0.0019) [2024-03-29 12:17:56,685][00126] Fps is (10 sec: 34406.0, 60 sec: 39867.8, 300 sec: 41098.9). Total num frames: 40517632. 
Throughput: 0: 40411.6. Samples: 40841440. Policy #0 lag: (min: 1.0, avg: 22.4, max: 41.0) [2024-03-29 12:17:56,687][00126] Avg episode reward: [(0, '0.012')] [2024-03-29 12:17:58,892][00501] Updated weights for policy 0, policy_version 2480 (0.0027) [2024-03-29 12:18:01,685][00126] Fps is (10 sec: 37683.2, 60 sec: 40687.1, 300 sec: 41209.9). Total num frames: 40779776. Throughput: 0: 41467.6. Samples: 41080060. Policy #0 lag: (min: 3.0, avg: 20.9, max: 44.0) [2024-03-29 12:18:01,686][00126] Avg episode reward: [(0, '0.013')] [2024-03-29 12:18:01,850][00501] Updated weights for policy 0, policy_version 2490 (0.0024) [2024-03-29 12:18:02,166][00481] Signal inference workers to stop experience collection... (1550 times) [2024-03-29 12:18:02,167][00481] Signal inference workers to resume experience collection... (1550 times) [2024-03-29 12:18:02,208][00501] InferenceWorker_p0-w0: stopping experience collection (1550 times) [2024-03-29 12:18:02,208][00501] InferenceWorker_p0-w0: resuming experience collection (1550 times) [2024-03-29 12:18:05,311][00501] Updated weights for policy 0, policy_version 2500 (0.0020) [2024-03-29 12:18:06,685][00126] Fps is (10 sec: 50790.8, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 41025536. Throughput: 0: 40771.2. Samples: 41310560. Policy #0 lag: (min: 0.0, avg: 22.7, max: 41.0) [2024-03-29 12:18:06,686][00126] Avg episode reward: [(0, '0.018')] [2024-03-29 12:18:09,451][00501] Updated weights for policy 0, policy_version 2510 (0.0018) [2024-03-29 12:18:11,685][00126] Fps is (10 sec: 39321.4, 60 sec: 40960.0, 300 sec: 41265.5). Total num frames: 41172992. Throughput: 0: 40329.3. Samples: 41446520. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0) [2024-03-29 12:18:11,687][00126] Avg episode reward: [(0, '0.023')] [2024-03-29 12:18:11,703][00481] Saving new best policy, reward=0.023! [2024-03-29 12:18:14,706][00501] Updated weights for policy 0, policy_version 2520 (0.0024) [2024-03-29 12:18:16,685][00126] Fps is (10 sec: 37683.4, 60 sec: 40687.0, 300 sec: 41209.9). Total num frames: 41402368. Throughput: 0: 41706.9. Samples: 41708440. Policy #0 lag: (min: 0.0, avg: 14.4, max: 41.0) [2024-03-29 12:18:16,686][00126] Avg episode reward: [(0, '0.012')] [2024-03-29 12:18:17,599][00501] Updated weights for policy 0, policy_version 2530 (0.0023) [2024-03-29 12:18:21,275][00501] Updated weights for policy 0, policy_version 2540 (0.0020) [2024-03-29 12:18:21,685][00126] Fps is (10 sec: 45875.6, 60 sec: 41233.2, 300 sec: 41543.2). Total num frames: 41631744. Throughput: 0: 40976.5. Samples: 41928880. Policy #0 lag: (min: 0.0, avg: 22.5, max: 41.0) [2024-03-29 12:18:21,686][00126] Avg episode reward: [(0, '0.016')] [2024-03-29 12:18:25,556][00501] Updated weights for policy 0, policy_version 2550 (0.0020) [2024-03-29 12:18:26,685][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 41321.0). Total num frames: 41811968. Throughput: 0: 40610.7. Samples: 42051680. Policy #0 lag: (min: 1.0, avg: 22.8, max: 41.0) [2024-03-29 12:18:26,686][00126] Avg episode reward: [(0, '0.015')] [2024-03-29 12:18:30,762][00501] Updated weights for policy 0, policy_version 2560 (0.0030) [2024-03-29 12:18:31,685][00126] Fps is (10 sec: 36044.8, 60 sec: 40414.0, 300 sec: 41043.3). Total num frames: 41992192. Throughput: 0: 41969.3. Samples: 42337660. Policy #0 lag: (min: 0.0, avg: 14.5, max: 41.0) [2024-03-29 12:18:31,686][00126] Avg episode reward: [(0, '0.013')] [2024-03-29 12:18:32,234][00481] Signal inference workers to stop experience collection... 
(1600 times) [2024-03-29 12:18:32,291][00501] InferenceWorker_p0-w0: stopping experience collection (1600 times) [2024-03-29 12:18:32,323][00481] Signal inference workers to resume experience collection... (1600 times) [2024-03-29 12:18:32,327][00501] InferenceWorker_p0-w0: resuming experience collection (1600 times) [2024-03-29 12:18:33,415][00501] Updated weights for policy 0, policy_version 2570 (0.0028) [2024-03-29 12:18:36,685][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41376.5). Total num frames: 42237952. Throughput: 0: 41224.9. Samples: 42550020. Policy #0 lag: (min: 0.0, avg: 22.5, max: 41.0) [2024-03-29 12:18:36,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:18:37,225][00501] Updated weights for policy 0, policy_version 2580 (0.0030) [2024-03-29 12:18:41,365][00501] Updated weights for policy 0, policy_version 2590 (0.0026) [2024-03-29 12:18:41,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 41209.9). Total num frames: 42434560. Throughput: 0: 40511.2. Samples: 42664440. Policy #0 lag: (min: 0.0, avg: 23.1, max: 42.0) [2024-03-29 12:18:41,686][00126] Avg episode reward: [(0, '0.014')] [2024-03-29 12:18:46,534][00501] Updated weights for policy 0, policy_version 2600 (0.0030) [2024-03-29 12:18:46,685][00126] Fps is (10 sec: 36044.5, 60 sec: 40413.8, 300 sec: 40987.8). Total num frames: 42598400. Throughput: 0: 41967.0. Samples: 42968580. Policy #0 lag: (min: 0.0, avg: 15.3, max: 41.0) [2024-03-29 12:18:46,686][00126] Avg episode reward: [(0, '0.017')] [2024-03-29 12:18:49,304][00501] Updated weights for policy 0, policy_version 2610 (0.0027) [2024-03-29 12:18:51,685][00126] Fps is (10 sec: 42598.0, 60 sec: 40959.9, 300 sec: 41265.5). Total num frames: 42860544. Throughput: 0: 41059.5. Samples: 43158240. Policy #0 lag: (min: 2.0, avg: 20.5, max: 42.0) [2024-03-29 12:18:51,686][00126] Avg episode reward: [(0, '0.009')] [2024-03-29 12:18:51,709][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002616_42860544.pth... [2024-03-29 12:18:52,175][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002014_32997376.pth [2024-03-29 12:18:53,227][00501] Updated weights for policy 0, policy_version 2620 (0.0025) [2024-03-29 12:18:56,685][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.4, 300 sec: 41209.9). Total num frames: 43057152. Throughput: 0: 40713.4. Samples: 43278620. Policy #0 lag: (min: 0.0, avg: 23.7, max: 42.0) [2024-03-29 12:18:56,688][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:18:57,208][00501] Updated weights for policy 0, policy_version 2630 (0.0017) [2024-03-29 12:19:01,685][00126] Fps is (10 sec: 32768.5, 60 sec: 40140.8, 300 sec: 40932.2). Total num frames: 43188224. Throughput: 0: 41459.6. Samples: 43574120. Policy #0 lag: (min: 0.0, avg: 16.1, max: 41.0) [2024-03-29 12:19:01,686][00126] Avg episode reward: [(0, '0.012')] [2024-03-29 12:19:02,567][00501] Updated weights for policy 0, policy_version 2640 (0.0026) [2024-03-29 12:19:03,367][00481] Signal inference workers to stop experience collection... (1650 times) [2024-03-29 12:19:03,368][00481] Signal inference workers to resume experience collection... 
(1650 times) [2024-03-29 12:19:03,408][00501] InferenceWorker_p0-w0: stopping experience collection (1650 times) [2024-03-29 12:19:03,408][00501] InferenceWorker_p0-w0: resuming experience collection (1650 times) [2024-03-29 12:19:05,286][00501] Updated weights for policy 0, policy_version 2650 (0.0019) [2024-03-29 12:19:06,685][00126] Fps is (10 sec: 42598.3, 60 sec: 40960.0, 300 sec: 41321.0). Total num frames: 43483136. Throughput: 0: 41155.5. Samples: 43780880. Policy #0 lag: (min: 1.0, avg: 20.9, max: 44.0) [2024-03-29 12:19:06,686][00126] Avg episode reward: [(0, '0.016')] [2024-03-29 12:19:09,083][00501] Updated weights for policy 0, policy_version 2660 (0.0033) [2024-03-29 12:19:11,685][00126] Fps is (10 sec: 50790.0, 60 sec: 42052.3, 300 sec: 41321.0). Total num frames: 43696128. Throughput: 0: 41079.1. Samples: 43900240. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 12:19:11,686][00126] Avg episode reward: [(0, '0.019')] [2024-03-29 12:19:12,906][00501] Updated weights for policy 0, policy_version 2670 (0.0036) [2024-03-29 12:19:16,685][00126] Fps is (10 sec: 34406.4, 60 sec: 40413.8, 300 sec: 40987.8). Total num frames: 43827200. Throughput: 0: 41213.8. Samples: 44192280. Policy #0 lag: (min: 0.0, avg: 15.2, max: 41.0) [2024-03-29 12:19:16,686][00126] Avg episode reward: [(0, '0.016')] [2024-03-29 12:19:18,211][00501] Updated weights for policy 0, policy_version 2680 (0.0022) [2024-03-29 12:19:20,994][00501] Updated weights for policy 0, policy_version 2690 (0.0028) [2024-03-29 12:19:21,685][00126] Fps is (10 sec: 39321.5, 60 sec: 40960.0, 300 sec: 41209.9). Total num frames: 44089344. Throughput: 0: 41375.5. Samples: 44411920. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 12:19:21,686][00126] Avg episode reward: [(0, '0.033')] [2024-03-29 12:19:22,084][00481] Saving new best policy, reward=0.033! [2024-03-29 12:19:24,925][00501] Updated weights for policy 0, policy_version 2700 (0.0021) [2024-03-29 12:19:26,685][00126] Fps is (10 sec: 47513.1, 60 sec: 41506.1, 300 sec: 41265.5). Total num frames: 44302336. Throughput: 0: 41488.8. Samples: 44531440. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 12:19:26,688][00126] Avg episode reward: [(0, '0.021')] [2024-03-29 12:19:28,803][00501] Updated weights for policy 0, policy_version 2710 (0.0023) [2024-03-29 12:19:31,685][00126] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 40932.2). Total num frames: 44449792. Throughput: 0: 40711.6. Samples: 44800600. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 12:19:31,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:19:33,948][00501] Updated weights for policy 0, policy_version 2720 (0.0024) [2024-03-29 12:19:34,556][00481] Signal inference workers to stop experience collection... (1700 times) [2024-03-29 12:19:34,557][00481] Signal inference workers to resume experience collection... (1700 times) [2024-03-29 12:19:34,605][00501] InferenceWorker_p0-w0: stopping experience collection (1700 times) [2024-03-29 12:19:34,605][00501] InferenceWorker_p0-w0: resuming experience collection (1700 times) [2024-03-29 12:19:36,686][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.0, 300 sec: 41209.9). Total num frames: 44711936. Throughput: 0: 41708.8. Samples: 45035140. 
Policy #0 lag: (min: 1.0, avg: 22.2, max: 44.0) [2024-03-29 12:19:36,686][00126] Avg episode reward: [(0, '0.017')] [2024-03-29 12:19:36,714][00501] Updated weights for policy 0, policy_version 2730 (0.0024) [2024-03-29 12:19:40,608][00501] Updated weights for policy 0, policy_version 2740 (0.0022) [2024-03-29 12:19:41,685][00126] Fps is (10 sec: 49152.0, 60 sec: 41779.2, 300 sec: 41376.5). Total num frames: 44941312. Throughput: 0: 41723.1. Samples: 45156160. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 12:19:41,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:19:44,360][00501] Updated weights for policy 0, policy_version 2750 (0.0023) [2024-03-29 12:19:46,685][00126] Fps is (10 sec: 37683.7, 60 sec: 41506.2, 300 sec: 41043.3). Total num frames: 45088768. Throughput: 0: 40903.0. Samples: 45414760. Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 12:19:46,687][00126] Avg episode reward: [(0, '0.013')] [2024-03-29 12:19:49,672][00501] Updated weights for policy 0, policy_version 2760 (0.0024) [2024-03-29 12:19:51,685][00126] Fps is (10 sec: 39321.4, 60 sec: 41233.1, 300 sec: 41154.4). Total num frames: 45334528. Throughput: 0: 41891.9. Samples: 45666020. Policy #0 lag: (min: 1.0, avg: 15.6, max: 41.0) [2024-03-29 12:19:51,686][00126] Avg episode reward: [(0, '0.022')] [2024-03-29 12:19:52,645][00501] Updated weights for policy 0, policy_version 2770 (0.0026) [2024-03-29 12:19:56,483][00501] Updated weights for policy 0, policy_version 2780 (0.0032) [2024-03-29 12:19:56,685][00126] Fps is (10 sec: 45875.5, 60 sec: 41506.2, 300 sec: 41432.1). Total num frames: 45547520. Throughput: 0: 41668.5. Samples: 45775320. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 12:19:56,686][00126] Avg episode reward: [(0, '0.016')] [2024-03-29 12:20:00,101][00501] Updated weights for policy 0, policy_version 2790 (0.0025) [2024-03-29 12:20:01,685][00126] Fps is (10 sec: 40960.3, 60 sec: 42598.3, 300 sec: 41209.9). Total num frames: 45744128. Throughput: 0: 40805.8. Samples: 46028540. Policy #0 lag: (min: 0.0, avg: 22.0, max: 40.0) [2024-03-29 12:20:01,688][00126] Avg episode reward: [(0, '0.017')] [2024-03-29 12:20:05,556][00501] Updated weights for policy 0, policy_version 2800 (0.0019) [2024-03-29 12:20:05,662][00481] Signal inference workers to stop experience collection... (1750 times) [2024-03-29 12:20:05,699][00501] InferenceWorker_p0-w0: stopping experience collection (1750 times) [2024-03-29 12:20:05,826][00481] Signal inference workers to resume experience collection... (1750 times) [2024-03-29 12:20:05,826][00501] InferenceWorker_p0-w0: resuming experience collection (1750 times) [2024-03-29 12:20:06,685][00126] Fps is (10 sec: 39321.1, 60 sec: 40960.0, 300 sec: 40987.8). Total num frames: 45940736. Throughput: 0: 41914.6. Samples: 46298080. Policy #0 lag: (min: 0.0, avg: 16.3, max: 42.0) [2024-03-29 12:20:06,686][00126] Avg episode reward: [(0, '0.024')] [2024-03-29 12:20:08,340][00501] Updated weights for policy 0, policy_version 2810 (0.0032) [2024-03-29 12:20:11,685][00126] Fps is (10 sec: 42598.3, 60 sec: 41233.1, 300 sec: 41376.6). Total num frames: 46170112. Throughput: 0: 41438.3. Samples: 46396160. 
Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 12:20:11,686][00126] Avg episode reward: [(0, '0.018')] [2024-03-29 12:20:12,312][00501] Updated weights for policy 0, policy_version 2820 (0.0021) [2024-03-29 12:20:15,971][00501] Updated weights for policy 0, policy_version 2830 (0.0023) [2024-03-29 12:20:16,685][00126] Fps is (10 sec: 44237.1, 60 sec: 42598.4, 300 sec: 41265.5). Total num frames: 46383104. Throughput: 0: 41209.8. Samples: 46655040. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 12:20:16,686][00126] Avg episode reward: [(0, '0.030')] [2024-03-29 12:20:21,139][00501] Updated weights for policy 0, policy_version 2840 (0.0025) [2024-03-29 12:20:21,685][00126] Fps is (10 sec: 37683.3, 60 sec: 40960.0, 300 sec: 41043.3). Total num frames: 46546944. Throughput: 0: 42125.9. Samples: 46930800. Policy #0 lag: (min: 1.0, avg: 16.5, max: 41.0) [2024-03-29 12:20:21,686][00126] Avg episode reward: [(0, '0.026')] [2024-03-29 12:20:23,929][00501] Updated weights for policy 0, policy_version 2850 (0.0024) [2024-03-29 12:20:26,686][00126] Fps is (10 sec: 42597.8, 60 sec: 41779.2, 300 sec: 41376.5). Total num frames: 46809088. Throughput: 0: 41706.6. Samples: 47032960. Policy #0 lag: (min: 1.0, avg: 22.4, max: 41.0) [2024-03-29 12:20:26,686][00126] Avg episode reward: [(0, '0.017')] [2024-03-29 12:20:27,767][00501] Updated weights for policy 0, policy_version 2860 (0.0022) [2024-03-29 12:20:31,685][00126] Fps is (10 sec: 45875.6, 60 sec: 42598.5, 300 sec: 41321.0). Total num frames: 47005696. Throughput: 0: 41609.4. Samples: 47287180. Policy #0 lag: (min: 0.0, avg: 22.9, max: 44.0) [2024-03-29 12:20:31,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:20:31,773][00501] Updated weights for policy 0, policy_version 2870 (0.0020) [2024-03-29 12:20:36,215][00481] Signal inference workers to stop experience collection... (1800 times) [2024-03-29 12:20:36,250][00501] InferenceWorker_p0-w0: stopping experience collection (1800 times) [2024-03-29 12:20:36,430][00481] Signal inference workers to resume experience collection... (1800 times) [2024-03-29 12:20:36,430][00501] InferenceWorker_p0-w0: resuming experience collection (1800 times) [2024-03-29 12:20:36,686][00126] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 41098.8). Total num frames: 47169536. Throughput: 0: 42324.4. Samples: 47570620. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 12:20:36,686][00126] Avg episode reward: [(0, '0.019')] [2024-03-29 12:20:36,731][00501] Updated weights for policy 0, policy_version 2880 (0.0019) [2024-03-29 12:20:39,511][00501] Updated weights for policy 0, policy_version 2890 (0.0022) [2024-03-29 12:20:41,685][00126] Fps is (10 sec: 44235.9, 60 sec: 41779.2, 300 sec: 41376.5). Total num frames: 47448064. Throughput: 0: 42112.7. Samples: 47670400. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 12:20:41,686][00126] Avg episode reward: [(0, '0.025')] [2024-03-29 12:20:43,334][00501] Updated weights for policy 0, policy_version 2900 (0.0027) [2024-03-29 12:20:46,685][00126] Fps is (10 sec: 49152.6, 60 sec: 42871.5, 300 sec: 41487.6). Total num frames: 47661056. Throughput: 0: 42069.8. Samples: 47921680. Policy #0 lag: (min: 1.0, avg: 22.5, max: 42.0) [2024-03-29 12:20:46,687][00126] Avg episode reward: [(0, '0.019')] [2024-03-29 12:20:47,333][00501] Updated weights for policy 0, policy_version 2910 (0.0033) [2024-03-29 12:20:51,686][00126] Fps is (10 sec: 36044.5, 60 sec: 41233.0, 300 sec: 41154.4). Total num frames: 47808512. 
Throughput: 0: 42390.1. Samples: 48205640. Policy #0 lag: (min: 1.0, avg: 17.2, max: 41.0) [2024-03-29 12:20:51,686][00126] Avg episode reward: [(0, '0.032')] [2024-03-29 12:20:51,876][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002919_47824896.pth... [2024-03-29 12:20:52,265][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002313_37896192.pth [2024-03-29 12:20:52,539][00501] Updated weights for policy 0, policy_version 2920 (0.0024) [2024-03-29 12:20:55,300][00501] Updated weights for policy 0, policy_version 2930 (0.0023) [2024-03-29 12:20:56,685][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.2, 300 sec: 41432.1). Total num frames: 48070656. Throughput: 0: 42126.7. Samples: 48291860. Policy #0 lag: (min: 1.0, avg: 20.8, max: 43.0) [2024-03-29 12:20:56,686][00126] Avg episode reward: [(0, '0.013')] [2024-03-29 12:20:59,189][00501] Updated weights for policy 0, policy_version 2940 (0.0023) [2024-03-29 12:21:01,685][00126] Fps is (10 sec: 45875.8, 60 sec: 42052.2, 300 sec: 41487.6). Total num frames: 48267264. Throughput: 0: 41992.4. Samples: 48544700. Policy #0 lag: (min: 0.0, avg: 22.6, max: 41.0) [2024-03-29 12:21:01,686][00126] Avg episode reward: [(0, '0.019')] [2024-03-29 12:21:02,946][00501] Updated weights for policy 0, policy_version 2950 (0.0026) [2024-03-29 12:21:05,669][00481] Signal inference workers to stop experience collection... (1850 times) [2024-03-29 12:21:05,743][00501] InferenceWorker_p0-w0: stopping experience collection (1850 times) [2024-03-29 12:21:05,748][00481] Signal inference workers to resume experience collection... (1850 times) [2024-03-29 12:21:05,771][00501] InferenceWorker_p0-w0: resuming experience collection (1850 times) [2024-03-29 12:21:06,685][00126] Fps is (10 sec: 34406.5, 60 sec: 41233.1, 300 sec: 41154.4). Total num frames: 48414720. Throughput: 0: 42088.9. Samples: 48824800. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 12:21:06,686][00126] Avg episode reward: [(0, '0.025')] [2024-03-29 12:21:07,996][00501] Updated weights for policy 0, policy_version 2960 (0.0019) [2024-03-29 12:21:10,882][00501] Updated weights for policy 0, policy_version 2970 (0.0028) [2024-03-29 12:21:11,685][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 41432.1). Total num frames: 48693248. Throughput: 0: 41971.2. Samples: 48921660. Policy #0 lag: (min: 1.0, avg: 21.9, max: 43.0) [2024-03-29 12:21:11,686][00126] Avg episode reward: [(0, '0.021')] [2024-03-29 12:21:14,872][00501] Updated weights for policy 0, policy_version 2980 (0.0022) [2024-03-29 12:21:16,685][00126] Fps is (10 sec: 47513.5, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 48889856. Throughput: 0: 41865.7. Samples: 49171140. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 12:21:16,686][00126] Avg episode reward: [(0, '0.016')] [2024-03-29 12:21:18,422][00501] Updated weights for policy 0, policy_version 2990 (0.0024) [2024-03-29 12:21:21,688][00126] Fps is (10 sec: 36036.8, 60 sec: 41777.6, 300 sec: 41209.6). Total num frames: 49053696. Throughput: 0: 41706.9. Samples: 49447520. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 12:21:21,689][00126] Avg episode reward: [(0, '0.024')] [2024-03-29 12:21:23,578][00501] Updated weights for policy 0, policy_version 3000 (0.0033) [2024-03-29 12:21:26,535][00501] Updated weights for policy 0, policy_version 3010 (0.0023) [2024-03-29 12:21:26,685][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.3, 300 sec: 41321.0). 
Total num frames: 49315840. Throughput: 0: 42148.5. Samples: 49567080. Policy #0 lag: (min: 1.0, avg: 17.0, max: 42.0) [2024-03-29 12:21:26,686][00126] Avg episode reward: [(0, '0.023')] [2024-03-29 12:21:30,672][00501] Updated weights for policy 0, policy_version 3020 (0.0028) [2024-03-29 12:21:31,685][00126] Fps is (10 sec: 47524.4, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 49528832. Throughput: 0: 41572.9. Samples: 49792460. Policy #0 lag: (min: 1.0, avg: 22.2, max: 42.0) [2024-03-29 12:21:31,686][00126] Avg episode reward: [(0, '0.025')] [2024-03-29 12:21:33,114][00481] Signal inference workers to stop experience collection... (1900 times) [2024-03-29 12:21:33,136][00501] InferenceWorker_p0-w0: stopping experience collection (1900 times) [2024-03-29 12:21:33,330][00481] Signal inference workers to resume experience collection... (1900 times) [2024-03-29 12:21:33,331][00501] InferenceWorker_p0-w0: resuming experience collection (1900 times) [2024-03-29 12:21:34,128][00501] Updated weights for policy 0, policy_version 3030 (0.0017) [2024-03-29 12:21:36,686][00126] Fps is (10 sec: 37682.9, 60 sec: 42052.3, 300 sec: 41265.5). Total num frames: 49692672. Throughput: 0: 41247.6. Samples: 50061780. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 12:21:36,686][00126] Avg episode reward: [(0, '0.031')] [2024-03-29 12:21:39,403][00501] Updated weights for policy 0, policy_version 3040 (0.0022) [2024-03-29 12:21:41,685][00126] Fps is (10 sec: 39322.0, 60 sec: 41233.2, 300 sec: 41265.5). Total num frames: 49922048. Throughput: 0: 42265.0. Samples: 50193780. Policy #0 lag: (min: 1.0, avg: 17.9, max: 43.0) [2024-03-29 12:21:41,686][00126] Avg episode reward: [(0, '0.022')] [2024-03-29 12:21:42,237][00501] Updated weights for policy 0, policy_version 3050 (0.0027) [2024-03-29 12:21:46,323][00501] Updated weights for policy 0, policy_version 3060 (0.0023) [2024-03-29 12:21:46,685][00126] Fps is (10 sec: 45875.9, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 50151424. Throughput: 0: 41574.7. Samples: 50415560. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 12:21:46,686][00126] Avg episode reward: [(0, '0.020')] [2024-03-29 12:21:49,863][00501] Updated weights for policy 0, policy_version 3070 (0.0027) [2024-03-29 12:21:51,685][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.4, 300 sec: 41376.6). Total num frames: 50331648. Throughput: 0: 41217.7. Samples: 50679600. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 12:21:51,686][00126] Avg episode reward: [(0, '0.024')] [2024-03-29 12:21:55,039][00501] Updated weights for policy 0, policy_version 3080 (0.0024) [2024-03-29 12:21:56,685][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.1, 300 sec: 41376.6). Total num frames: 50544640. Throughput: 0: 42245.0. Samples: 50822680. Policy #0 lag: (min: 0.0, avg: 18.0, max: 41.0) [2024-03-29 12:21:56,686][00126] Avg episode reward: [(0, '0.019')] [2024-03-29 12:21:57,959][00501] Updated weights for policy 0, policy_version 3090 (0.0025) [2024-03-29 12:22:01,686][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.1, 300 sec: 41543.1). Total num frames: 50774016. Throughput: 0: 41697.2. Samples: 51047520. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 12:22:01,686][00126] Avg episode reward: [(0, '0.020')] [2024-03-29 12:22:01,924][00501] Updated weights for policy 0, policy_version 3100 (0.0017) [2024-03-29 12:22:04,561][00481] Signal inference workers to stop experience collection... 
(1950 times) [2024-03-29 12:22:04,583][00501] InferenceWorker_p0-w0: stopping experience collection (1950 times) [2024-03-29 12:22:04,776][00481] Signal inference workers to resume experience collection... (1950 times) [2024-03-29 12:22:04,777][00501] InferenceWorker_p0-w0: resuming experience collection (1950 times) [2024-03-29 12:22:05,596][00501] Updated weights for policy 0, policy_version 3110 (0.0027) [2024-03-29 12:22:06,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42871.4, 300 sec: 41598.7). Total num frames: 50987008. Throughput: 0: 41123.4. Samples: 51297980. Policy #0 lag: (min: 2.0, avg: 22.2, max: 43.0) [2024-03-29 12:22:06,686][00126] Avg episode reward: [(0, '0.031')] [2024-03-29 12:22:10,627][00501] Updated weights for policy 0, policy_version 3120 (0.0030) [2024-03-29 12:22:11,685][00126] Fps is (10 sec: 39322.1, 60 sec: 41233.1, 300 sec: 41376.5). Total num frames: 51167232. Throughput: 0: 41978.7. Samples: 51456120. Policy #0 lag: (min: 1.0, avg: 16.8, max: 41.0) [2024-03-29 12:22:11,686][00126] Avg episode reward: [(0, '0.022')] [2024-03-29 12:22:13,620][00501] Updated weights for policy 0, policy_version 3130 (0.0019) [2024-03-29 12:22:16,685][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 41543.2). Total num frames: 51412992. Throughput: 0: 41665.3. Samples: 51667400. Policy #0 lag: (min: 1.0, avg: 22.7, max: 45.0) [2024-03-29 12:22:16,688][00126] Avg episode reward: [(0, '0.034')] [2024-03-29 12:22:16,689][00481] Saving new best policy, reward=0.034! [2024-03-29 12:22:17,623][00501] Updated weights for policy 0, policy_version 3140 (0.0021) [2024-03-29 12:22:21,320][00501] Updated weights for policy 0, policy_version 3150 (0.0028) [2024-03-29 12:22:21,685][00126] Fps is (10 sec: 45875.2, 60 sec: 42873.1, 300 sec: 41765.3). Total num frames: 51625984. Throughput: 0: 41452.1. Samples: 51927120. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 12:22:21,686][00126] Avg episode reward: [(0, '0.031')] [2024-03-29 12:22:26,070][00501] Updated weights for policy 0, policy_version 3160 (0.0019) [2024-03-29 12:22:26,685][00126] Fps is (10 sec: 37683.7, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 51789824. Throughput: 0: 42023.6. Samples: 52084840. Policy #0 lag: (min: 1.0, avg: 18.3, max: 41.0) [2024-03-29 12:22:26,686][00126] Avg episode reward: [(0, '0.023')] [2024-03-29 12:22:29,111][00501] Updated weights for policy 0, policy_version 3170 (0.0022) [2024-03-29 12:22:31,686][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.1, 300 sec: 41598.7). Total num frames: 52035584. Throughput: 0: 41978.6. Samples: 52304600. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 12:22:31,686][00126] Avg episode reward: [(0, '0.020')] [2024-03-29 12:22:33,218][00501] Updated weights for policy 0, policy_version 3180 (0.0024) [2024-03-29 12:22:36,142][00481] Signal inference workers to stop experience collection... (2000 times) [2024-03-29 12:22:36,142][00481] Signal inference workers to resume experience collection... (2000 times) [2024-03-29 12:22:36,182][00501] InferenceWorker_p0-w0: stopping experience collection (2000 times) [2024-03-29 12:22:36,182][00501] InferenceWorker_p0-w0: resuming experience collection (2000 times) [2024-03-29 12:22:36,685][00126] Fps is (10 sec: 45874.5, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 52248576. Throughput: 0: 41934.2. Samples: 52566640. 
Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 12:22:36,686][00126] Avg episode reward: [(0, '0.033')] [2024-03-29 12:22:36,720][00501] Updated weights for policy 0, policy_version 3190 (0.0018) [2024-03-29 12:22:41,494][00501] Updated weights for policy 0, policy_version 3200 (0.0017) [2024-03-29 12:22:41,685][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.1, 300 sec: 41543.1). Total num frames: 52428800. Throughput: 0: 42054.2. Samples: 52715120. Policy #0 lag: (min: 0.0, avg: 18.0, max: 41.0) [2024-03-29 12:22:41,686][00126] Avg episode reward: [(0, '0.027')] [2024-03-29 12:22:44,568][00501] Updated weights for policy 0, policy_version 3210 (0.0025) [2024-03-29 12:22:46,685][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 52674560. Throughput: 0: 41897.4. Samples: 52932900. Policy #0 lag: (min: 1.0, avg: 19.6, max: 42.0) [2024-03-29 12:22:46,686][00126] Avg episode reward: [(0, '0.021')] [2024-03-29 12:22:48,784][00501] Updated weights for policy 0, policy_version 3220 (0.0032) [2024-03-29 12:22:51,685][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 52871168. Throughput: 0: 42240.1. Samples: 53198780. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 12:22:51,686][00126] Avg episode reward: [(0, '0.035')] [2024-03-29 12:22:52,224][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000003229_52903936.pth... [2024-03-29 12:22:52,586][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002616_42860544.pth [2024-03-29 12:22:52,848][00501] Updated weights for policy 0, policy_version 3230 (0.0025) [2024-03-29 12:22:56,685][00126] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 53035008. Throughput: 0: 41473.3. Samples: 53322420. Policy #0 lag: (min: 1.0, avg: 22.1, max: 43.0) [2024-03-29 12:22:56,687][00126] Avg episode reward: [(0, '0.028')] [2024-03-29 12:22:57,362][00501] Updated weights for policy 0, policy_version 3240 (0.0023) [2024-03-29 12:23:00,652][00501] Updated weights for policy 0, policy_version 3250 (0.0025) [2024-03-29 12:23:01,685][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.3, 300 sec: 41543.2). Total num frames: 53280768. Throughput: 0: 41704.5. Samples: 53544100. Policy #0 lag: (min: 0.0, avg: 22.9, max: 46.0) [2024-03-29 12:23:01,686][00126] Avg episode reward: [(0, '0.034')] [2024-03-29 12:23:04,831][00501] Updated weights for policy 0, policy_version 3260 (0.0027) [2024-03-29 12:23:06,685][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 53477376. Throughput: 0: 41965.8. Samples: 53815580. Policy #0 lag: (min: 1.0, avg: 22.7, max: 42.0) [2024-03-29 12:23:06,686][00126] Avg episode reward: [(0, '0.038')] [2024-03-29 12:23:06,971][00481] Saving new best policy, reward=0.038! [2024-03-29 12:23:07,711][00481] Signal inference workers to stop experience collection... (2050 times) [2024-03-29 12:23:07,733][00501] InferenceWorker_p0-w0: stopping experience collection (2050 times) [2024-03-29 12:23:07,917][00481] Signal inference workers to resume experience collection... (2050 times) [2024-03-29 12:23:07,918][00501] InferenceWorker_p0-w0: resuming experience collection (2050 times) [2024-03-29 12:23:08,712][00501] Updated weights for policy 0, policy_version 3270 (0.0026) [2024-03-29 12:23:11,685][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.0, 300 sec: 41487.6). Total num frames: 53641216. Throughput: 0: 40946.6. 
Samples: 53927440. Policy #0 lag: (min: 0.0, avg: 20.4, max: 43.0) [2024-03-29 12:23:11,686][00126] Avg episode reward: [(0, '0.042')] [2024-03-29 12:23:11,706][00481] Saving new best policy, reward=0.042! [2024-03-29 12:23:13,707][00501] Updated weights for policy 0, policy_version 3280 (0.0019) [2024-03-29 12:23:16,615][00501] Updated weights for policy 0, policy_version 3290 (0.0022) [2024-03-29 12:23:16,685][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 53903360. Throughput: 0: 41515.7. Samples: 54172800. Policy #0 lag: (min: 1.0, avg: 19.6, max: 42.0) [2024-03-29 12:23:16,686][00126] Avg episode reward: [(0, '0.047')] [2024-03-29 12:23:16,901][00481] Saving new best policy, reward=0.047! [2024-03-29 12:23:20,859][00501] Updated weights for policy 0, policy_version 3300 (0.0019) [2024-03-29 12:23:21,685][00126] Fps is (10 sec: 45875.2, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 54099968. Throughput: 0: 41116.9. Samples: 54416900. Policy #0 lag: (min: 0.0, avg: 23.4, max: 42.0) [2024-03-29 12:23:21,686][00126] Avg episode reward: [(0, '0.037')] [2024-03-29 12:23:24,482][00501] Updated weights for policy 0, policy_version 3310 (0.0026) [2024-03-29 12:23:26,686][00126] Fps is (10 sec: 37682.6, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 54280192. Throughput: 0: 40655.9. Samples: 54544640. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 12:23:26,686][00126] Avg episode reward: [(0, '0.042')] [2024-03-29 12:23:29,507][00501] Updated weights for policy 0, policy_version 3320 (0.0019) [2024-03-29 12:23:31,685][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 54509568. Throughput: 0: 41697.3. Samples: 54809280. Policy #0 lag: (min: 0.0, avg: 16.6, max: 41.0) [2024-03-29 12:23:31,686][00126] Avg episode reward: [(0, '0.039')] [2024-03-29 12:23:32,379][00501] Updated weights for policy 0, policy_version 3330 (0.0023) [2024-03-29 12:23:36,506][00501] Updated weights for policy 0, policy_version 3340 (0.0023) [2024-03-29 12:23:36,685][00126] Fps is (10 sec: 44237.8, 60 sec: 41233.2, 300 sec: 41654.2). Total num frames: 54722560. Throughput: 0: 41108.0. Samples: 55048640. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 12:23:36,686][00126] Avg episode reward: [(0, '0.032')] [2024-03-29 12:23:38,271][00481] Signal inference workers to stop experience collection... (2100 times) [2024-03-29 12:23:38,316][00501] InferenceWorker_p0-w0: stopping experience collection (2100 times) [2024-03-29 12:23:38,468][00481] Signal inference workers to resume experience collection... (2100 times) [2024-03-29 12:23:38,469][00501] InferenceWorker_p0-w0: resuming experience collection (2100 times) [2024-03-29 12:23:40,373][00501] Updated weights for policy 0, policy_version 3350 (0.0020) [2024-03-29 12:23:41,685][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 54919168. Throughput: 0: 41152.9. Samples: 55174300. Policy #0 lag: (min: 0.0, avg: 21.1, max: 40.0) [2024-03-29 12:23:41,686][00126] Avg episode reward: [(0, '0.046')] [2024-03-29 12:23:45,088][00501] Updated weights for policy 0, policy_version 3360 (0.0030) [2024-03-29 12:23:46,685][00126] Fps is (10 sec: 39321.1, 60 sec: 40686.9, 300 sec: 41543.2). Total num frames: 55115776. Throughput: 0: 42033.3. Samples: 55435600. 
Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 12:23:46,686][00126] Avg episode reward: [(0, '0.024')] [2024-03-29 12:23:48,154][00501] Updated weights for policy 0, policy_version 3370 (0.0019) [2024-03-29 12:23:51,686][00126] Fps is (10 sec: 42597.8, 60 sec: 41232.9, 300 sec: 41654.2). Total num frames: 55345152. Throughput: 0: 41268.3. Samples: 55672660. Policy #0 lag: (min: 1.0, avg: 23.4, max: 41.0) [2024-03-29 12:23:51,686][00126] Avg episode reward: [(0, '0.035')] [2024-03-29 12:23:52,148][00501] Updated weights for policy 0, policy_version 3380 (0.0025) [2024-03-29 12:23:56,065][00501] Updated weights for policy 0, policy_version 3390 (0.0020) [2024-03-29 12:23:56,685][00126] Fps is (10 sec: 44237.0, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 55558144. Throughput: 0: 41745.8. Samples: 55806000. Policy #0 lag: (min: 0.0, avg: 20.7, max: 45.0) [2024-03-29 12:23:56,686][00126] Avg episode reward: [(0, '0.038')] [2024-03-29 12:24:00,684][00501] Updated weights for policy 0, policy_version 3400 (0.0023) [2024-03-29 12:24:01,685][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 55754752. Throughput: 0: 42193.3. Samples: 56071500. Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0) [2024-03-29 12:24:01,686][00126] Avg episode reward: [(0, '0.068')] [2024-03-29 12:24:01,710][00481] Saving new best policy, reward=0.068! [2024-03-29 12:24:03,844][00501] Updated weights for policy 0, policy_version 3410 (0.0031) [2024-03-29 12:24:06,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 55967744. Throughput: 0: 41713.4. Samples: 56294000. Policy #0 lag: (min: 2.0, avg: 20.6, max: 43.0) [2024-03-29 12:24:06,686][00126] Avg episode reward: [(0, '0.036')] [2024-03-29 12:24:07,859][00501] Updated weights for policy 0, policy_version 3420 (0.0023) [2024-03-29 12:24:10,615][00481] Signal inference workers to stop experience collection... (2150 times) [2024-03-29 12:24:10,616][00481] Signal inference workers to resume experience collection... (2150 times) [2024-03-29 12:24:10,658][00501] InferenceWorker_p0-w0: stopping experience collection (2150 times) [2024-03-29 12:24:10,658][00501] InferenceWorker_p0-w0: resuming experience collection (2150 times) [2024-03-29 12:24:11,685][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 56180736. Throughput: 0: 41953.9. Samples: 56432560. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 12:24:11,686][00126] Avg episode reward: [(0, '0.051')] [2024-03-29 12:24:11,853][00501] Updated weights for policy 0, policy_version 3430 (0.0023) [2024-03-29 12:24:16,685][00126] Fps is (10 sec: 36044.8, 60 sec: 40413.9, 300 sec: 41487.6). Total num frames: 56328192. Throughput: 0: 41743.2. Samples: 56687720. Policy #0 lag: (min: 1.0, avg: 18.8, max: 42.0) [2024-03-29 12:24:16,686][00126] Avg episode reward: [(0, '0.053')] [2024-03-29 12:24:17,083][00501] Updated weights for policy 0, policy_version 3440 (0.0029) [2024-03-29 12:24:20,073][00501] Updated weights for policy 0, policy_version 3450 (0.0020) [2024-03-29 12:24:21,685][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 56590336. Throughput: 0: 40918.2. Samples: 56889960. 
Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 12:24:21,686][00126] Avg episode reward: [(0, '0.040')] [2024-03-29 12:24:24,211][00501] Updated weights for policy 0, policy_version 3460 (0.0022) [2024-03-29 12:24:26,685][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.3, 300 sec: 41765.3). Total num frames: 56770560. Throughput: 0: 41246.7. Samples: 57030400. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 12:24:26,686][00126] Avg episode reward: [(0, '0.040')] [2024-03-29 12:24:27,954][00501] Updated weights for policy 0, policy_version 3470 (0.0029) [2024-03-29 12:24:31,685][00126] Fps is (10 sec: 37683.2, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 56967168. Throughput: 0: 41330.7. Samples: 57295480. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 12:24:31,688][00126] Avg episode reward: [(0, '0.040')] [2024-03-29 12:24:32,528][00501] Updated weights for policy 0, policy_version 3480 (0.0031) [2024-03-29 12:24:35,483][00501] Updated weights for policy 0, policy_version 3490 (0.0018) [2024-03-29 12:24:36,685][00126] Fps is (10 sec: 45875.4, 60 sec: 41779.2, 300 sec: 41654.3). Total num frames: 57229312. Throughput: 0: 41239.7. Samples: 57528440. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 12:24:36,686][00126] Avg episode reward: [(0, '0.051')] [2024-03-29 12:24:39,733][00501] Updated weights for policy 0, policy_version 3500 (0.0024) [2024-03-29 12:24:41,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 57409536. Throughput: 0: 41414.7. Samples: 57669660. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 12:24:41,686][00126] Avg episode reward: [(0, '0.048')] [2024-03-29 12:24:43,255][00501] Updated weights for policy 0, policy_version 3510 (0.0023) [2024-03-29 12:24:43,883][00481] Signal inference workers to stop experience collection... (2200 times) [2024-03-29 12:24:43,883][00481] Signal inference workers to resume experience collection... (2200 times) [2024-03-29 12:24:43,916][00501] InferenceWorker_p0-w0: stopping experience collection (2200 times) [2024-03-29 12:24:43,916][00501] InferenceWorker_p0-w0: resuming experience collection (2200 times) [2024-03-29 12:24:46,685][00126] Fps is (10 sec: 37682.9, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 57606144. Throughput: 0: 41547.1. Samples: 57941120. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 12:24:46,686][00126] Avg episode reward: [(0, '0.043')] [2024-03-29 12:24:47,900][00501] Updated weights for policy 0, policy_version 3520 (0.0023) [2024-03-29 12:24:50,749][00501] Updated weights for policy 0, policy_version 3530 (0.0027) [2024-03-29 12:24:51,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 57868288. Throughput: 0: 41791.1. Samples: 58174600. Policy #0 lag: (min: 1.0, avg: 18.7, max: 42.0) [2024-03-29 12:24:51,686][00126] Avg episode reward: [(0, '0.038')] [2024-03-29 12:24:51,861][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000003533_57884672.pth... [2024-03-29 12:24:52,214][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002919_47824896.pth [2024-03-29 12:24:55,060][00501] Updated weights for policy 0, policy_version 3540 (0.0021) [2024-03-29 12:24:56,686][00126] Fps is (10 sec: 44236.3, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 58048512. Throughput: 0: 41587.9. Samples: 58304020. 
Policy #0 lag: (min: 1.0, avg: 23.5, max: 42.0) [2024-03-29 12:24:56,686][00126] Avg episode reward: [(0, '0.037')] [2024-03-29 12:24:58,946][00501] Updated weights for policy 0, policy_version 3550 (0.0019) [2024-03-29 12:25:01,685][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 58228736. Throughput: 0: 41757.7. Samples: 58566820. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 12:25:01,688][00126] Avg episode reward: [(0, '0.047')] [2024-03-29 12:25:03,340][00501] Updated weights for policy 0, policy_version 3560 (0.0019) [2024-03-29 12:25:06,410][00501] Updated weights for policy 0, policy_version 3570 (0.0035) [2024-03-29 12:25:06,685][00126] Fps is (10 sec: 44237.4, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 58490880. Throughput: 0: 42407.6. Samples: 58798300. Policy #0 lag: (min: 1.0, avg: 21.2, max: 42.0) [2024-03-29 12:25:06,686][00126] Avg episode reward: [(0, '0.056')] [2024-03-29 12:25:10,840][00501] Updated weights for policy 0, policy_version 3580 (0.0023) [2024-03-29 12:25:11,685][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 58687488. Throughput: 0: 42312.3. Samples: 58934460. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 12:25:11,686][00126] Avg episode reward: [(0, '0.053')] [2024-03-29 12:25:14,820][00501] Updated weights for policy 0, policy_version 3590 (0.0025) [2024-03-29 12:25:16,685][00126] Fps is (10 sec: 37683.1, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 58867712. Throughput: 0: 42060.4. Samples: 59188200. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 12:25:16,686][00126] Avg episode reward: [(0, '0.054')] [2024-03-29 12:25:18,635][00481] Signal inference workers to stop experience collection... (2250 times) [2024-03-29 12:25:18,669][00501] InferenceWorker_p0-w0: stopping experience collection (2250 times) [2024-03-29 12:25:18,850][00481] Signal inference workers to resume experience collection... (2250 times) [2024-03-29 12:25:18,851][00501] InferenceWorker_p0-w0: resuming experience collection (2250 times) [2024-03-29 12:25:19,107][00501] Updated weights for policy 0, policy_version 3600 (0.0023) [2024-03-29 12:25:21,685][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 59113472. Throughput: 0: 42396.0. Samples: 59436260. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 12:25:21,686][00126] Avg episode reward: [(0, '0.024')] [2024-03-29 12:25:22,210][00501] Updated weights for policy 0, policy_version 3610 (0.0026) [2024-03-29 12:25:26,685][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 59293696. Throughput: 0: 41740.4. Samples: 59547980. Policy #0 lag: (min: 0.0, avg: 23.5, max: 40.0) [2024-03-29 12:25:26,687][00126] Avg episode reward: [(0, '0.059')] [2024-03-29 12:25:26,716][00501] Updated weights for policy 0, policy_version 3620 (0.0018) [2024-03-29 12:25:30,354][00501] Updated weights for policy 0, policy_version 3630 (0.0021) [2024-03-29 12:25:31,686][00126] Fps is (10 sec: 39320.7, 60 sec: 42325.2, 300 sec: 41820.9). Total num frames: 59506688. Throughput: 0: 41455.9. Samples: 59806640. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 12:25:31,686][00126] Avg episode reward: [(0, '0.038')] [2024-03-29 12:25:34,849][00501] Updated weights for policy 0, policy_version 3640 (0.0021) [2024-03-29 12:25:36,685][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 59736064. 
Throughput: 0: 42204.0. Samples: 60073780. Policy #0 lag: (min: 0.0, avg: 18.7, max: 42.0) [2024-03-29 12:25:36,688][00126] Avg episode reward: [(0, '0.046')] [2024-03-29 12:25:37,779][00501] Updated weights for policy 0, policy_version 3650 (0.0027) [2024-03-29 12:25:41,685][00126] Fps is (10 sec: 42599.1, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 59932672. Throughput: 0: 41381.0. Samples: 60166160. Policy #0 lag: (min: 2.0, avg: 22.7, max: 43.0) [2024-03-29 12:25:41,686][00126] Avg episode reward: [(0, '0.043')] [2024-03-29 12:25:42,346][00501] Updated weights for policy 0, policy_version 3660 (0.0027) [2024-03-29 12:25:46,055][00501] Updated weights for policy 0, policy_version 3670 (0.0028) [2024-03-29 12:25:46,685][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 60145664. Throughput: 0: 41637.3. Samples: 60440500. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 12:25:46,686][00126] Avg episode reward: [(0, '0.041')] [2024-03-29 12:25:47,956][00481] Signal inference workers to stop experience collection... (2300 times) [2024-03-29 12:25:47,993][00501] InferenceWorker_p0-w0: stopping experience collection (2300 times) [2024-03-29 12:25:48,143][00481] Signal inference workers to resume experience collection... (2300 times) [2024-03-29 12:25:48,144][00501] InferenceWorker_p0-w0: resuming experience collection (2300 times) [2024-03-29 12:25:50,637][00501] Updated weights for policy 0, policy_version 3680 (0.0020) [2024-03-29 12:25:51,685][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 60342272. Throughput: 0: 42132.9. Samples: 60694280. Policy #0 lag: (min: 0.0, avg: 20.2, max: 44.0) [2024-03-29 12:25:51,686][00126] Avg episode reward: [(0, '0.059')] [2024-03-29 12:25:53,667][00501] Updated weights for policy 0, policy_version 3690 (0.0021) [2024-03-29 12:25:56,685][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 60555264. Throughput: 0: 41306.3. Samples: 60793240. Policy #0 lag: (min: 0.0, avg: 24.2, max: 43.0) [2024-03-29 12:25:56,686][00126] Avg episode reward: [(0, '0.068')] [2024-03-29 12:25:58,107][00501] Updated weights for policy 0, policy_version 3700 (0.0018) [2024-03-29 12:26:01,685][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 60768256. Throughput: 0: 41705.3. Samples: 61064940. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 12:26:01,686][00126] Avg episode reward: [(0, '0.061')] [2024-03-29 12:26:01,963][00501] Updated weights for policy 0, policy_version 3710 (0.0027) [2024-03-29 12:26:06,162][00501] Updated weights for policy 0, policy_version 3720 (0.0018) [2024-03-29 12:26:06,685][00126] Fps is (10 sec: 40959.7, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 60964864. Throughput: 0: 41857.6. Samples: 61319860. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 12:26:06,686][00126] Avg episode reward: [(0, '0.034')] [2024-03-29 12:26:09,350][00501] Updated weights for policy 0, policy_version 3730 (0.0023) [2024-03-29 12:26:11,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 61194240. Throughput: 0: 41802.2. Samples: 61429080. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 12:26:11,686][00126] Avg episode reward: [(0, '0.050')] [2024-03-29 12:26:13,889][00501] Updated weights for policy 0, policy_version 3740 (0.0019) [2024-03-29 12:26:16,685][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41821.2). 
Total num frames: 61390848. Throughput: 0: 42040.6. Samples: 61698460. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 12:26:16,686][00126] Avg episode reward: [(0, '0.063')] [2024-03-29 12:26:17,519][00501] Updated weights for policy 0, policy_version 3750 (0.0023) [2024-03-29 12:26:21,685][00126] Fps is (10 sec: 39321.4, 60 sec: 41232.9, 300 sec: 41598.7). Total num frames: 61587456. Throughput: 0: 41947.9. Samples: 61961440. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 12:26:21,686][00126] Avg episode reward: [(0, '0.054')] [2024-03-29 12:26:21,835][00501] Updated weights for policy 0, policy_version 3760 (0.0022) [2024-03-29 12:26:21,864][00481] Signal inference workers to stop experience collection... (2350 times) [2024-03-29 12:26:21,865][00481] Signal inference workers to resume experience collection... (2350 times) [2024-03-29 12:26:21,904][00501] InferenceWorker_p0-w0: stopping experience collection (2350 times) [2024-03-29 12:26:21,904][00501] InferenceWorker_p0-w0: resuming experience collection (2350 times) [2024-03-29 12:26:24,724][00501] Updated weights for policy 0, policy_version 3770 (0.0030) [2024-03-29 12:26:26,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 41765.3). Total num frames: 61849600. Throughput: 0: 42335.6. Samples: 62071260. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 12:26:26,686][00126] Avg episode reward: [(0, '0.072')] [2024-03-29 12:26:26,687][00481] Saving new best policy, reward=0.072! [2024-03-29 12:26:29,456][00501] Updated weights for policy 0, policy_version 3780 (0.0027) [2024-03-29 12:26:31,685][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 62013440. Throughput: 0: 41860.9. Samples: 62324240. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 12:26:31,686][00126] Avg episode reward: [(0, '0.046')] [2024-03-29 12:26:33,104][00501] Updated weights for policy 0, policy_version 3790 (0.0033) [2024-03-29 12:26:36,685][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 62226432. Throughput: 0: 42251.6. Samples: 62595600. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 12:26:36,686][00126] Avg episode reward: [(0, '0.057')] [2024-03-29 12:26:37,343][00501] Updated weights for policy 0, policy_version 3800 (0.0021) [2024-03-29 12:26:40,229][00501] Updated weights for policy 0, policy_version 3810 (0.0018) [2024-03-29 12:26:41,685][00126] Fps is (10 sec: 49152.4, 60 sec: 42871.5, 300 sec: 41876.4). Total num frames: 62504960. Throughput: 0: 42612.4. Samples: 62710800. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 12:26:41,686][00126] Avg episode reward: [(0, '0.081')] [2024-03-29 12:26:41,704][00481] Saving new best policy, reward=0.081! [2024-03-29 12:26:45,007][00501] Updated weights for policy 0, policy_version 3820 (0.0025) [2024-03-29 12:26:46,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 62636032. Throughput: 0: 41997.4. Samples: 62954820. Policy #0 lag: (min: 0.0, avg: 23.8, max: 41.0) [2024-03-29 12:26:46,686][00126] Avg episode reward: [(0, '0.061')] [2024-03-29 12:26:48,790][00501] Updated weights for policy 0, policy_version 3830 (0.0023) [2024-03-29 12:26:48,805][00481] Signal inference workers to stop experience collection... (2400 times) [2024-03-29 12:26:48,806][00481] Signal inference workers to resume experience collection... 
(2400 times) [2024-03-29 12:26:48,839][00501] InferenceWorker_p0-w0: stopping experience collection (2400 times) [2024-03-29 12:26:48,839][00501] InferenceWorker_p0-w0: resuming experience collection (2400 times) [2024-03-29 12:26:51,685][00126] Fps is (10 sec: 34406.2, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 62849024. Throughput: 0: 42195.6. Samples: 63218660. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 12:26:51,686][00126] Avg episode reward: [(0, '0.087')] [2024-03-29 12:26:51,958][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000003837_62865408.pth... [2024-03-29 12:26:52,327][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000003229_52903936.pth [2024-03-29 12:26:52,347][00481] Saving new best policy, reward=0.087! [2024-03-29 12:26:53,587][00501] Updated weights for policy 0, policy_version 3840 (0.0022) [2024-03-29 12:26:56,553][00501] Updated weights for policy 0, policy_version 3850 (0.0024) [2024-03-29 12:26:56,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 63078400. Throughput: 0: 42081.4. Samples: 63322740. Policy #0 lag: (min: 2.0, avg: 22.7, max: 43.0) [2024-03-29 12:26:56,686][00126] Avg episode reward: [(0, '0.055')] [2024-03-29 12:27:01,155][00501] Updated weights for policy 0, policy_version 3860 (0.0026) [2024-03-29 12:27:01,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 63275008. Throughput: 0: 41350.6. Samples: 63559240. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 12:27:01,686][00126] Avg episode reward: [(0, '0.077')] [2024-03-29 12:27:04,859][00501] Updated weights for policy 0, policy_version 3870 (0.0020) [2024-03-29 12:27:06,685][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 63455232. Throughput: 0: 41371.7. Samples: 63823160. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 12:27:06,686][00126] Avg episode reward: [(0, '0.079')] [2024-03-29 12:27:09,008][00501] Updated weights for policy 0, policy_version 3880 (0.0022) [2024-03-29 12:27:11,685][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.3, 300 sec: 41654.3). Total num frames: 63700992. Throughput: 0: 41789.0. Samples: 63951760. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 12:27:11,688][00126] Avg episode reward: [(0, '0.038')] [2024-03-29 12:27:12,230][00501] Updated weights for policy 0, policy_version 3890 (0.0023) [2024-03-29 12:27:16,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 63881216. Throughput: 0: 41204.1. Samples: 64178420. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 12:27:16,686][00126] Avg episode reward: [(0, '0.063')] [2024-03-29 12:27:16,908][00501] Updated weights for policy 0, policy_version 3900 (0.0039) [2024-03-29 12:27:20,840][00501] Updated weights for policy 0, policy_version 3910 (0.0025) [2024-03-29 12:27:21,686][00126] Fps is (10 sec: 39320.6, 60 sec: 41779.2, 300 sec: 41709.7). Total num frames: 64094208. Throughput: 0: 41118.9. Samples: 64445960. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 12:27:21,686][00126] Avg episode reward: [(0, '0.051')] [2024-03-29 12:27:22,181][00481] Signal inference workers to stop experience collection... (2450 times) [2024-03-29 12:27:22,255][00481] Signal inference workers to resume experience collection... 
(2450 times) [2024-03-29 12:27:22,255][00501] InferenceWorker_p0-w0: stopping experience collection (2450 times) [2024-03-29 12:27:22,283][00501] InferenceWorker_p0-w0: resuming experience collection (2450 times) [2024-03-29 12:27:24,765][00501] Updated weights for policy 0, policy_version 3920 (0.0022) [2024-03-29 12:27:26,685][00126] Fps is (10 sec: 42598.2, 60 sec: 40960.0, 300 sec: 41598.7). Total num frames: 64307200. Throughput: 0: 41382.2. Samples: 64573000. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 12:27:26,686][00126] Avg episode reward: [(0, '0.063')] [2024-03-29 12:27:27,904][00501] Updated weights for policy 0, policy_version 3930 (0.0029) [2024-03-29 12:27:31,686][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 64520192. Throughput: 0: 40973.2. Samples: 64798620. Policy #0 lag: (min: 0.0, avg: 23.9, max: 40.0) [2024-03-29 12:27:31,686][00126] Avg episode reward: [(0, '0.060')] [2024-03-29 12:27:32,595][00501] Updated weights for policy 0, policy_version 3940 (0.0031) [2024-03-29 12:27:36,568][00501] Updated weights for policy 0, policy_version 3950 (0.0020) [2024-03-29 12:27:36,685][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 64716800. Throughput: 0: 41137.4. Samples: 65069840. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 12:27:36,686][00126] Avg episode reward: [(0, '0.061')] [2024-03-29 12:27:40,583][00501] Updated weights for policy 0, policy_version 3960 (0.0027) [2024-03-29 12:27:41,685][00126] Fps is (10 sec: 40960.5, 60 sec: 40413.9, 300 sec: 41543.2). Total num frames: 64929792. Throughput: 0: 41791.1. Samples: 65203340. Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0) [2024-03-29 12:27:41,686][00126] Avg episode reward: [(0, '0.079')] [2024-03-29 12:27:43,552][00501] Updated weights for policy 0, policy_version 3970 (0.0017) [2024-03-29 12:27:46,685][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 65142784. Throughput: 0: 41410.7. Samples: 65422720. Policy #0 lag: (min: 1.0, avg: 24.5, max: 42.0) [2024-03-29 12:27:46,688][00126] Avg episode reward: [(0, '0.069')] [2024-03-29 12:27:48,010][00501] Updated weights for policy 0, policy_version 3980 (0.0019) [2024-03-29 12:27:51,686][00126] Fps is (10 sec: 40959.3, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 65339392. Throughput: 0: 41749.6. Samples: 65701900. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 12:27:51,686][00126] Avg episode reward: [(0, '0.081')] [2024-03-29 12:27:52,106][00501] Updated weights for policy 0, policy_version 3990 (0.0026) [2024-03-29 12:27:54,908][00481] Signal inference workers to stop experience collection... (2500 times) [2024-03-29 12:27:54,952][00501] InferenceWorker_p0-w0: stopping experience collection (2500 times) [2024-03-29 12:27:54,987][00481] Signal inference workers to resume experience collection... (2500 times) [2024-03-29 12:27:54,991][00501] InferenceWorker_p0-w0: resuming experience collection (2500 times) [2024-03-29 12:27:55,909][00501] Updated weights for policy 0, policy_version 4000 (0.0022) [2024-03-29 12:27:56,685][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 65568768. Throughput: 0: 41967.8. Samples: 65840320. 
Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 12:27:56,686][00126] Avg episode reward: [(0, '0.067')] [2024-03-29 12:27:59,065][00501] Updated weights for policy 0, policy_version 4010 (0.0034) [2024-03-29 12:28:01,685][00126] Fps is (10 sec: 45876.0, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 65798144. Throughput: 0: 41782.2. Samples: 66058620. Policy #0 lag: (min: 2.0, avg: 23.1, max: 45.0) [2024-03-29 12:28:01,686][00126] Avg episode reward: [(0, '0.077')] [2024-03-29 12:28:03,257][00501] Updated weights for policy 0, policy_version 4020 (0.0031) [2024-03-29 12:28:06,685][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 65978368. Throughput: 0: 42161.8. Samples: 66343240. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 12:28:06,686][00126] Avg episode reward: [(0, '0.079')] [2024-03-29 12:28:07,454][00501] Updated weights for policy 0, policy_version 4030 (0.0021) [2024-03-29 12:28:11,388][00501] Updated weights for policy 0, policy_version 4040 (0.0023) [2024-03-29 12:28:11,685][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 66191360. Throughput: 0: 42209.3. Samples: 66472420. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 12:28:11,686][00126] Avg episode reward: [(0, '0.062')] [2024-03-29 12:28:14,276][00501] Updated weights for policy 0, policy_version 4050 (0.0020) [2024-03-29 12:28:16,685][00126] Fps is (10 sec: 47513.9, 60 sec: 42871.4, 300 sec: 41876.4). Total num frames: 66453504. Throughput: 0: 42220.5. Samples: 66698540. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 12:28:16,686][00126] Avg episode reward: [(0, '0.064')] [2024-03-29 12:28:18,561][00501] Updated weights for policy 0, policy_version 4060 (0.0024) [2024-03-29 12:28:21,685][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 66617344. Throughput: 0: 42449.3. Samples: 66980060. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:28:21,686][00126] Avg episode reward: [(0, '0.114')] [2024-03-29 12:28:21,706][00481] Saving new best policy, reward=0.114! [2024-03-29 12:28:23,161][00501] Updated weights for policy 0, policy_version 4070 (0.0022) [2024-03-29 12:28:26,562][00481] Signal inference workers to stop experience collection... (2550 times) [2024-03-29 12:28:26,587][00501] InferenceWorker_p0-w0: stopping experience collection (2550 times) [2024-03-29 12:28:26,685][00126] Fps is (10 sec: 36045.2, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 66813952. Throughput: 0: 42263.6. Samples: 67105200. Policy #0 lag: (min: 2.0, avg: 19.2, max: 42.0) [2024-03-29 12:28:26,686][00126] Avg episode reward: [(0, '0.075')] [2024-03-29 12:28:26,772][00481] Signal inference workers to resume experience collection... (2550 times) [2024-03-29 12:28:26,772][00501] InferenceWorker_p0-w0: resuming experience collection (2550 times) [2024-03-29 12:28:27,070][00501] Updated weights for policy 0, policy_version 4080 (0.0022) [2024-03-29 12:28:30,139][00501] Updated weights for policy 0, policy_version 4090 (0.0021) [2024-03-29 12:28:31,685][00126] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 67076096. Throughput: 0: 42184.8. Samples: 67321040. 
Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 12:28:31,686][00126] Avg episode reward: [(0, '0.064')] [2024-03-29 12:28:34,514][00501] Updated weights for policy 0, policy_version 4100 (0.0024) [2024-03-29 12:28:36,685][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 67256320. Throughput: 0: 41929.9. Samples: 67588740. Policy #0 lag: (min: 0.0, avg: 24.0, max: 40.0) [2024-03-29 12:28:36,686][00126] Avg episode reward: [(0, '0.117')] [2024-03-29 12:28:36,687][00481] Saving new best policy, reward=0.117! [2024-03-29 12:28:38,750][00501] Updated weights for policy 0, policy_version 4110 (0.0027) [2024-03-29 12:28:41,685][00126] Fps is (10 sec: 36045.1, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 67436544. Throughput: 0: 41861.4. Samples: 67724080. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 12:28:41,686][00126] Avg episode reward: [(0, '0.082')] [2024-03-29 12:28:42,927][00501] Updated weights for policy 0, policy_version 4120 (0.0023) [2024-03-29 12:28:45,797][00501] Updated weights for policy 0, policy_version 4130 (0.0023) [2024-03-29 12:28:46,685][00126] Fps is (10 sec: 45875.4, 60 sec: 42871.5, 300 sec: 41931.9). Total num frames: 67715072. Throughput: 0: 42295.5. Samples: 67961920. Policy #0 lag: (min: 2.0, avg: 24.8, max: 44.0) [2024-03-29 12:28:46,686][00126] Avg episode reward: [(0, '0.090')] [2024-03-29 12:28:49,927][00501] Updated weights for policy 0, policy_version 4140 (0.0032) [2024-03-29 12:28:51,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 67878912. Throughput: 0: 41724.5. Samples: 68220840. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 12:28:51,686][00126] Avg episode reward: [(0, '0.103')] [2024-03-29 12:28:51,837][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000004144_67895296.pth... [2024-03-29 12:28:52,213][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000003533_57884672.pth [2024-03-29 12:28:54,356][00501] Updated weights for policy 0, policy_version 4150 (0.0023) [2024-03-29 12:28:56,685][00126] Fps is (10 sec: 36044.9, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 68075520. Throughput: 0: 41854.3. Samples: 68355860. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 12:28:56,687][00126] Avg episode reward: [(0, '0.102')] [2024-03-29 12:28:58,321][00501] Updated weights for policy 0, policy_version 4160 (0.0019) [2024-03-29 12:29:00,365][00481] Signal inference workers to stop experience collection... (2600 times) [2024-03-29 12:29:00,417][00501] InferenceWorker_p0-w0: stopping experience collection (2600 times) [2024-03-29 12:29:00,451][00481] Signal inference workers to resume experience collection... (2600 times) [2024-03-29 12:29:00,456][00501] InferenceWorker_p0-w0: resuming experience collection (2600 times) [2024-03-29 12:29:01,556][00501] Updated weights for policy 0, policy_version 4170 (0.0036) [2024-03-29 12:29:01,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 68321280. Throughput: 0: 42203.1. Samples: 68597680. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 12:29:01,686][00126] Avg episode reward: [(0, '0.117')] [2024-03-29 12:29:05,933][00501] Updated weights for policy 0, policy_version 4180 (0.0025) [2024-03-29 12:29:06,685][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 68501504. Throughput: 0: 41506.1. 
Samples: 68847840. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 12:29:06,686][00126] Avg episode reward: [(0, '0.073')] [2024-03-29 12:29:09,990][00501] Updated weights for policy 0, policy_version 4190 (0.0021) [2024-03-29 12:29:11,685][00126] Fps is (10 sec: 37683.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 68698112. Throughput: 0: 41608.8. Samples: 68977600. Policy #0 lag: (min: 0.0, avg: 20.1, max: 42.0) [2024-03-29 12:29:11,686][00126] Avg episode reward: [(0, '0.084')] [2024-03-29 12:29:14,185][00501] Updated weights for policy 0, policy_version 4200 (0.0027) [2024-03-29 12:29:16,685][00126] Fps is (10 sec: 45875.8, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 68960256. Throughput: 0: 42438.8. Samples: 69230780. Policy #0 lag: (min: 2.0, avg: 20.9, max: 41.0) [2024-03-29 12:29:16,686][00126] Avg episode reward: [(0, '0.118')] [2024-03-29 12:29:17,013][00501] Updated weights for policy 0, policy_version 4210 (0.0030) [2024-03-29 12:29:21,452][00501] Updated weights for policy 0, policy_version 4220 (0.0029) [2024-03-29 12:29:21,686][00126] Fps is (10 sec: 44236.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 69140480. Throughput: 0: 41949.7. Samples: 69476480. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 12:29:21,686][00126] Avg episode reward: [(0, '0.101')] [2024-03-29 12:29:25,515][00501] Updated weights for policy 0, policy_version 4230 (0.0022) [2024-03-29 12:29:26,686][00126] Fps is (10 sec: 37682.6, 60 sec: 42052.1, 300 sec: 41931.9). Total num frames: 69337088. Throughput: 0: 42042.1. Samples: 69615980. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 12:29:26,686][00126] Avg episode reward: [(0, '0.116')] [2024-03-29 12:29:29,684][00501] Updated weights for policy 0, policy_version 4240 (0.0027) [2024-03-29 12:29:31,503][00481] Signal inference workers to stop experience collection... (2650 times) [2024-03-29 12:29:31,534][00501] InferenceWorker_p0-w0: stopping experience collection (2650 times) [2024-03-29 12:29:31,685][00126] Fps is (10 sec: 42599.2, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 69566464. Throughput: 0: 42355.6. Samples: 69867920. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 12:29:31,686][00126] Avg episode reward: [(0, '0.100')] [2024-03-29 12:29:31,715][00481] Signal inference workers to resume experience collection... (2650 times) [2024-03-29 12:29:31,716][00501] InferenceWorker_p0-w0: resuming experience collection (2650 times) [2024-03-29 12:29:32,575][00501] Updated weights for policy 0, policy_version 4250 (0.0021) [2024-03-29 12:29:36,685][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 69779456. Throughput: 0: 41916.5. Samples: 70107080. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 12:29:36,686][00126] Avg episode reward: [(0, '0.112')] [2024-03-29 12:29:36,928][00501] Updated weights for policy 0, policy_version 4260 (0.0023) [2024-03-29 12:29:41,023][00501] Updated weights for policy 0, policy_version 4270 (0.0017) [2024-03-29 12:29:41,685][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 69976064. Throughput: 0: 41864.9. Samples: 70239780. Policy #0 lag: (min: 0.0, avg: 21.9, max: 43.0) [2024-03-29 12:29:41,686][00126] Avg episode reward: [(0, '0.096')] [2024-03-29 12:29:45,262][00501] Updated weights for policy 0, policy_version 4280 (0.0020) [2024-03-29 12:29:46,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 70205440. 
Throughput: 0: 42489.8. Samples: 70509720. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 12:29:46,686][00126] Avg episode reward: [(0, '0.103')] [2024-03-29 12:29:48,073][00501] Updated weights for policy 0, policy_version 4290 (0.0027) [2024-03-29 12:29:51,685][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 70434816. Throughput: 0: 42155.2. Samples: 70744820. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 12:29:51,686][00126] Avg episode reward: [(0, '0.098')] [2024-03-29 12:29:52,214][00501] Updated weights for policy 0, policy_version 4300 (0.0026) [2024-03-29 12:29:56,386][00501] Updated weights for policy 0, policy_version 4310 (0.0028) [2024-03-29 12:29:56,685][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 70615040. Throughput: 0: 42224.9. Samples: 70877720. Policy #0 lag: (min: 0.0, avg: 21.9, max: 43.0) [2024-03-29 12:29:56,686][00126] Avg episode reward: [(0, '0.083')] [2024-03-29 12:30:00,657][00501] Updated weights for policy 0, policy_version 4320 (0.0021) [2024-03-29 12:30:01,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 70828032. Throughput: 0: 42448.0. Samples: 71140940. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 12:30:01,686][00126] Avg episode reward: [(0, '0.099')] [2024-03-29 12:30:03,506][00501] Updated weights for policy 0, policy_version 4330 (0.0023) [2024-03-29 12:30:03,995][00481] Signal inference workers to stop experience collection... (2700 times) [2024-03-29 12:30:04,058][00501] InferenceWorker_p0-w0: stopping experience collection (2700 times) [2024-03-29 12:30:04,070][00481] Signal inference workers to resume experience collection... (2700 times) [2024-03-29 12:30:04,089][00501] InferenceWorker_p0-w0: resuming experience collection (2700 times) [2024-03-29 12:30:06,685][00126] Fps is (10 sec: 44236.6, 60 sec: 42598.4, 300 sec: 41931.9). Total num frames: 71057408. Throughput: 0: 42116.9. Samples: 71371740. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 12:30:06,686][00126] Avg episode reward: [(0, '0.106')] [2024-03-29 12:30:08,037][00501] Updated weights for policy 0, policy_version 4340 (0.0018) [2024-03-29 12:30:11,686][00126] Fps is (10 sec: 40959.5, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 71237632. Throughput: 0: 41871.1. Samples: 71500180. Policy #0 lag: (min: 1.0, avg: 22.1, max: 41.0) [2024-03-29 12:30:11,686][00126] Avg episode reward: [(0, '0.103')] [2024-03-29 12:30:12,203][00501] Updated weights for policy 0, policy_version 4350 (0.0024) [2024-03-29 12:30:16,252][00501] Updated weights for policy 0, policy_version 4360 (0.0026) [2024-03-29 12:30:16,685][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 71450624. Throughput: 0: 42428.4. Samples: 71777200. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 12:30:16,686][00126] Avg episode reward: [(0, '0.140')] [2024-03-29 12:30:16,845][00481] Saving new best policy, reward=0.140! [2024-03-29 12:30:19,307][00501] Updated weights for policy 0, policy_version 4370 (0.0029) [2024-03-29 12:30:21,685][00126] Fps is (10 sec: 45875.8, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 71696384. Throughput: 0: 41768.0. Samples: 71986640. 
Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 12:30:21,686][00126] Avg episode reward: [(0, '0.136')] [2024-03-29 12:30:23,832][00501] Updated weights for policy 0, policy_version 4380 (0.0018) [2024-03-29 12:30:26,685][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 71876608. Throughput: 0: 42157.3. Samples: 72136860. Policy #0 lag: (min: 0.0, avg: 24.0, max: 41.0) [2024-03-29 12:30:26,688][00126] Avg episode reward: [(0, '0.155')] [2024-03-29 12:30:26,689][00481] Saving new best policy, reward=0.155! [2024-03-29 12:30:28,000][00501] Updated weights for policy 0, policy_version 4390 (0.0022) [2024-03-29 12:30:31,685][00126] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 72056832. Throughput: 0: 41824.0. Samples: 72391800. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 12:30:31,686][00126] Avg episode reward: [(0, '0.121')] [2024-03-29 12:30:32,223][00501] Updated weights for policy 0, policy_version 4400 (0.0028) [2024-03-29 12:30:35,013][00501] Updated weights for policy 0, policy_version 4410 (0.0028) [2024-03-29 12:30:36,685][00126] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 72335360. Throughput: 0: 41436.0. Samples: 72609440. Policy #0 lag: (min: 1.0, avg: 18.3, max: 41.0) [2024-03-29 12:30:36,686][00126] Avg episode reward: [(0, '0.129')] [2024-03-29 12:30:39,534][00501] Updated weights for policy 0, policy_version 4420 (0.0017) [2024-03-29 12:30:39,933][00481] Signal inference workers to stop experience collection... (2750 times) [2024-03-29 12:30:39,983][00501] InferenceWorker_p0-w0: stopping experience collection (2750 times) [2024-03-29 12:30:40,019][00481] Signal inference workers to resume experience collection... (2750 times) [2024-03-29 12:30:40,023][00501] InferenceWorker_p0-w0: resuming experience collection (2750 times) [2024-03-29 12:30:41,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 72515584. Throughput: 0: 41809.8. Samples: 72759160. Policy #0 lag: (min: 0.0, avg: 23.6, max: 41.0) [2024-03-29 12:30:41,686][00126] Avg episode reward: [(0, '0.139')] [2024-03-29 12:30:43,640][00501] Updated weights for policy 0, policy_version 4430 (0.0025) [2024-03-29 12:30:46,685][00126] Fps is (10 sec: 34406.5, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 72679424. Throughput: 0: 41830.7. Samples: 73023320. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 12:30:46,686][00126] Avg episode reward: [(0, '0.118')] [2024-03-29 12:30:47,792][00501] Updated weights for policy 0, policy_version 4440 (0.0023) [2024-03-29 12:30:50,718][00501] Updated weights for policy 0, policy_version 4450 (0.0020) [2024-03-29 12:30:51,685][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 72957952. Throughput: 0: 41644.9. Samples: 73245760. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 12:30:51,686][00126] Avg episode reward: [(0, '0.157')] [2024-03-29 12:30:51,837][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000004454_72974336.pth... [2024-03-29 12:30:52,186][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000003837_62865408.pth [2024-03-29 12:30:52,207][00481] Saving new best policy, reward=0.157! 
[2024-03-29 12:30:55,096][00501] Updated weights for policy 0, policy_version 4460 (0.0018) [2024-03-29 12:30:56,685][00126] Fps is (10 sec: 45875.0, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 73138176. Throughput: 0: 41656.5. Samples: 73374720. Policy #0 lag: (min: 1.0, avg: 24.5, max: 41.0) [2024-03-29 12:30:56,686][00126] Avg episode reward: [(0, '0.148')] [2024-03-29 12:30:59,235][00501] Updated weights for policy 0, policy_version 4470 (0.0019) [2024-03-29 12:31:01,685][00126] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 73318400. Throughput: 0: 41408.4. Samples: 73640580. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 12:31:01,688][00126] Avg episode reward: [(0, '0.142')] [2024-03-29 12:31:03,501][00501] Updated weights for policy 0, policy_version 4480 (0.0019) [2024-03-29 12:31:06,506][00501] Updated weights for policy 0, policy_version 4490 (0.0029) [2024-03-29 12:31:06,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 73564160. Throughput: 0: 41938.6. Samples: 73873880. Policy #0 lag: (min: 1.0, avg: 17.5, max: 41.0) [2024-03-29 12:31:06,686][00126] Avg episode reward: [(0, '0.101')] [2024-03-29 12:31:10,895][00501] Updated weights for policy 0, policy_version 4500 (0.0023) [2024-03-29 12:31:11,199][00481] Signal inference workers to stop experience collection... (2800 times) [2024-03-29 12:31:11,199][00481] Signal inference workers to resume experience collection... (2800 times) [2024-03-29 12:31:11,240][00501] InferenceWorker_p0-w0: stopping experience collection (2800 times) [2024-03-29 12:31:11,240][00501] InferenceWorker_p0-w0: resuming experience collection (2800 times) [2024-03-29 12:31:11,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 73760768. Throughput: 0: 41323.1. Samples: 73996400. Policy #0 lag: (min: 1.0, avg: 24.2, max: 42.0) [2024-03-29 12:31:11,686][00126] Avg episode reward: [(0, '0.121')] [2024-03-29 12:31:14,778][00501] Updated weights for policy 0, policy_version 4510 (0.0025) [2024-03-29 12:31:16,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 73940992. Throughput: 0: 41974.3. Samples: 74280640. Policy #0 lag: (min: 0.0, avg: 21.0, max: 40.0) [2024-03-29 12:31:16,686][00126] Avg episode reward: [(0, '0.138')] [2024-03-29 12:31:18,947][00501] Updated weights for policy 0, policy_version 4520 (0.0019) [2024-03-29 12:31:21,685][00126] Fps is (10 sec: 42599.2, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 74186752. Throughput: 0: 42330.8. Samples: 74514320. Policy #0 lag: (min: 0.0, avg: 18.0, max: 41.0) [2024-03-29 12:31:21,686][00126] Avg episode reward: [(0, '0.142')] [2024-03-29 12:31:22,056][00501] Updated weights for policy 0, policy_version 4530 (0.0030) [2024-03-29 12:31:26,544][00501] Updated weights for policy 0, policy_version 4540 (0.0024) [2024-03-29 12:31:26,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 74383360. Throughput: 0: 41491.6. Samples: 74626280. Policy #0 lag: (min: 2.0, avg: 23.4, max: 41.0) [2024-03-29 12:31:26,686][00126] Avg episode reward: [(0, '0.101')] [2024-03-29 12:31:30,521][00501] Updated weights for policy 0, policy_version 4550 (0.0017) [2024-03-29 12:31:31,685][00126] Fps is (10 sec: 40959.6, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 74596352. Throughput: 0: 41833.8. Samples: 74905840. 
Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 12:31:31,686][00126] Avg episode reward: [(0, '0.145')] [2024-03-29 12:31:34,847][00501] Updated weights for policy 0, policy_version 4560 (0.0028) [2024-03-29 12:31:36,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 74809344. Throughput: 0: 42046.3. Samples: 75137840. Policy #0 lag: (min: 1.0, avg: 17.7, max: 41.0) [2024-03-29 12:31:36,686][00126] Avg episode reward: [(0, '0.206')] [2024-03-29 12:31:36,779][00481] Saving new best policy, reward=0.206! [2024-03-29 12:31:37,985][00501] Updated weights for policy 0, policy_version 4570 (0.0026) [2024-03-29 12:31:41,685][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 75022336. Throughput: 0: 41434.3. Samples: 75239260. Policy #0 lag: (min: 2.0, avg: 22.0, max: 41.0) [2024-03-29 12:31:41,686][00126] Avg episode reward: [(0, '0.129')] [2024-03-29 12:31:42,237][00501] Updated weights for policy 0, policy_version 4580 (0.0024) [2024-03-29 12:31:46,246][00501] Updated weights for policy 0, policy_version 4590 (0.0027) [2024-03-29 12:31:46,685][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 75202560. Throughput: 0: 41781.0. Samples: 75520720. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 12:31:46,686][00126] Avg episode reward: [(0, '0.161')] [2024-03-29 12:31:50,021][00481] Signal inference workers to stop experience collection... (2850 times) [2024-03-29 12:31:50,063][00501] InferenceWorker_p0-w0: stopping experience collection (2850 times) [2024-03-29 12:31:50,101][00481] Signal inference workers to resume experience collection... (2850 times) [2024-03-29 12:31:50,109][00501] InferenceWorker_p0-w0: resuming experience collection (2850 times) [2024-03-29 12:31:50,608][00501] Updated weights for policy 0, policy_version 4600 (0.0024) [2024-03-29 12:31:51,685][00126] Fps is (10 sec: 39321.6, 60 sec: 40960.0, 300 sec: 41820.8). Total num frames: 75415552. Throughput: 0: 42035.6. Samples: 75765480. Policy #0 lag: (min: 0.0, avg: 19.2, max: 43.0) [2024-03-29 12:31:51,686][00126] Avg episode reward: [(0, '0.111')] [2024-03-29 12:31:53,730][00501] Updated weights for policy 0, policy_version 4610 (0.0025) [2024-03-29 12:31:56,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 75661312. Throughput: 0: 41610.3. Samples: 75868860. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:31:56,686][00126] Avg episode reward: [(0, '0.149')] [2024-03-29 12:31:57,856][00501] Updated weights for policy 0, policy_version 4620 (0.0020) [2024-03-29 12:32:01,685][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 75841536. Throughput: 0: 41107.9. Samples: 76130500. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 12:32:01,687][00126] Avg episode reward: [(0, '0.124')] [2024-03-29 12:32:02,005][00501] Updated weights for policy 0, policy_version 4630 (0.0020) [2024-03-29 12:32:06,256][00501] Updated weights for policy 0, policy_version 4640 (0.0024) [2024-03-29 12:32:06,685][00126] Fps is (10 sec: 37683.5, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 76038144. Throughput: 0: 41933.3. Samples: 76401320. 
Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 12:32:06,686][00126] Avg episode reward: [(0, '0.136')] [2024-03-29 12:32:09,228][00501] Updated weights for policy 0, policy_version 4650 (0.0026) [2024-03-29 12:32:11,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 76300288. Throughput: 0: 41958.5. Samples: 76514420. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 12:32:11,686][00126] Avg episode reward: [(0, '0.133')] [2024-03-29 12:32:13,359][00501] Updated weights for policy 0, policy_version 4660 (0.0022) [2024-03-29 12:32:16,685][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 76480512. Throughput: 0: 41485.8. Samples: 76772700. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 12:32:16,687][00126] Avg episode reward: [(0, '0.142')] [2024-03-29 12:32:17,519][00501] Updated weights for policy 0, policy_version 4670 (0.0019) [2024-03-29 12:32:21,685][00126] Fps is (10 sec: 36045.4, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 76660736. Throughput: 0: 42455.1. Samples: 77048320. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 12:32:21,686][00126] Avg episode reward: [(0, '0.167')] [2024-03-29 12:32:21,693][00501] Updated weights for policy 0, policy_version 4680 (0.0020) [2024-03-29 12:32:24,767][00501] Updated weights for policy 0, policy_version 4690 (0.0024) [2024-03-29 12:32:25,577][00481] Signal inference workers to stop experience collection... (2900 times) [2024-03-29 12:32:25,657][00481] Signal inference workers to resume experience collection... (2900 times) [2024-03-29 12:32:25,661][00501] InferenceWorker_p0-w0: stopping experience collection (2900 times) [2024-03-29 12:32:25,687][00501] InferenceWorker_p0-w0: resuming experience collection (2900 times) [2024-03-29 12:32:26,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 76922880. Throughput: 0: 42400.5. Samples: 77147280. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 12:32:26,686][00126] Avg episode reward: [(0, '0.159')] [2024-03-29 12:32:29,004][00501] Updated weights for policy 0, policy_version 4700 (0.0023) [2024-03-29 12:32:31,685][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 77103104. Throughput: 0: 41708.9. Samples: 77397620. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 12:32:31,686][00126] Avg episode reward: [(0, '0.201')] [2024-03-29 12:32:33,229][00501] Updated weights for policy 0, policy_version 4710 (0.0019) [2024-03-29 12:32:36,685][00126] Fps is (10 sec: 36044.4, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 77283328. Throughput: 0: 42399.9. Samples: 77673480. Policy #0 lag: (min: 0.0, avg: 18.8, max: 41.0) [2024-03-29 12:32:36,686][00126] Avg episode reward: [(0, '0.196')] [2024-03-29 12:32:37,528][00501] Updated weights for policy 0, policy_version 4720 (0.0023) [2024-03-29 12:32:40,572][00501] Updated weights for policy 0, policy_version 4730 (0.0035) [2024-03-29 12:32:41,685][00126] Fps is (10 sec: 45874.8, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 77561856. Throughput: 0: 42300.8. Samples: 77772400. Policy #0 lag: (min: 1.0, avg: 21.4, max: 44.0) [2024-03-29 12:32:41,686][00126] Avg episode reward: [(0, '0.158')] [2024-03-29 12:32:44,633][00501] Updated weights for policy 0, policy_version 4740 (0.0023) [2024-03-29 12:32:46,685][00126] Fps is (10 sec: 45875.3, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 77742080. 
Throughput: 0: 42136.5. Samples: 78026640. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 12:32:46,687][00126] Avg episode reward: [(0, '0.161')] [2024-03-29 12:32:48,782][00501] Updated weights for policy 0, policy_version 4750 (0.0019) [2024-03-29 12:32:51,685][00126] Fps is (10 sec: 36045.1, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 77922304. Throughput: 0: 42047.5. Samples: 78293460. Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 12:32:51,686][00126] Avg episode reward: [(0, '0.214')] [2024-03-29 12:32:51,704][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000004756_77922304.pth... [2024-03-29 12:32:52,113][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000004144_67895296.pth [2024-03-29 12:32:52,132][00481] Saving new best policy, reward=0.214! [2024-03-29 12:32:53,478][00501] Updated weights for policy 0, policy_version 4760 (0.0032) [2024-03-29 12:32:56,538][00501] Updated weights for policy 0, policy_version 4770 (0.0029) [2024-03-29 12:32:56,685][00126] Fps is (10 sec: 40960.6, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 78151680. Throughput: 0: 41709.5. Samples: 78391340. Policy #0 lag: (min: 0.0, avg: 21.2, max: 48.0) [2024-03-29 12:32:56,686][00126] Avg episode reward: [(0, '0.197')] [2024-03-29 12:33:00,027][00481] Signal inference workers to stop experience collection... (2950 times) [2024-03-29 12:33:00,029][00481] Signal inference workers to resume experience collection... (2950 times) [2024-03-29 12:33:00,060][00501] InferenceWorker_p0-w0: stopping experience collection (2950 times) [2024-03-29 12:33:00,060][00501] InferenceWorker_p0-w0: resuming experience collection (2950 times) [2024-03-29 12:33:00,333][00501] Updated weights for policy 0, policy_version 4780 (0.0030) [2024-03-29 12:33:01,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 78364672. Throughput: 0: 41378.7. Samples: 78634740. Policy #0 lag: (min: 1.0, avg: 22.0, max: 41.0) [2024-03-29 12:33:01,686][00126] Avg episode reward: [(0, '0.198')] [2024-03-29 12:33:04,623][00501] Updated weights for policy 0, policy_version 4790 (0.0023) [2024-03-29 12:33:06,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 78544896. Throughput: 0: 41251.5. Samples: 78904640. Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 12:33:06,686][00126] Avg episode reward: [(0, '0.205')] [2024-03-29 12:33:09,040][00501] Updated weights for policy 0, policy_version 4800 (0.0024) [2024-03-29 12:33:11,686][00126] Fps is (10 sec: 42597.8, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 78790656. Throughput: 0: 41778.1. Samples: 79027300. Policy #0 lag: (min: 2.0, avg: 21.4, max: 41.0) [2024-03-29 12:33:11,688][00126] Avg episode reward: [(0, '0.158')] [2024-03-29 12:33:12,267][00501] Updated weights for policy 0, policy_version 4810 (0.0021) [2024-03-29 12:33:15,978][00501] Updated weights for policy 0, policy_version 4820 (0.0017) [2024-03-29 12:33:16,685][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 78987264. Throughput: 0: 41623.5. Samples: 79270680. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:33:16,686][00126] Avg episode reward: [(0, '0.183')] [2024-03-29 12:33:20,116][00501] Updated weights for policy 0, policy_version 4830 (0.0023) [2024-03-29 12:33:21,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.1, 300 sec: 41876.4). 
Total num frames: 79167488. Throughput: 0: 41584.9. Samples: 79544800. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:33:21,686][00126] Avg episode reward: [(0, '0.191')] [2024-03-29 12:33:24,481][00501] Updated weights for policy 0, policy_version 4840 (0.0022) [2024-03-29 12:33:26,685][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 79429632. Throughput: 0: 42320.9. Samples: 79676840. Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 12:33:26,687][00126] Avg episode reward: [(0, '0.202')] [2024-03-29 12:33:27,517][00501] Updated weights for policy 0, policy_version 4850 (0.0025) [2024-03-29 12:33:31,649][00501] Updated weights for policy 0, policy_version 4860 (0.0021) [2024-03-29 12:33:31,685][00126] Fps is (10 sec: 45875.5, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 79626240. Throughput: 0: 41736.5. Samples: 79904780. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 12:33:31,686][00126] Avg episode reward: [(0, '0.193')] [2024-03-29 12:33:35,902][00501] Updated weights for policy 0, policy_version 4870 (0.0026) [2024-03-29 12:33:36,685][00126] Fps is (10 sec: 36044.9, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 79790080. Throughput: 0: 41812.4. Samples: 80175020. Policy #0 lag: (min: 2.0, avg: 21.1, max: 41.0) [2024-03-29 12:33:36,686][00126] Avg episode reward: [(0, '0.169')] [2024-03-29 12:33:39,313][00481] Signal inference workers to stop experience collection... (3000 times) [2024-03-29 12:33:39,346][00501] InferenceWorker_p0-w0: stopping experience collection (3000 times) [2024-03-29 12:33:39,494][00481] Signal inference workers to resume experience collection... (3000 times) [2024-03-29 12:33:39,495][00501] InferenceWorker_p0-w0: resuming experience collection (3000 times) [2024-03-29 12:33:40,282][00501] Updated weights for policy 0, policy_version 4880 (0.0024) [2024-03-29 12:33:41,685][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 80035840. Throughput: 0: 42618.5. Samples: 80309180. Policy #0 lag: (min: 1.0, avg: 18.0, max: 42.0) [2024-03-29 12:33:41,686][00126] Avg episode reward: [(0, '0.214')] [2024-03-29 12:33:43,313][00501] Updated weights for policy 0, policy_version 4890 (0.0025) [2024-03-29 12:33:46,686][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 80248832. Throughput: 0: 41863.4. Samples: 80518600. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 12:33:46,690][00126] Avg episode reward: [(0, '0.148')] [2024-03-29 12:33:47,397][00501] Updated weights for policy 0, policy_version 4900 (0.0027) [2024-03-29 12:33:51,470][00501] Updated weights for policy 0, policy_version 4910 (0.0020) [2024-03-29 12:33:51,686][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 80445440. Throughput: 0: 41863.8. Samples: 80788520. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 12:33:51,686][00126] Avg episode reward: [(0, '0.172')] [2024-03-29 12:33:55,967][00501] Updated weights for policy 0, policy_version 4920 (0.0028) [2024-03-29 12:33:56,685][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 80642048. Throughput: 0: 42214.8. Samples: 80926960. 
Policy #0 lag: (min: 2.0, avg: 18.3, max: 41.0) [2024-03-29 12:33:56,686][00126] Avg episode reward: [(0, '0.188')] [2024-03-29 12:33:59,071][00501] Updated weights for policy 0, policy_version 4930 (0.0025) [2024-03-29 12:34:01,685][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 80887808. Throughput: 0: 41729.9. Samples: 81148520. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 12:34:01,686][00126] Avg episode reward: [(0, '0.151')] [2024-03-29 12:34:02,918][00501] Updated weights for policy 0, policy_version 4940 (0.0019) [2024-03-29 12:34:06,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 81084416. Throughput: 0: 41610.3. Samples: 81417260. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 12:34:06,686][00126] Avg episode reward: [(0, '0.150')] [2024-03-29 12:34:06,909][00501] Updated weights for policy 0, policy_version 4950 (0.0017) [2024-03-29 12:34:11,188][00481] Signal inference workers to stop experience collection... (3050 times) [2024-03-29 12:34:11,219][00501] InferenceWorker_p0-w0: stopping experience collection (3050 times) [2024-03-29 12:34:11,387][00481] Signal inference workers to resume experience collection... (3050 times) [2024-03-29 12:34:11,387][00501] InferenceWorker_p0-w0: resuming experience collection (3050 times) [2024-03-29 12:34:11,391][00501] Updated weights for policy 0, policy_version 4960 (0.0019) [2024-03-29 12:34:11,685][00126] Fps is (10 sec: 39321.2, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 81281024. Throughput: 0: 41716.0. Samples: 81554060. Policy #0 lag: (min: 1.0, avg: 19.3, max: 42.0) [2024-03-29 12:34:11,686][00126] Avg episode reward: [(0, '0.217')] [2024-03-29 12:34:11,903][00481] Saving new best policy, reward=0.217! [2024-03-29 12:34:14,828][00501] Updated weights for policy 0, policy_version 4970 (0.0029) [2024-03-29 12:34:16,685][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 81510400. Throughput: 0: 41791.6. Samples: 81785400. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 12:34:16,688][00126] Avg episode reward: [(0, '0.179')] [2024-03-29 12:34:18,593][00501] Updated weights for policy 0, policy_version 4980 (0.0029) [2024-03-29 12:34:21,686][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 81690624. Throughput: 0: 41656.9. Samples: 82049580. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 12:34:21,686][00126] Avg episode reward: [(0, '0.207')] [2024-03-29 12:34:22,857][00501] Updated weights for policy 0, policy_version 4990 (0.0033) [2024-03-29 12:34:26,685][00126] Fps is (10 sec: 37683.4, 60 sec: 40960.1, 300 sec: 41765.3). Total num frames: 81887232. Throughput: 0: 41712.5. Samples: 82186240. Policy #0 lag: (min: 1.0, avg: 18.6, max: 42.0) [2024-03-29 12:34:26,686][00126] Avg episode reward: [(0, '0.179')] [2024-03-29 12:34:27,020][00501] Updated weights for policy 0, policy_version 5000 (0.0019) [2024-03-29 12:34:30,239][00501] Updated weights for policy 0, policy_version 5010 (0.0018) [2024-03-29 12:34:31,686][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 82149376. Throughput: 0: 42157.3. Samples: 82415680. 
Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 12:34:31,686][00126] Avg episode reward: [(0, '0.203')] [2024-03-29 12:34:34,137][00501] Updated weights for policy 0, policy_version 5020 (0.0019) [2024-03-29 12:34:36,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.5, 300 sec: 41931.9). Total num frames: 82345984. Throughput: 0: 41725.0. Samples: 82666140. Policy #0 lag: (min: 0.0, avg: 23.0, max: 42.0) [2024-03-29 12:34:36,686][00126] Avg episode reward: [(0, '0.198')] [2024-03-29 12:34:38,496][00501] Updated weights for policy 0, policy_version 5030 (0.0021) [2024-03-29 12:34:41,685][00126] Fps is (10 sec: 36045.3, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 82509824. Throughput: 0: 41972.9. Samples: 82815740. Policy #0 lag: (min: 1.0, avg: 18.4, max: 42.0) [2024-03-29 12:34:41,686][00126] Avg episode reward: [(0, '0.219')] [2024-03-29 12:34:41,914][00481] Signal inference workers to stop experience collection... (3100 times) [2024-03-29 12:34:41,952][00501] InferenceWorker_p0-w0: stopping experience collection (3100 times) [2024-03-29 12:34:42,096][00481] Signal inference workers to resume experience collection... (3100 times) [2024-03-29 12:34:42,096][00501] InferenceWorker_p0-w0: resuming experience collection (3100 times) [2024-03-29 12:34:42,096][00481] Saving new best policy, reward=0.219! [2024-03-29 12:34:42,943][00501] Updated weights for policy 0, policy_version 5040 (0.0023) [2024-03-29 12:34:45,969][00501] Updated weights for policy 0, policy_version 5050 (0.0026) [2024-03-29 12:34:46,685][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 82771968. Throughput: 0: 42082.3. Samples: 83042220. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 12:34:46,686][00126] Avg episode reward: [(0, '0.204')] [2024-03-29 12:34:49,711][00501] Updated weights for policy 0, policy_version 5060 (0.0028) [2024-03-29 12:34:51,686][00126] Fps is (10 sec: 47513.0, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 82984960. Throughput: 0: 41504.3. Samples: 83284960. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 12:34:51,686][00126] Avg episode reward: [(0, '0.187')] [2024-03-29 12:34:51,709][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005065_82984960.pth... [2024-03-29 12:34:52,045][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000004454_72974336.pth [2024-03-29 12:34:54,111][00501] Updated weights for policy 0, policy_version 5070 (0.0029) [2024-03-29 12:34:56,685][00126] Fps is (10 sec: 37682.8, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 83148800. Throughput: 0: 41734.8. Samples: 83432120. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 12:34:56,686][00126] Avg episode reward: [(0, '0.298')] [2024-03-29 12:34:56,848][00481] Saving new best policy, reward=0.298! [2024-03-29 12:34:58,490][00501] Updated weights for policy 0, policy_version 5080 (0.0025) [2024-03-29 12:35:01,685][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 83378176. Throughput: 0: 41896.0. Samples: 83670720. 
Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 12:35:01,687][00126] Avg episode reward: [(0, '0.257')] [2024-03-29 12:35:01,696][00501] Updated weights for policy 0, policy_version 5090 (0.0017) [2024-03-29 12:35:05,372][00501] Updated weights for policy 0, policy_version 5100 (0.0022) [2024-03-29 12:35:06,685][00126] Fps is (10 sec: 45874.8, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 83607552. Throughput: 0: 41479.6. Samples: 83916160. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 12:35:06,686][00126] Avg episode reward: [(0, '0.180')] [2024-03-29 12:35:09,631][00501] Updated weights for policy 0, policy_version 5110 (0.0023) [2024-03-29 12:35:11,685][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 83787776. Throughput: 0: 41662.2. Samples: 84061040. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 12:35:11,686][00126] Avg episode reward: [(0, '0.231')] [2024-03-29 12:35:13,708][00481] Signal inference workers to stop experience collection... (3150 times) [2024-03-29 12:35:13,754][00501] InferenceWorker_p0-w0: stopping experience collection (3150 times) [2024-03-29 12:35:13,787][00481] Signal inference workers to resume experience collection... (3150 times) [2024-03-29 12:35:13,790][00501] InferenceWorker_p0-w0: resuming experience collection (3150 times) [2024-03-29 12:35:13,797][00501] Updated weights for policy 0, policy_version 5120 (0.0024) [2024-03-29 12:35:16,686][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 84017152. Throughput: 0: 42049.3. Samples: 84307900. Policy #0 lag: (min: 1.0, avg: 17.6, max: 41.0) [2024-03-29 12:35:16,686][00126] Avg episode reward: [(0, '0.200')] [2024-03-29 12:35:17,088][00501] Updated weights for policy 0, policy_version 5130 (0.0023) [2024-03-29 12:35:20,970][00501] Updated weights for policy 0, policy_version 5140 (0.0019) [2024-03-29 12:35:21,685][00126] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 41931.9). Total num frames: 84246528. Throughput: 0: 41965.7. Samples: 84554600. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 12:35:21,686][00126] Avg episode reward: [(0, '0.171')] [2024-03-29 12:35:25,324][00501] Updated weights for policy 0, policy_version 5150 (0.0026) [2024-03-29 12:35:26,685][00126] Fps is (10 sec: 37683.6, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 84393984. Throughput: 0: 41622.2. Samples: 84688740. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 12:35:26,686][00126] Avg episode reward: [(0, '0.210')] [2024-03-29 12:35:29,548][00501] Updated weights for policy 0, policy_version 5160 (0.0028) [2024-03-29 12:35:31,686][00126] Fps is (10 sec: 40958.8, 60 sec: 41779.0, 300 sec: 41765.3). Total num frames: 84656128. Throughput: 0: 42036.9. Samples: 84933900. Policy #0 lag: (min: 1.0, avg: 18.3, max: 42.0) [2024-03-29 12:35:31,687][00126] Avg episode reward: [(0, '0.141')] [2024-03-29 12:35:32,913][00501] Updated weights for policy 0, policy_version 5170 (0.0023) [2024-03-29 12:35:36,685][00126] Fps is (10 sec: 45874.9, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 84852736. Throughput: 0: 42036.5. Samples: 85176600. 
Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 12:35:36,686][00126] Avg episode reward: [(0, '0.232')] [2024-03-29 12:35:36,776][00501] Updated weights for policy 0, policy_version 5180 (0.0023) [2024-03-29 12:35:41,146][00501] Updated weights for policy 0, policy_version 5190 (0.0018) [2024-03-29 12:35:41,685][00126] Fps is (10 sec: 37684.7, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 85032960. Throughput: 0: 41842.2. Samples: 85315020. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 12:35:41,686][00126] Avg episode reward: [(0, '0.193')] [2024-03-29 12:35:45,314][00501] Updated weights for policy 0, policy_version 5200 (0.0037) [2024-03-29 12:35:46,137][00481] Signal inference workers to stop experience collection... (3200 times) [2024-03-29 12:35:46,180][00501] InferenceWorker_p0-w0: stopping experience collection (3200 times) [2024-03-29 12:35:46,355][00481] Signal inference workers to resume experience collection... (3200 times) [2024-03-29 12:35:46,355][00501] InferenceWorker_p0-w0: resuming experience collection (3200 times) [2024-03-29 12:35:46,685][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 85278720. Throughput: 0: 42254.7. Samples: 85572180. Policy #0 lag: (min: 1.0, avg: 18.6, max: 42.0) [2024-03-29 12:35:46,686][00126] Avg episode reward: [(0, '0.234')] [2024-03-29 12:35:48,595][00501] Updated weights for policy 0, policy_version 5210 (0.0031) [2024-03-29 12:35:51,685][00126] Fps is (10 sec: 47513.5, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 85508096. Throughput: 0: 41977.8. Samples: 85805160. Policy #0 lag: (min: 1.0, avg: 23.7, max: 42.0) [2024-03-29 12:35:51,686][00126] Avg episode reward: [(0, '0.176')] [2024-03-29 12:35:52,257][00501] Updated weights for policy 0, policy_version 5220 (0.0019) [2024-03-29 12:35:56,636][00501] Updated weights for policy 0, policy_version 5230 (0.0023) [2024-03-29 12:35:56,686][00126] Fps is (10 sec: 40959.3, 60 sec: 42325.2, 300 sec: 41931.9). Total num frames: 85688320. Throughput: 0: 41774.5. Samples: 85940900. Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0) [2024-03-29 12:35:56,686][00126] Avg episode reward: [(0, '0.240')] [2024-03-29 12:36:00,753][00501] Updated weights for policy 0, policy_version 5240 (0.0022) [2024-03-29 12:36:01,685][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 85901312. Throughput: 0: 42325.5. Samples: 86212540. Policy #0 lag: (min: 2.0, avg: 20.2, max: 43.0) [2024-03-29 12:36:01,686][00126] Avg episode reward: [(0, '0.258')] [2024-03-29 12:36:04,047][00501] Updated weights for policy 0, policy_version 5250 (0.0022) [2024-03-29 12:36:06,685][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 86130688. Throughput: 0: 41987.1. Samples: 86444020. Policy #0 lag: (min: 2.0, avg: 20.2, max: 43.0) [2024-03-29 12:36:06,686][00126] Avg episode reward: [(0, '0.165')] [2024-03-29 12:36:07,759][00501] Updated weights for policy 0, policy_version 5260 (0.0028) [2024-03-29 12:36:11,685][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 86327296. Throughput: 0: 41939.5. Samples: 86576020. 
Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 12:36:11,686][00126] Avg episode reward: [(0, '0.275')] [2024-03-29 12:36:11,909][00501] Updated weights for policy 0, policy_version 5270 (0.0022) [2024-03-29 12:36:15,921][00501] Updated weights for policy 0, policy_version 5280 (0.0026) [2024-03-29 12:36:16,685][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.4, 300 sec: 41876.4). Total num frames: 86540288. Throughput: 0: 42680.8. Samples: 86854520. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 12:36:16,686][00126] Avg episode reward: [(0, '0.213')] [2024-03-29 12:36:18,025][00481] Signal inference workers to stop experience collection... (3250 times) [2024-03-29 12:36:18,047][00501] InferenceWorker_p0-w0: stopping experience collection (3250 times) [2024-03-29 12:36:18,231][00481] Signal inference workers to resume experience collection... (3250 times) [2024-03-29 12:36:18,232][00501] InferenceWorker_p0-w0: resuming experience collection (3250 times) [2024-03-29 12:36:19,348][00501] Updated weights for policy 0, policy_version 5290 (0.0022) [2024-03-29 12:36:21,686][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 86769664. Throughput: 0: 42284.0. Samples: 87079380. Policy #0 lag: (min: 0.0, avg: 21.7, max: 45.0) [2024-03-29 12:36:21,687][00126] Avg episode reward: [(0, '0.194')] [2024-03-29 12:36:23,360][00501] Updated weights for policy 0, policy_version 5300 (0.0021) [2024-03-29 12:36:26,685][00126] Fps is (10 sec: 40960.0, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 86949888. Throughput: 0: 42044.9. Samples: 87207040. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:36:26,686][00126] Avg episode reward: [(0, '0.207')] [2024-03-29 12:36:27,745][00501] Updated weights for policy 0, policy_version 5310 (0.0021) [2024-03-29 12:36:31,627][00501] Updated weights for policy 0, policy_version 5320 (0.0018) [2024-03-29 12:36:31,685][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.5, 300 sec: 41876.4). Total num frames: 87162880. Throughput: 0: 42515.1. Samples: 87485360. Policy #0 lag: (min: 1.0, avg: 18.0, max: 42.0) [2024-03-29 12:36:31,686][00126] Avg episode reward: [(0, '0.251')] [2024-03-29 12:36:35,122][00501] Updated weights for policy 0, policy_version 5330 (0.0026) [2024-03-29 12:36:36,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 87408640. Throughput: 0: 42107.1. Samples: 87699980. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 12:36:36,686][00126] Avg episode reward: [(0, '0.242')] [2024-03-29 12:36:38,963][00501] Updated weights for policy 0, policy_version 5340 (0.0019) [2024-03-29 12:36:41,685][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 87588864. Throughput: 0: 42240.1. Samples: 87841700. Policy #0 lag: (min: 3.0, avg: 23.1, max: 43.0) [2024-03-29 12:36:41,686][00126] Avg episode reward: [(0, '0.228')] [2024-03-29 12:36:43,518][00501] Updated weights for policy 0, policy_version 5350 (0.0028) [2024-03-29 12:36:46,685][00126] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 87769088. Throughput: 0: 42080.9. Samples: 88106180. 
Policy #0 lag: (min: 1.0, avg: 18.5, max: 42.0) [2024-03-29 12:36:46,686][00126] Avg episode reward: [(0, '0.287')] [2024-03-29 12:36:47,541][00501] Updated weights for policy 0, policy_version 5360 (0.0027) [2024-03-29 12:36:50,859][00501] Updated weights for policy 0, policy_version 5370 (0.0029) [2024-03-29 12:36:51,686][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 88014848. Throughput: 0: 41942.6. Samples: 88331440. Policy #0 lag: (min: 2.0, avg: 21.7, max: 41.0) [2024-03-29 12:36:51,687][00126] Avg episode reward: [(0, '0.184')] [2024-03-29 12:36:51,758][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005373_88031232.pth... [2024-03-29 12:36:52,127][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000004756_77922304.pth [2024-03-29 12:36:52,165][00481] Signal inference workers to stop experience collection... (3300 times) [2024-03-29 12:36:52,197][00501] InferenceWorker_p0-w0: stopping experience collection (3300 times) [2024-03-29 12:36:52,396][00481] Signal inference workers to resume experience collection... (3300 times) [2024-03-29 12:36:52,397][00501] InferenceWorker_p0-w0: resuming experience collection (3300 times) [2024-03-29 12:36:54,815][00501] Updated weights for policy 0, policy_version 5380 (0.0022) [2024-03-29 12:36:56,685][00126] Fps is (10 sec: 45875.0, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 88227840. Throughput: 0: 41897.3. Samples: 88461400. Policy #0 lag: (min: 0.0, avg: 21.9, max: 44.0) [2024-03-29 12:36:56,686][00126] Avg episode reward: [(0, '0.233')] [2024-03-29 12:36:59,033][00501] Updated weights for policy 0, policy_version 5390 (0.0022) [2024-03-29 12:37:01,685][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 88408064. Throughput: 0: 41683.5. Samples: 88730280. Policy #0 lag: (min: 0.0, avg: 21.9, max: 44.0) [2024-03-29 12:37:01,686][00126] Avg episode reward: [(0, '0.201')] [2024-03-29 12:37:03,173][00501] Updated weights for policy 0, policy_version 5400 (0.0028) [2024-03-29 12:37:06,549][00501] Updated weights for policy 0, policy_version 5410 (0.0024) [2024-03-29 12:37:06,685][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 88637440. Throughput: 0: 41976.1. Samples: 88968300. Policy #0 lag: (min: 0.0, avg: 18.1, max: 42.0) [2024-03-29 12:37:06,686][00126] Avg episode reward: [(0, '0.202')] [2024-03-29 12:37:10,530][00501] Updated weights for policy 0, policy_version 5420 (0.0026) [2024-03-29 12:37:11,686][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 88850432. Throughput: 0: 41827.8. Samples: 89089300. Policy #0 lag: (min: 2.0, avg: 23.0, max: 42.0) [2024-03-29 12:37:11,686][00126] Avg episode reward: [(0, '0.239')] [2024-03-29 12:37:14,615][00501] Updated weights for policy 0, policy_version 5430 (0.0018) [2024-03-29 12:37:16,685][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 89030656. Throughput: 0: 41778.2. Samples: 89365380. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 12:37:16,686][00126] Avg episode reward: [(0, '0.243')] [2024-03-29 12:37:18,731][00501] Updated weights for policy 0, policy_version 5440 (0.0020) [2024-03-29 12:37:21,686][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 89276416. Throughput: 0: 42238.1. Samples: 89600700. 
Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 12:37:21,686][00126] Avg episode reward: [(0, '0.233')] [2024-03-29 12:37:21,921][00501] Updated weights for policy 0, policy_version 5450 (0.0025) [2024-03-29 12:37:26,064][00501] Updated weights for policy 0, policy_version 5460 (0.0022) [2024-03-29 12:37:26,466][00481] Signal inference workers to stop experience collection... (3350 times) [2024-03-29 12:37:26,509][00501] InferenceWorker_p0-w0: stopping experience collection (3350 times) [2024-03-29 12:37:26,685][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 89473024. Throughput: 0: 41780.1. Samples: 89721800. Policy #0 lag: (min: 0.0, avg: 23.8, max: 42.0) [2024-03-29 12:37:26,686][00126] Avg episode reward: [(0, '0.262')] [2024-03-29 12:37:26,687][00481] Signal inference workers to resume experience collection... (3350 times) [2024-03-29 12:37:26,687][00501] InferenceWorker_p0-w0: resuming experience collection (3350 times) [2024-03-29 12:37:30,341][00501] Updated weights for policy 0, policy_version 5470 (0.0025) [2024-03-29 12:37:31,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 89653248. Throughput: 0: 41778.6. Samples: 89986220. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 12:37:31,686][00126] Avg episode reward: [(0, '0.231')] [2024-03-29 12:37:34,383][00501] Updated weights for policy 0, policy_version 5480 (0.0021) [2024-03-29 12:37:36,685][00126] Fps is (10 sec: 44236.1, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 89915392. Throughput: 0: 42072.9. Samples: 90224720. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 12:37:36,688][00126] Avg episode reward: [(0, '0.305')] [2024-03-29 12:37:36,689][00481] Saving new best policy, reward=0.305! [2024-03-29 12:37:37,674][00501] Updated weights for policy 0, policy_version 5490 (0.0037) [2024-03-29 12:37:41,685][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 90095616. Throughput: 0: 41635.6. Samples: 90335000. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 12:37:41,686][00126] Avg episode reward: [(0, '0.233')] [2024-03-29 12:37:41,807][00501] Updated weights for policy 0, policy_version 5500 (0.0021) [2024-03-29 12:37:45,805][00501] Updated weights for policy 0, policy_version 5510 (0.0027) [2024-03-29 12:37:46,685][00126] Fps is (10 sec: 37683.6, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 90292224. Throughput: 0: 41962.7. Samples: 90618600. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:37:46,686][00126] Avg episode reward: [(0, '0.245')] [2024-03-29 12:37:50,129][00501] Updated weights for policy 0, policy_version 5520 (0.0025) [2024-03-29 12:37:51,686][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 90521600. Throughput: 0: 41782.1. Samples: 90848500. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 12:37:51,686][00126] Avg episode reward: [(0, '0.238')] [2024-03-29 12:37:53,404][00501] Updated weights for policy 0, policy_version 5530 (0.0022) [2024-03-29 12:37:56,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 90734592. Throughput: 0: 41761.9. Samples: 90968580. 
Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 12:37:56,686][00126] Avg episode reward: [(0, '0.193')] [2024-03-29 12:37:57,566][00501] Updated weights for policy 0, policy_version 5540 (0.0022) [2024-03-29 12:38:01,643][00501] Updated weights for policy 0, policy_version 5550 (0.0022) [2024-03-29 12:38:01,685][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 90931200. Throughput: 0: 41403.0. Samples: 91228520. Policy #0 lag: (min: 0.0, avg: 22.2, max: 44.0) [2024-03-29 12:38:01,686][00126] Avg episode reward: [(0, '0.189')] [2024-03-29 12:38:03,906][00481] Signal inference workers to stop experience collection... (3400 times) [2024-03-29 12:38:03,950][00501] InferenceWorker_p0-w0: stopping experience collection (3400 times) [2024-03-29 12:38:04,126][00481] Signal inference workers to resume experience collection... (3400 times) [2024-03-29 12:38:04,126][00501] InferenceWorker_p0-w0: resuming experience collection (3400 times) [2024-03-29 12:38:05,949][00501] Updated weights for policy 0, policy_version 5560 (0.0019) [2024-03-29 12:38:06,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 91127808. Throughput: 0: 41939.6. Samples: 91487980. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 12:38:06,686][00126] Avg episode reward: [(0, '0.236')] [2024-03-29 12:38:09,098][00501] Updated weights for policy 0, policy_version 5570 (0.0025) [2024-03-29 12:38:11,687][00126] Fps is (10 sec: 44230.1, 60 sec: 42051.2, 300 sec: 41987.2). Total num frames: 91373568. Throughput: 0: 41726.9. Samples: 91599580. Policy #0 lag: (min: 2.0, avg: 23.2, max: 41.0) [2024-03-29 12:38:11,688][00126] Avg episode reward: [(0, '0.180')] [2024-03-29 12:38:13,182][00501] Updated weights for policy 0, policy_version 5580 (0.0028) [2024-03-29 12:38:16,685][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 91553792. Throughput: 0: 41579.6. Samples: 91857300. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 12:38:16,687][00126] Avg episode reward: [(0, '0.250')] [2024-03-29 12:38:17,248][00501] Updated weights for policy 0, policy_version 5590 (0.0024) [2024-03-29 12:38:21,552][00501] Updated weights for policy 0, policy_version 5600 (0.0023) [2024-03-29 12:38:21,685][00126] Fps is (10 sec: 37689.6, 60 sec: 41233.2, 300 sec: 41765.3). Total num frames: 91750400. Throughput: 0: 42297.5. Samples: 92128100. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 12:38:21,686][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 12:38:22,053][00481] Saving new best policy, reward=0.334! [2024-03-29 12:38:24,951][00501] Updated weights for policy 0, policy_version 5610 (0.0022) [2024-03-29 12:38:26,685][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 92012544. Throughput: 0: 41871.5. Samples: 92219220. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 12:38:26,686][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 12:38:28,893][00501] Updated weights for policy 0, policy_version 5620 (0.0019) [2024-03-29 12:38:31,686][00126] Fps is (10 sec: 42597.6, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 92176384. Throughput: 0: 41219.9. Samples: 92473500. 
Policy #0 lag: (min: 0.0, avg: 23.4, max: 42.0) [2024-03-29 12:38:31,686][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 12:38:33,149][00501] Updated weights for policy 0, policy_version 5630 (0.0028) [2024-03-29 12:38:36,685][00126] Fps is (10 sec: 34407.0, 60 sec: 40687.0, 300 sec: 41765.3). Total num frames: 92356608. Throughput: 0: 42437.5. Samples: 92758180. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 12:38:36,686][00126] Avg episode reward: [(0, '0.291')] [2024-03-29 12:38:37,273][00501] Updated weights for policy 0, policy_version 5640 (0.0026) [2024-03-29 12:38:37,303][00481] Signal inference workers to stop experience collection... (3450 times) [2024-03-29 12:38:37,339][00501] InferenceWorker_p0-w0: stopping experience collection (3450 times) [2024-03-29 12:38:37,522][00481] Signal inference workers to resume experience collection... (3450 times) [2024-03-29 12:38:37,523][00501] InferenceWorker_p0-w0: resuming experience collection (3450 times) [2024-03-29 12:38:40,524][00501] Updated weights for policy 0, policy_version 5650 (0.0024) [2024-03-29 12:38:41,685][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.3, 300 sec: 41932.0). Total num frames: 92618752. Throughput: 0: 41838.7. Samples: 92851320. Policy #0 lag: (min: 2.0, avg: 22.4, max: 43.0) [2024-03-29 12:38:41,686][00126] Avg episode reward: [(0, '0.235')] [2024-03-29 12:38:44,657][00501] Updated weights for policy 0, policy_version 5660 (0.0022) [2024-03-29 12:38:46,685][00126] Fps is (10 sec: 45874.8, 60 sec: 42052.3, 300 sec: 41932.0). Total num frames: 92815360. Throughput: 0: 41597.9. Samples: 93100420. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 12:38:46,686][00126] Avg episode reward: [(0, '0.282')] [2024-03-29 12:38:48,620][00501] Updated weights for policy 0, policy_version 5670 (0.0027) [2024-03-29 12:38:51,685][00126] Fps is (10 sec: 37682.9, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 92995584. Throughput: 0: 41930.6. Samples: 93374860. Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 12:38:51,687][00126] Avg episode reward: [(0, '0.278')] [2024-03-29 12:38:51,710][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005676_92995584.pth... [2024-03-29 12:38:52,057][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005065_82984960.pth [2024-03-29 12:38:53,111][00501] Updated weights for policy 0, policy_version 5680 (0.0027) [2024-03-29 12:38:56,254][00501] Updated weights for policy 0, policy_version 5690 (0.0023) [2024-03-29 12:38:56,685][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 93224960. Throughput: 0: 41849.5. Samples: 93482740. Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 12:38:56,686][00126] Avg episode reward: [(0, '0.220')] [2024-03-29 12:39:00,290][00501] Updated weights for policy 0, policy_version 5700 (0.0024) [2024-03-29 12:39:01,685][00126] Fps is (10 sec: 45875.5, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 93454336. Throughput: 0: 41443.6. Samples: 93722260. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 12:39:01,686][00126] Avg episode reward: [(0, '0.226')] [2024-03-29 12:39:04,291][00501] Updated weights for policy 0, policy_version 5710 (0.0021) [2024-03-29 12:39:06,685][00126] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 93618176. Throughput: 0: 41621.7. Samples: 94001080. 
Policy #0 lag: (min: 0.0, avg: 22.8, max: 42.0) [2024-03-29 12:39:06,686][00126] Avg episode reward: [(0, '0.246')] [2024-03-29 12:39:08,677][00501] Updated weights for policy 0, policy_version 5720 (0.0030) [2024-03-29 12:39:08,685][00481] Signal inference workers to stop experience collection... (3500 times) [2024-03-29 12:39:08,686][00481] Signal inference workers to resume experience collection... (3500 times) [2024-03-29 12:39:08,725][00501] InferenceWorker_p0-w0: stopping experience collection (3500 times) [2024-03-29 12:39:08,725][00501] InferenceWorker_p0-w0: resuming experience collection (3500 times) [2024-03-29 12:39:11,685][00126] Fps is (10 sec: 40960.0, 60 sec: 41507.3, 300 sec: 41876.4). Total num frames: 93863936. Throughput: 0: 42169.9. Samples: 94116860. Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 12:39:11,686][00126] Avg episode reward: [(0, '0.278')] [2024-03-29 12:39:11,832][00501] Updated weights for policy 0, policy_version 5730 (0.0030) [2024-03-29 12:39:16,000][00501] Updated weights for policy 0, policy_version 5740 (0.0030) [2024-03-29 12:39:16,685][00126] Fps is (10 sec: 45875.3, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 94076928. Throughput: 0: 41788.6. Samples: 94353980. Policy #0 lag: (min: 2.0, avg: 23.1, max: 42.0) [2024-03-29 12:39:16,686][00126] Avg episode reward: [(0, '0.219')] [2024-03-29 12:39:20,038][00501] Updated weights for policy 0, policy_version 5750 (0.0017) [2024-03-29 12:39:21,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 94257152. Throughput: 0: 41425.7. Samples: 94622340. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 12:39:21,686][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 12:39:24,232][00501] Updated weights for policy 0, policy_version 5760 (0.0019) [2024-03-29 12:39:26,685][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.3, 300 sec: 41876.4). Total num frames: 94502912. Throughput: 0: 42096.5. Samples: 94745660. Policy #0 lag: (min: 1.0, avg: 20.4, max: 43.0) [2024-03-29 12:39:26,686][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 12:39:27,296][00501] Updated weights for policy 0, policy_version 5770 (0.0025) [2024-03-29 12:39:31,417][00501] Updated weights for policy 0, policy_version 5780 (0.0027) [2024-03-29 12:39:31,685][00126] Fps is (10 sec: 45875.7, 60 sec: 42325.5, 300 sec: 41931.9). Total num frames: 94715904. Throughput: 0: 41910.8. Samples: 94986400. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 12:39:31,686][00126] Avg episode reward: [(0, '0.284')] [2024-03-29 12:39:35,518][00501] Updated weights for policy 0, policy_version 5790 (0.0022) [2024-03-29 12:39:36,685][00126] Fps is (10 sec: 39321.1, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 94896128. Throughput: 0: 41928.9. Samples: 95261660. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 12:39:36,686][00126] Avg episode reward: [(0, '0.336')] [2024-03-29 12:39:36,915][00481] Saving new best policy, reward=0.336! [2024-03-29 12:39:39,765][00501] Updated weights for policy 0, policy_version 5800 (0.0020) [2024-03-29 12:39:40,801][00481] Signal inference workers to stop experience collection... (3550 times) [2024-03-29 12:39:40,838][00501] InferenceWorker_p0-w0: stopping experience collection (3550 times) [2024-03-29 12:39:41,014][00481] Signal inference workers to resume experience collection... 
(3550 times) [2024-03-29 12:39:41,014][00501] InferenceWorker_p0-w0: resuming experience collection (3550 times) [2024-03-29 12:39:41,686][00126] Fps is (10 sec: 40959.0, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 95125504. Throughput: 0: 42180.8. Samples: 95380880. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 12:39:41,686][00126] Avg episode reward: [(0, '0.302')] [2024-03-29 12:39:42,997][00501] Updated weights for policy 0, policy_version 5810 (0.0021) [2024-03-29 12:39:46,686][00126] Fps is (10 sec: 44232.8, 60 sec: 42051.6, 300 sec: 41876.3). Total num frames: 95338496. Throughput: 0: 42022.2. Samples: 95613300. Policy #0 lag: (min: 2.0, avg: 22.2, max: 42.0) [2024-03-29 12:39:46,687][00126] Avg episode reward: [(0, '0.236')] [2024-03-29 12:39:46,969][00501] Updated weights for policy 0, policy_version 5820 (0.0030) [2024-03-29 12:39:50,888][00501] Updated weights for policy 0, policy_version 5830 (0.0027) [2024-03-29 12:39:51,685][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 95535104. Throughput: 0: 41870.1. Samples: 95885240. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 12:39:51,686][00126] Avg episode reward: [(0, '0.226')] [2024-03-29 12:39:55,295][00501] Updated weights for policy 0, policy_version 5840 (0.0027) [2024-03-29 12:39:56,685][00126] Fps is (10 sec: 40963.8, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 95748096. Throughput: 0: 42349.8. Samples: 96022600. Policy #0 lag: (min: 1.0, avg: 17.8, max: 42.0) [2024-03-29 12:39:56,688][00126] Avg episode reward: [(0, '0.276')] [2024-03-29 12:39:58,584][00501] Updated weights for policy 0, policy_version 5850 (0.0025) [2024-03-29 12:40:01,685][00126] Fps is (10 sec: 44237.4, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 95977472. Throughput: 0: 42093.8. Samples: 96248200. Policy #0 lag: (min: 2.0, avg: 22.7, max: 42.0) [2024-03-29 12:40:01,686][00126] Avg episode reward: [(0, '0.264')] [2024-03-29 12:40:02,300][00501] Updated weights for policy 0, policy_version 5860 (0.0018) [2024-03-29 12:40:06,155][00501] Updated weights for policy 0, policy_version 5870 (0.0022) [2024-03-29 12:40:06,685][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 96174080. Throughput: 0: 41999.0. Samples: 96512300. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 12:40:06,686][00126] Avg episode reward: [(0, '0.229')] [2024-03-29 12:40:10,689][00501] Updated weights for policy 0, policy_version 5880 (0.0025) [2024-03-29 12:40:11,685][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 96370688. Throughput: 0: 42629.6. Samples: 96664000. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 12:40:11,686][00126] Avg episode reward: [(0, '0.256')] [2024-03-29 12:40:13,351][00481] Signal inference workers to stop experience collection... (3600 times) [2024-03-29 12:40:13,404][00501] InferenceWorker_p0-w0: stopping experience collection (3600 times) [2024-03-29 12:40:13,444][00481] Signal inference workers to resume experience collection... (3600 times) [2024-03-29 12:40:13,449][00501] InferenceWorker_p0-w0: resuming experience collection (3600 times) [2024-03-29 12:40:13,736][00501] Updated weights for policy 0, policy_version 5890 (0.0029) [2024-03-29 12:40:16,685][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 96616448. Throughput: 0: 42450.2. Samples: 96896660. 
Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 12:40:16,686][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 12:40:16,686][00481] Saving new best policy, reward=0.358! [2024-03-29 12:40:17,841][00501] Updated weights for policy 0, policy_version 5900 (0.0021) [2024-03-29 12:40:21,684][00501] Updated weights for policy 0, policy_version 5910 (0.0026) [2024-03-29 12:40:21,685][00126] Fps is (10 sec: 45875.0, 60 sec: 42871.4, 300 sec: 42154.1). Total num frames: 96829440. Throughput: 0: 41862.6. Samples: 97145480. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 12:40:21,686][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 12:40:26,219][00501] Updated weights for policy 0, policy_version 5920 (0.0027) [2024-03-29 12:40:26,685][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 97009664. Throughput: 0: 42615.6. Samples: 97298580. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 12:40:26,686][00126] Avg episode reward: [(0, '0.315')] [2024-03-29 12:40:29,275][00501] Updated weights for policy 0, policy_version 5930 (0.0018) [2024-03-29 12:40:31,685][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 97239040. Throughput: 0: 42362.7. Samples: 97519580. Policy #0 lag: (min: 3.0, avg: 22.7, max: 43.0) [2024-03-29 12:40:31,688][00126] Avg episode reward: [(0, '0.259')] [2024-03-29 12:40:33,239][00501] Updated weights for policy 0, policy_version 5940 (0.0028) [2024-03-29 12:40:36,685][00126] Fps is (10 sec: 44237.0, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 97452032. Throughput: 0: 42114.7. Samples: 97780400. Policy #0 lag: (min: 0.0, avg: 22.6, max: 42.0) [2024-03-29 12:40:36,686][00126] Avg episode reward: [(0, '0.249')] [2024-03-29 12:40:37,282][00501] Updated weights for policy 0, policy_version 5950 (0.0018) [2024-03-29 12:40:41,685][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 97632256. Throughput: 0: 42193.7. Samples: 97921320. Policy #0 lag: (min: 1.0, avg: 18.1, max: 43.0) [2024-03-29 12:40:41,686][00126] Avg episode reward: [(0, '0.294')] [2024-03-29 12:40:41,831][00501] Updated weights for policy 0, policy_version 5960 (0.0021) [2024-03-29 12:40:45,159][00501] Updated weights for policy 0, policy_version 5970 (0.0026) [2024-03-29 12:40:46,685][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.9, 300 sec: 41931.9). Total num frames: 97878016. Throughput: 0: 42322.6. Samples: 98152720. Policy #0 lag: (min: 1.0, avg: 18.1, max: 43.0) [2024-03-29 12:40:46,686][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 12:40:46,917][00481] Signal inference workers to stop experience collection... (3650 times) [2024-03-29 12:40:46,988][00501] InferenceWorker_p0-w0: stopping experience collection (3650 times) [2024-03-29 12:40:46,994][00481] Signal inference workers to resume experience collection... (3650 times) [2024-03-29 12:40:47,014][00501] InferenceWorker_p0-w0: resuming experience collection (3650 times) [2024-03-29 12:40:48,770][00501] Updated weights for policy 0, policy_version 5980 (0.0019) [2024-03-29 12:40:51,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 98074624. Throughput: 0: 42135.6. Samples: 98408400. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 12:40:51,686][00126] Avg episode reward: [(0, '0.266')] [2024-03-29 12:40:51,818][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005987_98091008.pth... 
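The checkpoint save that closes the entry above is followed on the next line by removal of an older checkpoint: the learner rotates files named checkpoint_<policy_version>_<env_frames>.pth, keeping only the most recent few. A minimal sketch of that rotation in Python, assuming a hypothetical keep_last limit and the naming pattern visible in the log (not the framework's actual implementation):

import os
import re
import torch

def save_and_rotate(checkpoint_dir, model, policy_version, env_frames, keep_last=2):
    """Write checkpoint_<version>_<frames>.pth, then prune older checkpoints.

    keep_last and the exact save payload are illustrative assumptions inferred
    from the file names in the log, not the real learner code.
    """
    name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    path = os.path.join(checkpoint_dir, name)
    torch.save({"policy_version": policy_version, "model": model.state_dict()}, path)

    # Sort existing checkpoints by the policy version embedded in the file name.
    pattern = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")
    existing = sorted(
        (f for f in os.listdir(checkpoint_dir) if pattern.match(f)),
        key=lambda f: int(pattern.match(f).group(1)),
    )
    # Remove everything except the newest keep_last checkpoints,
    # mirroring the Saving/Removing pairs seen in the log.
    for old in existing[:-keep_last]:
        os.remove(os.path.join(checkpoint_dir, old))
    return path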
[2024-03-29 12:40:52,160][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005373_88031232.pth [2024-03-29 12:40:52,975][00501] Updated weights for policy 0, policy_version 5990 (0.0021) [2024-03-29 12:40:56,685][00126] Fps is (10 sec: 36045.2, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 98238464. Throughput: 0: 41657.5. Samples: 98538580. Policy #0 lag: (min: 1.0, avg: 21.2, max: 42.0) [2024-03-29 12:40:56,686][00126] Avg episode reward: [(0, '0.284')] [2024-03-29 12:40:57,639][00501] Updated weights for policy 0, policy_version 6000 (0.0029) [2024-03-29 12:41:01,004][00501] Updated weights for policy 0, policy_version 6010 (0.0019) [2024-03-29 12:41:01,685][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 98500608. Throughput: 0: 41936.3. Samples: 98783800. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 12:41:01,686][00126] Avg episode reward: [(0, '0.271')] [2024-03-29 12:41:04,575][00501] Updated weights for policy 0, policy_version 6020 (0.0019) [2024-03-29 12:41:06,685][00126] Fps is (10 sec: 47512.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 98713600. Throughput: 0: 41786.2. Samples: 99025860. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 12:41:06,686][00126] Avg episode reward: [(0, '0.300')] [2024-03-29 12:41:08,456][00501] Updated weights for policy 0, policy_version 6030 (0.0021) [2024-03-29 12:41:11,685][00126] Fps is (10 sec: 37683.5, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 98877440. Throughput: 0: 41613.0. Samples: 99171160. Policy #0 lag: (min: 0.0, avg: 18.8, max: 42.0) [2024-03-29 12:41:11,686][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 12:41:13,038][00501] Updated weights for policy 0, policy_version 6040 (0.0032) [2024-03-29 12:41:16,330][00501] Updated weights for policy 0, policy_version 6050 (0.0018) [2024-03-29 12:41:16,685][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 99123200. Throughput: 0: 42301.3. Samples: 99423140. Policy #0 lag: (min: 0.0, avg: 18.8, max: 42.0) [2024-03-29 12:41:16,686][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 12:41:19,793][00481] Signal inference workers to stop experience collection... (3700 times) [2024-03-29 12:41:19,830][00501] InferenceWorker_p0-w0: stopping experience collection (3700 times) [2024-03-29 12:41:20,010][00481] Signal inference workers to resume experience collection... (3700 times) [2024-03-29 12:41:20,010][00501] InferenceWorker_p0-w0: resuming experience collection (3700 times) [2024-03-29 12:41:20,013][00501] Updated weights for policy 0, policy_version 6060 (0.0019) [2024-03-29 12:41:21,685][00126] Fps is (10 sec: 47513.5, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 99352576. Throughput: 0: 41689.4. Samples: 99656420. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 12:41:21,686][00126] Avg episode reward: [(0, '0.284')] [2024-03-29 12:41:24,169][00501] Updated weights for policy 0, policy_version 6070 (0.0023) [2024-03-29 12:41:26,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 99516416. Throughput: 0: 41664.4. Samples: 99796220. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 12:41:26,686][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 12:41:28,660][00501] Updated weights for policy 0, policy_version 6080 (0.0022) [2024-03-29 12:41:31,685][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.2, 300 sec: 41876.4). 
Total num frames: 99762176. Throughput: 0: 42310.3. Samples: 100056680. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 12:41:31,686][00126] Avg episode reward: [(0, '0.233')] [2024-03-29 12:41:31,749][00501] Updated weights for policy 0, policy_version 6090 (0.0023) [2024-03-29 12:41:35,457][00501] Updated weights for policy 0, policy_version 6100 (0.0019) [2024-03-29 12:41:36,685][00126] Fps is (10 sec: 47513.4, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 99991552. Throughput: 0: 41836.4. Samples: 100291040. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 12:41:36,686][00126] Avg episode reward: [(0, '0.245')] [2024-03-29 12:41:39,562][00501] Updated weights for policy 0, policy_version 6110 (0.0018) [2024-03-29 12:41:41,685][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 100155392. Throughput: 0: 42064.8. Samples: 100431500. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 12:41:41,686][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 12:41:44,116][00501] Updated weights for policy 0, policy_version 6120 (0.0019) [2024-03-29 12:41:46,685][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.3, 300 sec: 41932.0). Total num frames: 100384768. Throughput: 0: 42465.0. Samples: 100694720. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 12:41:46,686][00126] Avg episode reward: [(0, '0.288')] [2024-03-29 12:41:47,411][00501] Updated weights for policy 0, policy_version 6130 (0.0023) [2024-03-29 12:41:50,968][00501] Updated weights for policy 0, policy_version 6140 (0.0024) [2024-03-29 12:41:51,685][00126] Fps is (10 sec: 45875.4, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 100614144. Throughput: 0: 42256.5. Samples: 100927400. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 12:41:51,686][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 12:41:54,728][00481] Signal inference workers to stop experience collection... (3750 times) [2024-03-29 12:41:54,728][00481] Signal inference workers to resume experience collection... (3750 times) [2024-03-29 12:41:54,765][00501] InferenceWorker_p0-w0: stopping experience collection (3750 times) [2024-03-29 12:41:54,765][00501] InferenceWorker_p0-w0: resuming experience collection (3750 times) [2024-03-29 12:41:55,028][00501] Updated weights for policy 0, policy_version 6150 (0.0020) [2024-03-29 12:41:56,685][00126] Fps is (10 sec: 42598.1, 60 sec: 42871.4, 300 sec: 42043.0). Total num frames: 100810752. Throughput: 0: 42025.3. Samples: 101062300. Policy #0 lag: (min: 0.0, avg: 22.6, max: 41.0) [2024-03-29 12:41:56,686][00126] Avg episode reward: [(0, '0.286')] [2024-03-29 12:41:59,677][00501] Updated weights for policy 0, policy_version 6160 (0.0019) [2024-03-29 12:42:01,685][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 101007360. Throughput: 0: 42219.0. Samples: 101323000. Policy #0 lag: (min: 0.0, avg: 17.7, max: 41.0) [2024-03-29 12:42:01,686][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 12:42:02,925][00501] Updated weights for policy 0, policy_version 6170 (0.0020) [2024-03-29 12:42:06,429][00501] Updated weights for policy 0, policy_version 6180 (0.0023) [2024-03-29 12:42:06,685][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 101253120. Throughput: 0: 42331.9. Samples: 101561360. 
Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 12:42:06,686][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 12:42:10,566][00501] Updated weights for policy 0, policy_version 6190 (0.0028) [2024-03-29 12:42:11,685][00126] Fps is (10 sec: 44237.3, 60 sec: 42871.5, 300 sec: 42098.5). Total num frames: 101449728. Throughput: 0: 42138.2. Samples: 101692440. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 12:42:11,686][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 12:42:15,175][00501] Updated weights for policy 0, policy_version 6200 (0.0018) [2024-03-29 12:42:16,685][00126] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 41932.0). Total num frames: 101646336. Throughput: 0: 42355.6. Samples: 101962680. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 12:42:16,687][00126] Avg episode reward: [(0, '0.274')] [2024-03-29 12:42:18,447][00501] Updated weights for policy 0, policy_version 6210 (0.0021) [2024-03-29 12:42:21,685][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 101875712. Throughput: 0: 42284.4. Samples: 102193840. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 12:42:21,686][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 12:42:21,797][00481] Saving new best policy, reward=0.383! [2024-03-29 12:42:22,504][00501] Updated weights for policy 0, policy_version 6220 (0.0018) [2024-03-29 12:42:26,360][00501] Updated weights for policy 0, policy_version 6230 (0.0018) [2024-03-29 12:42:26,685][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 102072320. Throughput: 0: 41860.5. Samples: 102315220. Policy #0 lag: (min: 1.0, avg: 22.5, max: 42.0) [2024-03-29 12:42:26,686][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 12:42:29,902][00481] Signal inference workers to stop experience collection... (3800 times) [2024-03-29 12:42:29,904][00481] Signal inference workers to resume experience collection... (3800 times) [2024-03-29 12:42:29,949][00501] InferenceWorker_p0-w0: stopping experience collection (3800 times) [2024-03-29 12:42:29,950][00501] InferenceWorker_p0-w0: resuming experience collection (3800 times) [2024-03-29 12:42:30,893][00501] Updated weights for policy 0, policy_version 6240 (0.0025) [2024-03-29 12:42:31,685][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 102268928. Throughput: 0: 42155.0. Samples: 102591700. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 12:42:31,686][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 12:42:34,054][00501] Updated weights for policy 0, policy_version 6250 (0.0021) [2024-03-29 12:42:36,685][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 102498304. Throughput: 0: 42220.0. Samples: 102827300. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 12:42:36,686][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 12:42:37,699][00501] Updated weights for policy 0, policy_version 6260 (0.0020) [2024-03-29 12:42:41,685][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 102711296. Throughput: 0: 41916.8. Samples: 102948560. 
Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 12:42:41,688][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 12:42:41,745][00501] Updated weights for policy 0, policy_version 6270 (0.0022) [2024-03-29 12:42:46,534][00501] Updated weights for policy 0, policy_version 6280 (0.0019) [2024-03-29 12:42:46,685][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41932.0). Total num frames: 102891520. Throughput: 0: 42225.9. Samples: 103223160. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 12:42:46,686][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 12:42:49,726][00501] Updated weights for policy 0, policy_version 6290 (0.0019) [2024-03-29 12:42:51,685][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 103120896. Throughput: 0: 41963.1. Samples: 103449700. Policy #0 lag: (min: 1.0, avg: 19.4, max: 42.0) [2024-03-29 12:42:51,686][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 12:42:51,837][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000006295_103137280.pth... [2024-03-29 12:42:52,189][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005676_92995584.pth [2024-03-29 12:42:53,576][00501] Updated weights for policy 0, policy_version 6300 (0.0023) [2024-03-29 12:42:56,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 103317504. Throughput: 0: 41496.5. Samples: 103559780. Policy #0 lag: (min: 2.0, avg: 22.0, max: 42.0) [2024-03-29 12:42:56,686][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 12:42:57,738][00501] Updated weights for policy 0, policy_version 6310 (0.0026) [2024-03-29 12:43:01,685][00126] Fps is (10 sec: 37683.6, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 103497728. Throughput: 0: 41743.1. Samples: 103841120. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 12:43:01,686][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 12:43:02,496][00481] Signal inference workers to stop experience collection... (3850 times) [2024-03-29 12:43:02,546][00501] InferenceWorker_p0-w0: stopping experience collection (3850 times) [2024-03-29 12:43:02,582][00481] Signal inference workers to resume experience collection... (3850 times) [2024-03-29 12:43:02,590][00501] InferenceWorker_p0-w0: resuming experience collection (3850 times) [2024-03-29 12:43:02,593][00501] Updated weights for policy 0, policy_version 6320 (0.0026) [2024-03-29 12:43:05,730][00501] Updated weights for policy 0, policy_version 6330 (0.0020) [2024-03-29 12:43:06,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.3, 300 sec: 41987.7). Total num frames: 103759872. Throughput: 0: 41577.4. Samples: 104064820. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 12:43:06,686][00126] Avg episode reward: [(0, '0.257')] [2024-03-29 12:43:09,286][00501] Updated weights for policy 0, policy_version 6340 (0.0028) [2024-03-29 12:43:11,685][00126] Fps is (10 sec: 45875.3, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 103956480. Throughput: 0: 41798.7. Samples: 104196160. Policy #0 lag: (min: 0.0, avg: 23.1, max: 43.0) [2024-03-29 12:43:11,686][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 12:43:13,408][00501] Updated weights for policy 0, policy_version 6350 (0.0018) [2024-03-29 12:43:16,685][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 104136704. Throughput: 0: 41726.7. Samples: 104469400. 
Policy #0 lag: (min: 0.0, avg: 23.1, max: 43.0) [2024-03-29 12:43:16,688][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 12:43:17,914][00501] Updated weights for policy 0, policy_version 6360 (0.0027) [2024-03-29 12:43:21,071][00501] Updated weights for policy 0, policy_version 6370 (0.0026) [2024-03-29 12:43:21,685][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 104382464. Throughput: 0: 41700.0. Samples: 104703800. Policy #0 lag: (min: 1.0, avg: 18.9, max: 43.0) [2024-03-29 12:43:21,686][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 12:43:24,775][00501] Updated weights for policy 0, policy_version 6380 (0.0026) [2024-03-29 12:43:26,686][00126] Fps is (10 sec: 45874.6, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 104595456. Throughput: 0: 42034.2. Samples: 104840100. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 12:43:26,686][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 12:43:29,136][00501] Updated weights for policy 0, policy_version 6390 (0.0027) [2024-03-29 12:43:31,686][00126] Fps is (10 sec: 37682.8, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 104759296. Throughput: 0: 41324.3. Samples: 105082760. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 12:43:31,686][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 12:43:33,605][00501] Updated weights for policy 0, policy_version 6400 (0.0018) [2024-03-29 12:43:35,595][00481] Signal inference workers to stop experience collection... (3900 times) [2024-03-29 12:43:35,625][00501] InferenceWorker_p0-w0: stopping experience collection (3900 times) [2024-03-29 12:43:35,784][00481] Signal inference workers to resume experience collection... (3900 times) [2024-03-29 12:43:35,784][00501] InferenceWorker_p0-w0: resuming experience collection (3900 times) [2024-03-29 12:43:36,612][00501] Updated weights for policy 0, policy_version 6410 (0.0023) [2024-03-29 12:43:36,685][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 105021440. Throughput: 0: 41853.4. Samples: 105333100. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 12:43:36,686][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 12:43:40,440][00501] Updated weights for policy 0, policy_version 6420 (0.0034) [2024-03-29 12:43:41,685][00126] Fps is (10 sec: 49152.0, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 105250816. Throughput: 0: 42437.6. Samples: 105469480. Policy #0 lag: (min: 1.0, avg: 23.2, max: 42.0) [2024-03-29 12:43:41,686][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 12:43:44,513][00501] Updated weights for policy 0, policy_version 6430 (0.0023) [2024-03-29 12:43:46,685][00126] Fps is (10 sec: 37683.6, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 105398272. Throughput: 0: 41695.1. Samples: 105717400. Policy #0 lag: (min: 1.0, avg: 23.2, max: 42.0) [2024-03-29 12:43:46,688][00126] Avg episode reward: [(0, '0.262')] [2024-03-29 12:43:49,039][00501] Updated weights for policy 0, policy_version 6440 (0.0027) [2024-03-29 12:43:51,685][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 105644032. Throughput: 0: 42540.8. Samples: 105979160. Policy #0 lag: (min: 0.0, avg: 17.7, max: 41.0) [2024-03-29 12:43:51,686][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 12:43:52,144][00481] Saving new best policy, reward=0.417! 
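The "Saving new best policy, reward=0.417!" entry above reflects best-policy tracking: the learner remembers the highest average episode reward reported so far and writes an extra checkpoint whenever a new report exceeds it. A hedged sketch of that logic, with save_fn as a hypothetical callback standing in for the real checkpoint writer:

class BestPolicyTracker:
    """Save a copy of the policy whenever the average episode reward improves.

    Illustrative only: the strict greater-than comparison and the save_fn
    signature are assumptions based on the 'Saving new best policy' messages.
    """

    def __init__(self, save_fn):
        self.best_reward = float("-inf")
        self.save_fn = save_fn  # e.g. lambda: torch.save(model.state_dict(), best_path)

    def update(self, avg_episode_reward):
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            self.save_fn()
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")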
[2024-03-29 12:43:52,162][00501] Updated weights for policy 0, policy_version 6450 (0.0026) [2024-03-29 12:43:56,088][00501] Updated weights for policy 0, policy_version 6460 (0.0022) [2024-03-29 12:43:56,685][00126] Fps is (10 sec: 47513.8, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 105873408. Throughput: 0: 42129.4. Samples: 106091980. Policy #0 lag: (min: 2.0, avg: 22.6, max: 41.0) [2024-03-29 12:43:56,686][00126] Avg episode reward: [(0, '0.293')] [2024-03-29 12:44:00,353][00501] Updated weights for policy 0, policy_version 6470 (0.0025) [2024-03-29 12:44:01,685][00126] Fps is (10 sec: 40959.9, 60 sec: 42598.3, 300 sec: 42154.1). Total num frames: 106053632. Throughput: 0: 41571.5. Samples: 106340120. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 12:44:01,686][00126] Avg episode reward: [(0, '0.249')] [2024-03-29 12:44:04,806][00501] Updated weights for policy 0, policy_version 6480 (0.0018) [2024-03-29 12:44:06,685][00126] Fps is (10 sec: 37682.7, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 106250240. Throughput: 0: 42265.7. Samples: 106605760. Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 12:44:06,686][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 12:44:08,047][00501] Updated weights for policy 0, policy_version 6490 (0.0025) [2024-03-29 12:44:11,256][00481] Signal inference workers to stop experience collection... (3950 times) [2024-03-29 12:44:11,306][00501] InferenceWorker_p0-w0: stopping experience collection (3950 times) [2024-03-29 12:44:11,343][00481] Signal inference workers to resume experience collection... (3950 times) [2024-03-29 12:44:11,346][00501] InferenceWorker_p0-w0: resuming experience collection (3950 times) [2024-03-29 12:44:11,685][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 106479616. Throughput: 0: 41686.3. Samples: 106715980. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 12:44:11,686][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 12:44:11,896][00501] Updated weights for policy 0, policy_version 6500 (0.0024) [2024-03-29 12:44:16,063][00501] Updated weights for policy 0, policy_version 6510 (0.0021) [2024-03-29 12:44:16,685][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 106692608. Throughput: 0: 41967.7. Samples: 106971300. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 12:44:16,686][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 12:44:20,537][00501] Updated weights for policy 0, policy_version 6520 (0.0019) [2024-03-29 12:44:21,685][00126] Fps is (10 sec: 39321.2, 60 sec: 41506.0, 300 sec: 41931.9). Total num frames: 106872832. Throughput: 0: 41835.5. Samples: 107215700. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 12:44:21,686][00126] Avg episode reward: [(0, '0.329')] [2024-03-29 12:44:23,893][00501] Updated weights for policy 0, policy_version 6530 (0.0034) [2024-03-29 12:44:26,685][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 107085824. Throughput: 0: 41379.2. Samples: 107331540. Policy #0 lag: (min: 2.0, avg: 21.6, max: 42.0) [2024-03-29 12:44:26,686][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 12:44:27,838][00501] Updated weights for policy 0, policy_version 6540 (0.0028) [2024-03-29 12:44:31,685][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.4, 300 sec: 41987.5). Total num frames: 107282432. Throughput: 0: 41248.4. Samples: 107573580. 
Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 12:44:31,686][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 12:44:32,161][00501] Updated weights for policy 0, policy_version 6550 (0.0021) [2024-03-29 12:44:36,367][00501] Updated weights for policy 0, policy_version 6560 (0.0022) [2024-03-29 12:44:36,685][00126] Fps is (10 sec: 40960.4, 60 sec: 41233.2, 300 sec: 41932.0). Total num frames: 107495424. Throughput: 0: 41295.2. Samples: 107837440. Policy #0 lag: (min: 2.0, avg: 19.0, max: 42.0) [2024-03-29 12:44:36,686][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 12:44:39,656][00501] Updated weights for policy 0, policy_version 6570 (0.0024) [2024-03-29 12:44:41,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41233.2, 300 sec: 41987.6). Total num frames: 107724800. Throughput: 0: 41388.8. Samples: 107954480. Policy #0 lag: (min: 2.0, avg: 19.0, max: 42.0) [2024-03-29 12:44:41,686][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 12:44:43,002][00481] Signal inference workers to stop experience collection... (4000 times) [2024-03-29 12:44:43,078][00501] InferenceWorker_p0-w0: stopping experience collection (4000 times) [2024-03-29 12:44:43,082][00481] Signal inference workers to resume experience collection... (4000 times) [2024-03-29 12:44:43,106][00501] InferenceWorker_p0-w0: resuming experience collection (4000 times) [2024-03-29 12:44:43,660][00501] Updated weights for policy 0, policy_version 6580 (0.0022) [2024-03-29 12:44:46,685][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 107921408. Throughput: 0: 41432.5. Samples: 108204580. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 12:44:46,686][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 12:44:47,660][00501] Updated weights for policy 0, policy_version 6590 (0.0023) [2024-03-29 12:44:51,685][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 108118016. Throughput: 0: 41624.1. Samples: 108478840. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 12:44:51,686][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 12:44:51,948][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000006600_108134400.pth... [2024-03-29 12:44:51,949][00501] Updated weights for policy 0, policy_version 6600 (0.0023) [2024-03-29 12:44:52,362][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005987_98091008.pth [2024-03-29 12:44:55,473][00501] Updated weights for policy 0, policy_version 6610 (0.0024) [2024-03-29 12:44:56,687][00126] Fps is (10 sec: 42589.5, 60 sec: 41231.6, 300 sec: 41931.6). Total num frames: 108347392. Throughput: 0: 41733.7. Samples: 108594080. Policy #0 lag: (min: 2.0, avg: 20.6, max: 43.0) [2024-03-29 12:44:56,688][00126] Avg episode reward: [(0, '0.319')] [2024-03-29 12:44:59,348][00501] Updated weights for policy 0, policy_version 6620 (0.0030) [2024-03-29 12:45:01,685][00126] Fps is (10 sec: 44236.3, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 108560384. Throughput: 0: 41150.2. Samples: 108823060. Policy #0 lag: (min: 0.0, avg: 22.7, max: 44.0) [2024-03-29 12:45:01,686][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 12:45:03,262][00501] Updated weights for policy 0, policy_version 6630 (0.0030) [2024-03-29 12:45:06,685][00126] Fps is (10 sec: 37690.7, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 108724224. Throughput: 0: 42070.3. Samples: 109108860. 
Policy #0 lag: (min: 0.0, avg: 22.7, max: 44.0) [2024-03-29 12:45:06,687][00126] Avg episode reward: [(0, '0.249')] [2024-03-29 12:45:07,693][00501] Updated weights for policy 0, policy_version 6640 (0.0020) [2024-03-29 12:45:11,201][00501] Updated weights for policy 0, policy_version 6650 (0.0034) [2024-03-29 12:45:11,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 108969984. Throughput: 0: 41991.6. Samples: 109221160. Policy #0 lag: (min: 1.0, avg: 17.5, max: 40.0) [2024-03-29 12:45:11,686][00126] Avg episode reward: [(0, '0.281')] [2024-03-29 12:45:14,067][00481] Signal inference workers to stop experience collection... (4050 times) [2024-03-29 12:45:14,103][00501] InferenceWorker_p0-w0: stopping experience collection (4050 times) [2024-03-29 12:45:14,292][00481] Signal inference workers to resume experience collection... (4050 times) [2024-03-29 12:45:14,293][00501] InferenceWorker_p0-w0: resuming experience collection (4050 times) [2024-03-29 12:45:15,268][00501] Updated weights for policy 0, policy_version 6660 (0.0023) [2024-03-29 12:45:16,685][00126] Fps is (10 sec: 45875.7, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 109182976. Throughput: 0: 41699.6. Samples: 109450060. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:45:16,686][00126] Avg episode reward: [(0, '0.278')] [2024-03-29 12:45:19,150][00501] Updated weights for policy 0, policy_version 6670 (0.0019) [2024-03-29 12:45:21,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41233.2, 300 sec: 41820.9). Total num frames: 109346816. Throughput: 0: 41938.2. Samples: 109724660. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 12:45:21,686][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 12:45:23,348][00501] Updated weights for policy 0, policy_version 6680 (0.0025) [2024-03-29 12:45:26,685][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 109592576. Throughput: 0: 42120.4. Samples: 109849900. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 12:45:26,686][00126] Avg episode reward: [(0, '0.268')] [2024-03-29 12:45:26,721][00501] Updated weights for policy 0, policy_version 6690 (0.0034) [2024-03-29 12:45:30,888][00501] Updated weights for policy 0, policy_version 6700 (0.0029) [2024-03-29 12:45:31,685][00126] Fps is (10 sec: 45874.6, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 109805568. Throughput: 0: 41810.6. Samples: 110086060. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 12:45:31,686][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 12:45:34,864][00501] Updated weights for policy 0, policy_version 6710 (0.0023) [2024-03-29 12:45:36,686][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.0, 300 sec: 41876.4). Total num frames: 109985792. Throughput: 0: 41364.3. Samples: 110340240. Policy #0 lag: (min: 1.0, avg: 22.7, max: 42.0) [2024-03-29 12:45:36,686][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 12:45:39,032][00501] Updated weights for policy 0, policy_version 6720 (0.0023) [2024-03-29 12:45:41,685][00126] Fps is (10 sec: 40960.8, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 110215168. Throughput: 0: 41481.6. Samples: 110460660. Policy #0 lag: (min: 1.0, avg: 18.8, max: 41.0) [2024-03-29 12:45:41,686][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 12:45:42,523][00501] Updated weights for policy 0, policy_version 6730 (0.0018) [2024-03-29 12:45:42,584][00481] Signal inference workers to stop experience collection... 
(4100 times) [2024-03-29 12:45:42,616][00501] InferenceWorker_p0-w0: stopping experience collection (4100 times) [2024-03-29 12:45:42,778][00481] Signal inference workers to resume experience collection... (4100 times) [2024-03-29 12:45:42,779][00501] InferenceWorker_p0-w0: resuming experience collection (4100 times) [2024-03-29 12:45:46,596][00501] Updated weights for policy 0, policy_version 6740 (0.0021) [2024-03-29 12:45:46,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 110428160. Throughput: 0: 42033.7. Samples: 110714580. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 12:45:46,686][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 12:45:50,623][00501] Updated weights for policy 0, policy_version 6750 (0.0026) [2024-03-29 12:45:51,685][00126] Fps is (10 sec: 40959.1, 60 sec: 41779.1, 300 sec: 41987.4). Total num frames: 110624768. Throughput: 0: 41097.8. Samples: 110958260. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 12:45:51,687][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 12:45:54,870][00501] Updated weights for policy 0, policy_version 6760 (0.0020) [2024-03-29 12:45:56,685][00126] Fps is (10 sec: 37683.8, 60 sec: 40961.5, 300 sec: 41709.8). Total num frames: 110804992. Throughput: 0: 41648.1. Samples: 111095320. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 12:45:56,686][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 12:45:58,393][00501] Updated weights for policy 0, policy_version 6770 (0.0026) [2024-03-29 12:46:01,685][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 111034368. Throughput: 0: 41725.3. Samples: 111327700. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 12:46:01,686][00126] Avg episode reward: [(0, '0.243')] [2024-03-29 12:46:02,441][00501] Updated weights for policy 0, policy_version 6780 (0.0024) [2024-03-29 12:46:06,331][00501] Updated weights for policy 0, policy_version 6790 (0.0019) [2024-03-29 12:46:06,685][00126] Fps is (10 sec: 45874.9, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 111263744. Throughput: 0: 41012.0. Samples: 111570200. Policy #0 lag: (min: 0.0, avg: 23.0, max: 45.0) [2024-03-29 12:46:06,686][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 12:46:10,799][00501] Updated weights for policy 0, policy_version 6800 (0.0021) [2024-03-29 12:46:11,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 111443968. Throughput: 0: 41629.0. Samples: 111723200. Policy #0 lag: (min: 1.0, avg: 18.5, max: 42.0) [2024-03-29 12:46:11,686][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 12:46:14,353][00501] Updated weights for policy 0, policy_version 6810 (0.0022) [2024-03-29 12:46:16,144][00481] Signal inference workers to stop experience collection... (4150 times) [2024-03-29 12:46:16,199][00501] InferenceWorker_p0-w0: stopping experience collection (4150 times) [2024-03-29 12:46:16,242][00481] Signal inference workers to resume experience collection... (4150 times) [2024-03-29 12:46:16,245][00501] InferenceWorker_p0-w0: resuming experience collection (4150 times) [2024-03-29 12:46:16,685][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 111656960. Throughput: 0: 41122.8. Samples: 111936580. 
Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 12:46:16,686][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 12:46:18,299][00501] Updated weights for policy 0, policy_version 6820 (0.0027) [2024-03-29 12:46:21,685][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 111853568. Throughput: 0: 41329.4. Samples: 112200060. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 12:46:21,686][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 12:46:22,338][00501] Updated weights for policy 0, policy_version 6830 (0.0027) [2024-03-29 12:46:26,685][00126] Fps is (10 sec: 39321.2, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 112050176. Throughput: 0: 41514.1. Samples: 112328800. Policy #0 lag: (min: 2.0, avg: 21.5, max: 42.0) [2024-03-29 12:46:26,687][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 12:46:26,782][00501] Updated weights for policy 0, policy_version 6840 (0.0031) [2024-03-29 12:46:30,297][00501] Updated weights for policy 0, policy_version 6850 (0.0027) [2024-03-29 12:46:31,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 112279552. Throughput: 0: 40793.9. Samples: 112550300. Policy #0 lag: (min: 1.0, avg: 19.1, max: 41.0) [2024-03-29 12:46:31,686][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 12:46:34,303][00501] Updated weights for policy 0, policy_version 6860 (0.0028) [2024-03-29 12:46:36,685][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 112492544. Throughput: 0: 40864.5. Samples: 112797160. Policy #0 lag: (min: 2.0, avg: 23.3, max: 42.0) [2024-03-29 12:46:36,688][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 12:46:38,534][00501] Updated weights for policy 0, policy_version 6870 (0.0023) [2024-03-29 12:46:41,685][00126] Fps is (10 sec: 37683.3, 60 sec: 40686.9, 300 sec: 41598.7). Total num frames: 112656384. Throughput: 0: 40898.2. Samples: 112935740. Policy #0 lag: (min: 0.0, avg: 17.3, max: 40.0) [2024-03-29 12:46:41,686][00126] Avg episode reward: [(0, '0.273')] [2024-03-29 12:46:42,699][00501] Updated weights for policy 0, policy_version 6880 (0.0026) [2024-03-29 12:46:46,183][00501] Updated weights for policy 0, policy_version 6890 (0.0019) [2024-03-29 12:46:46,685][00126] Fps is (10 sec: 40959.7, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 112902144. Throughput: 0: 41324.4. Samples: 113187300. Policy #0 lag: (min: 0.0, avg: 17.3, max: 40.0) [2024-03-29 12:46:46,686][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 12:46:49,274][00481] Signal inference workers to stop experience collection... (4200 times) [2024-03-29 12:46:49,332][00501] InferenceWorker_p0-w0: stopping experience collection (4200 times) [2024-03-29 12:46:49,364][00481] Signal inference workers to resume experience collection... (4200 times) [2024-03-29 12:46:49,368][00501] InferenceWorker_p0-w0: resuming experience collection (4200 times) [2024-03-29 12:46:50,182][00501] Updated weights for policy 0, policy_version 6900 (0.0025) [2024-03-29 12:46:51,685][00126] Fps is (10 sec: 45874.9, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 113115136. Throughput: 0: 41140.0. Samples: 113421500. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 12:46:51,686][00126] Avg episode reward: [(0, '0.302')] [2024-03-29 12:46:52,178][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000006906_113147904.pth... 
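The checkpoint filenames in this log encode both numbers seen elsewhere in the entries: checkpoint_{policy_version}_{env_steps}.pth. For the names logged in this run the two fields differ by a constant factor of 16,384 frames per policy version, which suggests (an inference from the filenames, not something the log states) that each version corresponds to one 16,384-frame training batch. A tiny sanity check using only values taken from the log:

```python
# Checkpoint names follow checkpoint_{policy_version}_{env_steps}.pth, and in
# this run env_steps == policy_version * 16384. Values copied from the log:
assert 5987 * 16384 == 98_091_008    # checkpoint_000005987_98091008.pth
assert 6600 * 16384 == 108_134_400   # checkpoint_000006600_108134400.pth
assert 6906 * 16384 == 113_147_904   # checkpoint_000006906_113147904.pth
```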
[2024-03-29 12:46:52,518][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000006295_103137280.pth [2024-03-29 12:46:54,469][00501] Updated weights for policy 0, policy_version 6910 (0.0027) [2024-03-29 12:46:56,685][00126] Fps is (10 sec: 36044.8, 60 sec: 40959.9, 300 sec: 41543.2). Total num frames: 113262592. Throughput: 0: 40658.2. Samples: 113552820. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 12:46:56,686][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 12:46:58,591][00501] Updated weights for policy 0, policy_version 6920 (0.0028) [2024-03-29 12:47:01,685][00126] Fps is (10 sec: 37683.5, 60 sec: 40960.1, 300 sec: 41487.6). Total num frames: 113491968. Throughput: 0: 41593.8. Samples: 113808300. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 12:47:01,686][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 12:47:02,218][00501] Updated weights for policy 0, policy_version 6930 (0.0030) [2024-03-29 12:47:06,280][00501] Updated weights for policy 0, policy_version 6940 (0.0028) [2024-03-29 12:47:06,685][00126] Fps is (10 sec: 45875.1, 60 sec: 40959.9, 300 sec: 41598.7). Total num frames: 113721344. Throughput: 0: 40714.6. Samples: 114032220. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 12:47:06,686][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 12:47:10,214][00501] Updated weights for policy 0, policy_version 6950 (0.0021) [2024-03-29 12:47:11,686][00126] Fps is (10 sec: 40959.3, 60 sec: 40959.9, 300 sec: 41543.1). Total num frames: 113901568. Throughput: 0: 40908.4. Samples: 114169680. Policy #0 lag: (min: 0.0, avg: 22.6, max: 42.0) [2024-03-29 12:47:11,688][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 12:47:14,407][00501] Updated weights for policy 0, policy_version 6960 (0.0032) [2024-03-29 12:47:16,685][00126] Fps is (10 sec: 39321.8, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 114114560. Throughput: 0: 41639.1. Samples: 114424060. Policy #0 lag: (min: 2.0, avg: 18.9, max: 42.0) [2024-03-29 12:47:16,686][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 12:47:17,111][00481] Signal inference workers to stop experience collection... (4250 times) [2024-03-29 12:47:17,184][00501] InferenceWorker_p0-w0: stopping experience collection (4250 times) [2024-03-29 12:47:17,190][00481] Signal inference workers to resume experience collection... (4250 times) [2024-03-29 12:47:17,211][00501] InferenceWorker_p0-w0: resuming experience collection (4250 times) [2024-03-29 12:47:18,195][00501] Updated weights for policy 0, policy_version 6970 (0.0023) [2024-03-29 12:47:21,685][00126] Fps is (10 sec: 42598.8, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 114327552. Throughput: 0: 41352.4. Samples: 114658020. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 12:47:21,686][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 12:47:22,163][00501] Updated weights for policy 0, policy_version 6980 (0.0020) [2024-03-29 12:47:26,210][00501] Updated weights for policy 0, policy_version 6990 (0.0019) [2024-03-29 12:47:26,685][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 114524160. Throughput: 0: 41006.1. Samples: 114781020. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 12:47:26,686][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 12:47:30,414][00501] Updated weights for policy 0, policy_version 7000 (0.0024) [2024-03-29 12:47:31,686][00126] Fps is (10 sec: 40959.5, 60 sec: 40959.9, 300 sec: 41487.6). 
Total num frames: 114737152. Throughput: 0: 40939.0. Samples: 115029560. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 12:47:31,686][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 12:47:34,065][00501] Updated weights for policy 0, policy_version 7010 (0.0024) [2024-03-29 12:47:36,685][00126] Fps is (10 sec: 40960.0, 60 sec: 40686.9, 300 sec: 41432.1). Total num frames: 114933760. Throughput: 0: 41256.8. Samples: 115278060. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 12:47:36,686][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 12:47:38,072][00501] Updated weights for policy 0, policy_version 7020 (0.0022) [2024-03-29 12:47:41,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.0, 300 sec: 41543.1). Total num frames: 115146752. Throughput: 0: 40964.0. Samples: 115396200. Policy #0 lag: (min: 0.0, avg: 22.7, max: 43.0) [2024-03-29 12:47:41,686][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 12:47:42,075][00501] Updated weights for policy 0, policy_version 7030 (0.0033) [2024-03-29 12:47:46,130][00481] Signal inference workers to stop experience collection... (4300 times) [2024-03-29 12:47:46,168][00501] InferenceWorker_p0-w0: stopping experience collection (4300 times) [2024-03-29 12:47:46,355][00481] Signal inference workers to resume experience collection... (4300 times) [2024-03-29 12:47:46,356][00501] InferenceWorker_p0-w0: resuming experience collection (4300 times) [2024-03-29 12:47:46,359][00501] Updated weights for policy 0, policy_version 7040 (0.0027) [2024-03-29 12:47:46,685][00126] Fps is (10 sec: 42598.4, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 115359744. Throughput: 0: 41327.0. Samples: 115668020. Policy #0 lag: (min: 2.0, avg: 19.1, max: 43.0) [2024-03-29 12:47:46,686][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 12:47:49,886][00501] Updated weights for policy 0, policy_version 7050 (0.0034) [2024-03-29 12:47:51,685][00126] Fps is (10 sec: 42598.7, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 115572736. Throughput: 0: 41250.7. Samples: 115888500. Policy #0 lag: (min: 2.0, avg: 19.1, max: 43.0) [2024-03-29 12:47:51,686][00126] Avg episode reward: [(0, '0.370')] [2024-03-29 12:47:54,025][00501] Updated weights for policy 0, policy_version 7060 (0.0024) [2024-03-29 12:47:56,685][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 115769344. Throughput: 0: 40888.6. Samples: 116009660. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 12:47:56,686][00126] Avg episode reward: [(0, '0.293')] [2024-03-29 12:47:58,154][00501] Updated weights for policy 0, policy_version 7070 (0.0033) [2024-03-29 12:48:01,685][00126] Fps is (10 sec: 37683.4, 60 sec: 40960.0, 300 sec: 41321.0). Total num frames: 115949568. Throughput: 0: 41446.3. Samples: 116289140. Policy #0 lag: (min: 1.0, avg: 20.0, max: 43.0) [2024-03-29 12:48:01,686][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 12:48:02,298][00501] Updated weights for policy 0, policy_version 7080 (0.0033) [2024-03-29 12:48:05,821][00501] Updated weights for policy 0, policy_version 7090 (0.0026) [2024-03-29 12:48:06,685][00126] Fps is (10 sec: 44236.7, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 116211712. Throughput: 0: 41049.8. Samples: 116505260. 
Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 12:48:06,686][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 12:48:09,624][00501] Updated weights for policy 0, policy_version 7100 (0.0027) [2024-03-29 12:48:11,686][00126] Fps is (10 sec: 45874.4, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 116408320. Throughput: 0: 41156.8. Samples: 116633080. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 12:48:11,686][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 12:48:13,899][00501] Updated weights for policy 0, policy_version 7110 (0.0029) [2024-03-29 12:48:16,685][00126] Fps is (10 sec: 34406.5, 60 sec: 40687.0, 300 sec: 41265.5). Total num frames: 116555776. Throughput: 0: 41631.7. Samples: 116902980. Policy #0 lag: (min: 0.0, avg: 22.2, max: 43.0) [2024-03-29 12:48:16,686][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 12:48:18,120][00501] Updated weights for policy 0, policy_version 7120 (0.0031) [2024-03-29 12:48:20,831][00481] Signal inference workers to stop experience collection... (4350 times) [2024-03-29 12:48:20,865][00501] InferenceWorker_p0-w0: stopping experience collection (4350 times) [2024-03-29 12:48:21,016][00481] Signal inference workers to resume experience collection... (4350 times) [2024-03-29 12:48:21,016][00501] InferenceWorker_p0-w0: resuming experience collection (4350 times) [2024-03-29 12:48:21,598][00501] Updated weights for policy 0, policy_version 7130 (0.0025) [2024-03-29 12:48:21,685][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.1, 300 sec: 41432.1). Total num frames: 116817920. Throughput: 0: 41298.2. Samples: 117136480. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 12:48:21,686][00126] Avg episode reward: [(0, '0.286')] [2024-03-29 12:48:25,449][00501] Updated weights for policy 0, policy_version 7140 (0.0019) [2024-03-29 12:48:26,685][00126] Fps is (10 sec: 47513.6, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 117030912. Throughput: 0: 41693.9. Samples: 117272420. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 12:48:26,686][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 12:48:29,561][00501] Updated weights for policy 0, policy_version 7150 (0.0027) [2024-03-29 12:48:31,686][00126] Fps is (10 sec: 39321.3, 60 sec: 41233.1, 300 sec: 41321.0). Total num frames: 117211136. Throughput: 0: 41568.0. Samples: 117538580. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 12:48:31,687][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 12:48:33,560][00501] Updated weights for policy 0, policy_version 7160 (0.0028) [2024-03-29 12:48:36,685][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.3, 300 sec: 41321.0). Total num frames: 117440512. Throughput: 0: 41921.4. Samples: 117774960. Policy #0 lag: (min: 0.0, avg: 17.8, max: 41.0) [2024-03-29 12:48:36,686][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 12:48:37,026][00501] Updated weights for policy 0, policy_version 7170 (0.0024) [2024-03-29 12:48:40,966][00501] Updated weights for policy 0, policy_version 7180 (0.0022) [2024-03-29 12:48:41,685][00126] Fps is (10 sec: 45876.0, 60 sec: 42052.4, 300 sec: 41598.7). Total num frames: 117669888. Throughput: 0: 42159.6. Samples: 117906840. Policy #0 lag: (min: 0.0, avg: 22.5, max: 41.0) [2024-03-29 12:48:41,686][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 12:48:45,347][00501] Updated weights for policy 0, policy_version 7190 (0.0030) [2024-03-29 12:48:46,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41376.6). Total num frames: 117850112. 
Throughput: 0: 41612.9. Samples: 118161720. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 12:48:46,686][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 12:48:49,383][00501] Updated weights for policy 0, policy_version 7200 (0.0030) [2024-03-29 12:48:51,478][00481] Signal inference workers to stop experience collection... (4400 times) [2024-03-29 12:48:51,479][00481] Signal inference workers to resume experience collection... (4400 times) [2024-03-29 12:48:51,516][00501] InferenceWorker_p0-w0: stopping experience collection (4400 times) [2024-03-29 12:48:51,516][00501] InferenceWorker_p0-w0: resuming experience collection (4400 times) [2024-03-29 12:48:51,685][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41321.0). Total num frames: 118063104. Throughput: 0: 42212.0. Samples: 118404800. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 12:48:51,686][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 12:48:51,781][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007207_118079488.pth... [2024-03-29 12:48:52,129][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000006600_108134400.pth [2024-03-29 12:48:52,902][00501] Updated weights for policy 0, policy_version 7210 (0.0024) [2024-03-29 12:48:56,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41432.1). Total num frames: 118276096. Throughput: 0: 41720.2. Samples: 118510480. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 12:48:56,688][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 12:48:56,876][00501] Updated weights for policy 0, policy_version 7220 (0.0029) [2024-03-29 12:49:01,274][00501] Updated weights for policy 0, policy_version 7230 (0.0021) [2024-03-29 12:49:01,685][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41376.5). Total num frames: 118456320. Throughput: 0: 41520.9. Samples: 118771420. Policy #0 lag: (min: 0.0, avg: 22.6, max: 42.0) [2024-03-29 12:49:01,686][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 12:49:05,370][00501] Updated weights for policy 0, policy_version 7240 (0.0022) [2024-03-29 12:49:06,685][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41376.6). Total num frames: 118685696. Throughput: 0: 41855.6. Samples: 119019980. Policy #0 lag: (min: 1.0, avg: 19.9, max: 44.0) [2024-03-29 12:49:06,686][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 12:49:08,627][00501] Updated weights for policy 0, policy_version 7250 (0.0019) [2024-03-29 12:49:11,685][00126] Fps is (10 sec: 42598.2, 60 sec: 41233.1, 300 sec: 41321.0). Total num frames: 118882304. Throughput: 0: 41449.7. Samples: 119137660. Policy #0 lag: (min: 1.0, avg: 19.9, max: 44.0) [2024-03-29 12:49:11,686][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 12:49:12,624][00501] Updated weights for policy 0, policy_version 7260 (0.0019) [2024-03-29 12:49:16,685][00126] Fps is (10 sec: 40959.5, 60 sec: 42325.2, 300 sec: 41432.1). Total num frames: 119095296. Throughput: 0: 41324.5. Samples: 119398180. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 12:49:16,686][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 12:49:16,743][00501] Updated weights for policy 0, policy_version 7270 (0.0023) [2024-03-29 12:49:20,771][00501] Updated weights for policy 0, policy_version 7280 (0.0023) [2024-03-29 12:49:21,685][00126] Fps is (10 sec: 44237.0, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 119324672. Throughput: 0: 41782.2. Samples: 119655160. 
Policy #0 lag: (min: 0.0, avg: 19.9, max: 43.0) [2024-03-29 12:49:21,686][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 12:49:23,985][00501] Updated weights for policy 0, policy_version 7290 (0.0021) [2024-03-29 12:49:26,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 119521280. Throughput: 0: 41543.0. Samples: 119776280. Policy #0 lag: (min: 1.0, avg: 23.2, max: 42.0) [2024-03-29 12:49:26,688][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 12:49:26,849][00481] Signal inference workers to stop experience collection... (4450 times) [2024-03-29 12:49:26,874][00501] InferenceWorker_p0-w0: stopping experience collection (4450 times) [2024-03-29 12:49:27,050][00481] Signal inference workers to resume experience collection... (4450 times) [2024-03-29 12:49:27,051][00501] InferenceWorker_p0-w0: resuming experience collection (4450 times) [2024-03-29 12:49:28,062][00501] Updated weights for policy 0, policy_version 7300 (0.0019) [2024-03-29 12:57:08,281][00126] Saving configuration to /workspace/metta/train_dir/b.a20.20x20_40x40.norm/config.json... [2024-03-29 12:57:08,418][00126] Rollout worker 0 uses device cpu [2024-03-29 12:57:08,419][00126] Rollout worker 1 uses device cpu [2024-03-29 12:57:08,419][00126] Rollout worker 2 uses device cpu [2024-03-29 12:57:08,419][00126] Rollout worker 3 uses device cpu [2024-03-29 12:57:08,419][00126] Rollout worker 4 uses device cpu [2024-03-29 12:57:08,419][00126] Rollout worker 5 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 6 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 7 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 8 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 9 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 10 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 11 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 12 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 13 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 14 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 15 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 16 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 17 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 18 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 19 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 20 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 21 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 22 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 23 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 24 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 25 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 26 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 27 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 28 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 29 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 30 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 31 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 32 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 33 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 34 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 35 uses device cpu [2024-03-29 12:57:08,424][00126] 
Rollout worker 36 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 37 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 38 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 39 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 40 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 41 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 42 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 43 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 44 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 45 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 46 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 47 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 48 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 49 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 50 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 51 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 52 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 53 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 54 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 55 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 56 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 57 uses device cpu [2024-03-29 12:57:08,427][00126] Rollout worker 58 uses device cpu [2024-03-29 12:57:08,427][00126] Rollout worker 59 uses device cpu [2024-03-29 12:57:08,427][00126] Rollout worker 60 uses device cpu [2024-03-29 12:57:08,427][00126] Rollout worker 61 uses device cpu [2024-03-29 12:57:08,427][00126] Rollout worker 62 uses device cpu [2024-03-29 12:57:08,427][00126] Rollout worker 63 uses device cpu [2024-03-29 12:57:10,147][00126] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:57:10,148][00126] InferenceWorker_p0-w0: min num requests: 21 [2024-03-29 12:57:10,252][00126] Starting all processes... [2024-03-29 12:57:10,252][00126] Starting process learner_proc0 [2024-03-29 12:57:10,457][00126] Starting all processes... 
[2024-03-29 12:57:10,462][00126] Starting process inference_proc0-0 [2024-03-29 12:57:10,463][00126] Starting process rollout_proc1 [2024-03-29 12:57:10,463][00126] Starting process rollout_proc3 [2024-03-29 12:57:10,463][00126] Starting process rollout_proc5 [2024-03-29 12:57:10,463][00126] Starting process rollout_proc7 [2024-03-29 12:57:10,464][00126] Starting process rollout_proc9 [2024-03-29 12:57:10,467][00126] Starting process rollout_proc11 [2024-03-29 12:57:10,477][00126] Starting process rollout_proc13 [2024-03-29 12:57:10,477][00126] Starting process rollout_proc0 [2024-03-29 12:57:10,477][00126] Starting process rollout_proc2 [2024-03-29 12:57:10,479][00126] Starting process rollout_proc15 [2024-03-29 12:57:10,479][00126] Starting process rollout_proc17 [2024-03-29 12:57:10,487][00126] Starting process rollout_proc19 [2024-03-29 12:57:10,487][00126] Starting process rollout_proc21 [2024-03-29 12:57:10,489][00126] Starting process rollout_proc23 [2024-03-29 12:57:10,490][00126] Starting process rollout_proc25 [2024-03-29 12:57:10,490][00126] Starting process rollout_proc4 [2024-03-29 12:57:10,490][00126] Starting process rollout_proc6 [2024-03-29 12:57:10,490][00126] Starting process rollout_proc27 [2024-03-29 12:57:10,490][00126] Starting process rollout_proc29 [2024-03-29 12:57:10,504][00126] Starting process rollout_proc8 [2024-03-29 12:57:10,514][00126] Starting process rollout_proc10 [2024-03-29 12:57:10,546][00126] Starting process rollout_proc12 [2024-03-29 12:57:10,560][00126] Starting process rollout_proc14 [2024-03-29 12:57:10,675][00126] Starting process rollout_proc31 [2024-03-29 12:57:10,675][00126] Starting process rollout_proc18 [2024-03-29 12:57:10,681][00126] Starting process rollout_proc16 [2024-03-29 12:57:10,681][00126] Starting process rollout_proc33 [2024-03-29 12:57:10,738][00126] Starting process rollout_proc22 [2024-03-29 12:57:10,738][00126] Starting process rollout_proc20 [2024-03-29 12:57:10,738][00126] Starting process rollout_proc26 [2024-03-29 12:57:10,738][00126] Starting process rollout_proc24 [2024-03-29 12:57:10,738][00126] Starting process rollout_proc35 [2024-03-29 12:57:10,770][00126] Starting process rollout_proc30 [2024-03-29 12:57:10,790][00126] Starting process rollout_proc28 [2024-03-29 12:57:10,802][00126] Starting process rollout_proc37 [2024-03-29 12:57:10,879][00126] Starting process rollout_proc39 [2024-03-29 12:57:10,879][00126] Starting process rollout_proc41 [2024-03-29 12:57:10,879][00126] Starting process rollout_proc43 [2024-03-29 12:57:10,902][00126] Starting process rollout_proc45 [2024-03-29 12:57:10,902][00126] Starting process rollout_proc32 [2024-03-29 12:57:10,902][00126] Starting process rollout_proc47 [2024-03-29 12:57:10,926][00126] Starting process rollout_proc49 [2024-03-29 12:57:10,948][00126] Starting process rollout_proc34 [2024-03-29 12:57:10,967][00126] Starting process rollout_proc51 [2024-03-29 12:57:10,988][00126] Starting process rollout_proc53 [2024-03-29 12:57:11,013][00126] Starting process rollout_proc55 [2024-03-29 12:57:11,030][00126] Starting process rollout_proc57 [2024-03-29 12:57:11,053][00126] Starting process rollout_proc36 [2024-03-29 12:57:11,090][00126] Starting process rollout_proc59 [2024-03-29 12:57:11,112][00126] Starting process rollout_proc61 [2024-03-29 12:57:11,135][00126] Starting process rollout_proc38 [2024-03-29 12:57:11,166][00126] Starting process rollout_proc40 [2024-03-29 12:57:11,193][00126] Starting process rollout_proc42 [2024-03-29 12:57:11,318][00126] Starting process 
rollout_proc44 [2024-03-29 12:57:11,344][00126] Starting process rollout_proc62 [2024-03-29 12:57:11,403][00126] Starting process rollout_proc63 [2024-03-29 12:57:11,480][00126] Starting process rollout_proc50 [2024-03-29 12:57:11,508][00126] Starting process rollout_proc46 [2024-03-29 12:57:11,521][00126] Starting process rollout_proc60 [2024-03-29 12:57:11,558][00126] Starting process rollout_proc56 [2024-03-29 12:57:11,614][00126] Starting process rollout_proc52 [2024-03-29 12:57:11,615][00126] Starting process rollout_proc54 [2024-03-29 12:57:11,615][00126] Starting process rollout_proc58 [2024-03-29 12:57:11,615][00126] Starting process rollout_proc48 [2024-03-29 12:57:15,509][00500] Worker 7 uses CPU cores [7] [2024-03-29 12:57:15,553][00476] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:57:15,553][00476] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-03-29 12:57:15,575][00476] Num visible devices: 1 [2024-03-29 12:57:15,593][00499] Worker 5 uses CPU cores [5] [2024-03-29 12:57:15,638][00476] Starting seed is not provided [2024-03-29 12:57:15,638][00476] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:57:15,638][00476] Initializing actor-critic model on device cuda:0 [2024-03-29 12:57:15,639][00476] RunningMeanStd input shape: (20,) [2024-03-29 12:57:15,639][00476] RunningMeanStd input shape: (23, 11, 11) [2024-03-29 12:57:15,640][00476] RunningMeanStd input shape: (1, 11, 11) [2024-03-29 12:57:15,640][00476] RunningMeanStd input shape: (2,) [2024-03-29 12:57:15,640][00476] RunningMeanStd input shape: (1,) [2024-03-29 12:57:15,641][00476] RunningMeanStd input shape: (1,) [2024-03-29 12:57:15,648][00675] Worker 11 uses CPU cores [11] [2024-03-29 12:57:15,708][01076] Worker 25 uses CPU cores [25] [2024-03-29 12:57:15,709][01350] Worker 12 uses CPU cores [12] [2024-03-29 12:57:15,728][00756] Worker 2 uses CPU cores [2] [2024-03-29 12:57:15,729][00883] Worker 15 uses CPU cores [15] [2024-03-29 12:57:15,729][00496] Worker 1 uses CPU cores [1] [2024-03-29 12:57:15,741][00498] Worker 3 uses CPU cores [3] [2024-03-29 12:57:15,741][01656] Worker 16 uses CPU cores [16] [2024-03-29 12:57:15,741][00565] Worker 13 uses CPU cores [13] [2024-03-29 12:57:15,773][01141] Worker 23 uses CPU cores [23] [2024-03-29 12:57:15,773][01785] Worker 20 uses CPU cores [20] [2024-03-29 12:57:15,773][00564] Worker 9 uses CPU cores [9] [2024-03-29 12:57:15,785][00947] Worker 17 uses CPU cores [17] [2024-03-29 12:57:15,797][01786] Worker 26 uses CPU cores [26] [2024-03-29 12:57:15,816][01431] Worker 14 uses CPU cores [14] [2024-03-29 12:57:15,816][01142] Worker 19 uses CPU cores [19] [2024-03-29 12:57:15,817][00948] Worker 0 uses CPU cores [0] [2024-03-29 12:57:15,825][00949] Worker 21 uses CPU cores [21] [2024-03-29 12:57:15,833][01720] Worker 33 uses CPU cores [33] [2024-03-29 12:57:15,839][01844] Worker 24 uses CPU cores [24] [2024-03-29 12:57:15,841][01182] Worker 29 uses CPU cores [29] [2024-03-29 12:57:15,863][00497] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:57:15,863][00497] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-03-29 12:57:15,877][02105] Worker 28 uses CPU cores [28] [2024-03-29 12:57:15,893][01328] Worker 8 uses CPU cores [8] [2024-03-29 12:57:15,893][00497] Num visible devices: 1 [2024-03-29 12:57:15,897][01207] Worker 27 uses CPU cores [27] [2024-03-29 12:57:15,913][01915] Worker 30 uses CPU cores [30] [2024-03-29 
12:57:15,914][01902] Worker 35 uses CPU cores [35] [2024-03-29 12:57:15,929][02314] Worker 41 uses CPU cores [41] [2024-03-29 12:57:15,958][02169] Worker 37 uses CPU cores [37] [2024-03-29 12:57:15,973][03133] Worker 53 uses CPU cores [53] [2024-03-29 12:57:15,977][01465] Worker 31 uses CPU cores [31] [2024-03-29 12:57:15,992][01077] Worker 4 uses CPU cores [4] [2024-03-29 12:57:15,993][01503] Worker 18 uses CPU cores [18] [2024-03-29 12:57:15,993][03197] Worker 32 uses CPU cores [32] [2024-03-29 12:57:16,001][01721] Worker 22 uses CPU cores [22] [2024-03-29 12:57:16,001][03064] Worker 63 uses CPU cores [63] [2024-03-29 12:57:16,009][02678] Worker 61 uses CPU cores [61] [2024-03-29 12:57:16,017][01336] Worker 10 uses CPU cores [10] [2024-03-29 12:57:16,020][03066] Worker 47 uses CPU cores [47] [2024-03-29 12:57:16,036][03643] Worker 60 uses CPU cores [60] [2024-03-29 12:57:16,037][03068] Worker 51 uses CPU cores [51] [2024-03-29 12:57:16,037][02679] Worker 40 uses CPU cores [40] [2024-03-29 12:57:16,037][02614] Worker 59 uses CPU cores [59] [2024-03-29 12:57:16,037][02747] Worker 62 uses CPU cores [62] [2024-03-29 12:57:16,059][03132] Worker 55 uses CPU cores [55] [2024-03-29 12:57:16,059][03898] Worker 58 uses CPU cores [58] [2024-03-29 12:57:16,059][02724] Worker 36 uses CPU cores [36] [2024-03-29 12:57:16,075][03065] Worker 34 uses CPU cores [34] [2024-03-29 12:57:16,089][03770] Worker 52 uses CPU cores [52] [2024-03-29 12:57:16,100][03962] Worker 48 uses CPU cores [48] [2024-03-29 12:57:16,101][03388] Worker 57 uses CPU cores [57] [2024-03-29 12:57:16,101][03063] Worker 49 uses CPU cores [49] [2024-03-29 12:57:16,126][02288] Worker 39 uses CPU cores [39] [2024-03-29 12:57:16,129][02680] Worker 44 uses CPU cores [44] [2024-03-29 12:57:16,205][02681] Worker 38 uses CPU cores [38] [2024-03-29 12:57:16,242][03452] Worker 46 uses CPU cores [46] [2024-03-29 12:57:16,243][03324] Worker 50 uses CPU cores [50] [2024-03-29 12:57:16,245][02550] Worker 43 uses CPU cores [43] [2024-03-29 12:57:16,289][03067] Worker 45 uses CPU cores [45] [2024-03-29 12:57:16,301][01271] Worker 6 uses CPU cores [6] [2024-03-29 12:57:16,309][02682] Worker 42 uses CPU cores [42] [2024-03-29 12:57:16,344][03769] Worker 56 uses CPU cores [56] [2024-03-29 12:57:16,353][03897] Worker 54 uses CPU cores [54] [2024-03-29 12:57:16,419][00476] Created Actor Critic model with architecture: [2024-03-29 12:57:16,420][00476] PredictingActorCritic( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (global_vars): RunningMeanStdInPlace() (griddly_obs): RunningMeanStdInPlace() (kinship): RunningMeanStdInPlace() (last_action): RunningMeanStdInPlace() (last_reward): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): ObjectEmeddingAgentEncoder( (object_embedding): Sequential( (0): Linear(in_features=52, out_features=64, bias=True) (1): ELU(alpha=1.0) (2): Sequential( (0): Linear(in_features=64, out_features=64, bias=True) (1): ELU(alpha=1.0) ) (3): Sequential( (0): Linear(in_features=64, out_features=64, bias=True) (1): ELU(alpha=1.0) ) (4): Sequential( (0): Linear(in_features=64, out_features=64, bias=True) (1): ELU(alpha=1.0) ) ) (encoder_head): Sequential( (0): Linear(in_features=7767, out_features=512, bias=True) (1): ELU(alpha=1.0) (2): Sequential( (0): Linear(in_features=512, out_features=512, bias=True) (1): ELU(alpha=1.0) ) (3): Sequential( (0): Linear(in_features=512, out_features=512, 
bias=True) (1): ELU(alpha=1.0) ) (4): Sequential( (0): Linear(in_features=512, out_features=512, bias=True) (1): ELU(alpha=1.0) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): ObjectEmeddingAgentDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=17, bias=True) ) ) [2024-03-29 12:57:17,232][00476] Using optimizer [2024-03-29 12:57:17,741][00476] Loading state from checkpoint /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007207_118079488.pth... [2024-03-29 12:57:18,040][00476] Loading model from checkpoint [2024-03-29 12:57:18,042][00476] Loaded experiment state at self.train_step=7207, self.env_steps=118079488 [2024-03-29 12:57:18,042][00476] Initialized policy 0 weights for model version 7207 [2024-03-29 12:57:18,044][00476] LearnerWorker_p0 finished initialization! [2024-03-29 12:57:18,044][00476] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:57:18,227][00497] RunningMeanStd input shape: (20,) [2024-03-29 12:57:18,227][00497] RunningMeanStd input shape: (23, 11, 11) [2024-03-29 12:57:18,227][00497] RunningMeanStd input shape: (1, 11, 11) [2024-03-29 12:57:18,228][00497] RunningMeanStd input shape: (2,) [2024-03-29 12:57:18,228][00497] RunningMeanStd input shape: (1,) [2024-03-29 12:57:18,228][00497] RunningMeanStd input shape: (1,) [2024-03-29 12:57:18,839][00126] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 118079488. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:57:18,871][00126] Inference worker 0-0 is ready! [2024-03-29 12:57:18,872][00126] All inference workers are ready! Signal rollout workers to start! [2024-03-29 12:57:19,816][01141] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,818][03770] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,821][00947] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,831][03197] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,832][01328] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,836][00756] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,841][01465] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,854][00883] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,856][02681] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,856][01431] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,856][01350] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,857][01844] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,857][03068] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,855][01207] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,863][01785] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,864][02614] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,865][03962] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,866][01721] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,866][03066] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,867][02105] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,868][02169] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,874][01077] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,876][00496] Decorrelating experience for 0 frames... 
[2024-03-29 12:57:19,876][00499] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,878][02288] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,883][02724] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,886][02314] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,889][02550] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,890][01182] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,891][03897] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,900][01271] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,901][01720] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,905][03133] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,911][00949] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,912][03452] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,912][00675] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,914][01142] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,915][00500] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,917][02682] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,921][01336] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,922][01915] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,927][00498] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,927][03388] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,934][01076] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,934][03064] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,940][03065] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,942][03067] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,943][03132] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,945][00948] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,948][03769] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,952][02747] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,953][03643] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,957][00565] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,958][03324] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,965][01656] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,966][01902] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,966][02678] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,968][02680] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,973][01786] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,975][03063] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,980][03898] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,981][00564] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,994][02679] Decorrelating experience for 0 frames... [2024-03-29 12:57:20,000][01503] Decorrelating experience for 0 frames... [2024-03-29 12:57:20,750][01141] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,756][03770] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,758][01328] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,764][01844] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,771][02614] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,772][01077] Decorrelating experience for 256 frames... 
[2024-03-29 12:57:20,774][01465] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,780][01350] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,789][00947] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,790][03068] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,793][03197] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,801][00883] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,803][01721] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,805][02169] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,810][00756] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,810][02105] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,811][03066] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,813][01271] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,819][00500] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,820][01785] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,823][00949] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,825][03132] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,825][03388] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,826][01207] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,826][01182] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,829][02724] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,836][02681] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,837][01142] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,839][00948] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,839][03067] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,841][03452] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,842][03962] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,846][00496] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,851][03064] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,852][00498] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,857][03897] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,857][01431] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,861][03065] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,862][00499] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,863][02314] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,870][02550] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,872][03643] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,873][00565] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,875][01786] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,878][02682] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,880][01902] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,880][03133] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,885][03769] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,887][01336] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,894][00675] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,894][02678] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,904][02288] Decorrelating experience for 256 frames... 
[2024-03-29 12:57:20,906][01720] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,907][01915] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,915][01656] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,919][03898] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,919][02679] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,921][01076] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,923][02747] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,925][03063] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,926][02680] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,932][00564] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,938][03324] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,992][01503] Decorrelating experience for 256 frames... [2024-03-29 12:57:23,839][00126] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 118079488. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:57:28,839][00126] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 118079488. Throughput: 0: 33488.4. Samples: 334880. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:57:30,144][00126] Heartbeat connected on Batcher_0 [2024-03-29 12:57:30,145][00126] Heartbeat connected on LearnerWorker_p0 [2024-03-29 12:57:30,150][00126] Heartbeat connected on RolloutWorker_w0 [2024-03-29 12:57:30,152][00126] Heartbeat connected on RolloutWorker_w1 [2024-03-29 12:57:30,154][00126] Heartbeat connected on RolloutWorker_w2 [2024-03-29 12:57:30,159][00126] Heartbeat connected on RolloutWorker_w5 [2024-03-29 12:57:30,160][00126] Heartbeat connected on RolloutWorker_w6 [2024-03-29 12:57:30,161][00126] Heartbeat connected on RolloutWorker_w3 [2024-03-29 12:57:30,166][00126] Heartbeat connected on RolloutWorker_w4 [2024-03-29 12:57:30,166][00126] Heartbeat connected on RolloutWorker_w10 [2024-03-29 12:57:30,169][00126] Heartbeat connected on RolloutWorker_w12 [2024-03-29 12:57:30,170][00126] Heartbeat connected on RolloutWorker_w11 [2024-03-29 12:57:30,170][00126] Heartbeat connected on RolloutWorker_w9 [2024-03-29 12:57:30,170][00126] Heartbeat connected on RolloutWorker_w8 [2024-03-29 12:57:30,171][00126] Heartbeat connected on RolloutWorker_w13 [2024-03-29 12:57:30,171][00126] Heartbeat connected on RolloutWorker_w7 [2024-03-29 12:57:30,174][00126] Heartbeat connected on RolloutWorker_w15 [2024-03-29 12:57:30,177][00126] Heartbeat connected on InferenceWorker_p0-w0 [2024-03-29 12:57:30,179][00126] Heartbeat connected on RolloutWorker_w18 [2024-03-29 12:57:30,181][00126] Heartbeat connected on RolloutWorker_w19 [2024-03-29 12:57:30,182][00126] Heartbeat connected on RolloutWorker_w17 [2024-03-29 12:57:30,182][00126] Heartbeat connected on RolloutWorker_w20 [2024-03-29 12:57:30,183][00126] Heartbeat connected on RolloutWorker_w16 [2024-03-29 12:57:30,185][00126] Heartbeat connected on RolloutWorker_w22 [2024-03-29 12:57:30,186][00126] Heartbeat connected on RolloutWorker_w21 [2024-03-29 12:57:30,187][00126] Heartbeat connected on RolloutWorker_w23 [2024-03-29 12:57:30,188][00126] Heartbeat connected on RolloutWorker_w14 [2024-03-29 12:57:30,188][00126] Heartbeat connected on RolloutWorker_w24 [2024-03-29 12:57:30,190][00126] Heartbeat connected on RolloutWorker_w25 [2024-03-29 12:57:30,191][00126] Heartbeat connected on RolloutWorker_w26 [2024-03-29 12:57:30,193][00126] Heartbeat 
connected on RolloutWorker_w27 [2024-03-29 12:57:30,195][00126] Heartbeat connected on RolloutWorker_w28 [2024-03-29 12:57:30,200][00126] Heartbeat connected on RolloutWorker_w31 [2024-03-29 12:57:30,201][00126] Heartbeat connected on RolloutWorker_w32 [2024-03-29 12:57:30,203][00126] Heartbeat connected on RolloutWorker_w29 [2024-03-29 12:57:30,203][00126] Heartbeat connected on RolloutWorker_w33 [2024-03-29 12:57:30,204][00126] Heartbeat connected on RolloutWorker_w34 [2024-03-29 12:57:30,205][00126] Heartbeat connected on RolloutWorker_w30 [2024-03-29 12:57:30,206][00126] Heartbeat connected on RolloutWorker_w35 [2024-03-29 12:57:30,207][00126] Heartbeat connected on RolloutWorker_w36 [2024-03-29 12:57:30,209][00126] Heartbeat connected on RolloutWorker_w37 [2024-03-29 12:57:30,210][00126] Heartbeat connected on RolloutWorker_w38 [2024-03-29 12:57:30,212][00126] Heartbeat connected on RolloutWorker_w39 [2024-03-29 12:57:30,213][00126] Heartbeat connected on RolloutWorker_w40 [2024-03-29 12:57:30,215][00126] Heartbeat connected on RolloutWorker_w41 [2024-03-29 12:57:30,218][00126] Heartbeat connected on RolloutWorker_w43 [2024-03-29 12:57:30,223][00126] Heartbeat connected on RolloutWorker_w42 [2024-03-29 12:57:30,224][00126] Heartbeat connected on RolloutWorker_w46 [2024-03-29 12:57:30,226][00126] Heartbeat connected on RolloutWorker_w48 [2024-03-29 12:57:30,226][00126] Heartbeat connected on RolloutWorker_w45 [2024-03-29 12:57:30,227][00126] Heartbeat connected on RolloutWorker_w49 [2024-03-29 12:57:30,229][00126] Heartbeat connected on RolloutWorker_w50 [2024-03-29 12:57:30,230][00126] Heartbeat connected on RolloutWorker_w51 [2024-03-29 12:57:30,232][00126] Heartbeat connected on RolloutWorker_w52 [2024-03-29 12:57:30,232][00126] Heartbeat connected on RolloutWorker_w44 [2024-03-29 12:57:30,233][00126] Heartbeat connected on RolloutWorker_w53 [2024-03-29 12:57:30,234][00126] Heartbeat connected on RolloutWorker_w47 [2024-03-29 12:57:30,235][00126] Heartbeat connected on RolloutWorker_w54 [2024-03-29 12:57:30,236][00126] Heartbeat connected on RolloutWorker_w55 [2024-03-29 12:57:30,238][00126] Heartbeat connected on RolloutWorker_w56 [2024-03-29 12:57:30,243][00126] Heartbeat connected on RolloutWorker_w58 [2024-03-29 12:57:30,244][00126] Heartbeat connected on RolloutWorker_w57 [2024-03-29 12:57:30,245][00126] Heartbeat connected on RolloutWorker_w59 [2024-03-29 12:57:30,246][00126] Heartbeat connected on RolloutWorker_w61 [2024-03-29 12:57:30,248][00126] Heartbeat connected on RolloutWorker_w62 [2024-03-29 12:57:30,248][00126] Heartbeat connected on RolloutWorker_w63 [2024-03-29 12:57:30,249][00126] Heartbeat connected on RolloutWorker_w60 [2024-03-29 12:57:32,883][03388] Worker 57, sleep for 133.594 sec to decorrelate experience collection [2024-03-29 12:57:32,883][00496] Worker 1, sleep for 2.344 sec to decorrelate experience collection [2024-03-29 12:57:32,897][03066] Worker 47, sleep for 110.156 sec to decorrelate experience collection [2024-03-29 12:57:32,897][01465] Worker 31, sleep for 72.656 sec to decorrelate experience collection [2024-03-29 12:57:32,898][03769] Worker 56, sleep for 131.250 sec to decorrelate experience collection [2024-03-29 12:57:32,898][00565] Worker 13, sleep for 30.469 sec to decorrelate experience collection [2024-03-29 12:57:32,898][01786] Worker 26, sleep for 60.938 sec to decorrelate experience collection [2024-03-29 12:57:32,903][02614] Worker 59, sleep for 138.281 sec to decorrelate experience collection [2024-03-29 12:57:32,917][01844] Worker 
24, sleep for 56.250 sec to decorrelate experience collection [2024-03-29 12:57:32,917][03452] Worker 46, sleep for 107.812 sec to decorrelate experience collection [2024-03-29 12:57:32,920][01271] Worker 6, sleep for 14.062 sec to decorrelate experience collection [2024-03-29 12:57:32,920][00756] Worker 2, sleep for 4.688 sec to decorrelate experience collection [2024-03-29 12:57:32,921][03197] Worker 32, sleep for 75.000 sec to decorrelate experience collection [2024-03-29 12:57:32,921][02169] Worker 37, sleep for 86.719 sec to decorrelate experience collection [2024-03-29 12:57:32,921][03132] Worker 55, sleep for 128.906 sec to decorrelate experience collection [2024-03-29 12:57:32,923][03770] Worker 52, sleep for 121.875 sec to decorrelate experience collection [2024-03-29 12:57:32,926][02550] Worker 43, sleep for 100.781 sec to decorrelate experience collection [2024-03-29 12:57:32,938][03065] Worker 34, sleep for 79.688 sec to decorrelate experience collection [2024-03-29 12:57:32,938][00499] Worker 5, sleep for 11.719 sec to decorrelate experience collection [2024-03-29 12:57:32,938][01431] Worker 14, sleep for 32.812 sec to decorrelate experience collection [2024-03-29 12:57:32,939][00498] Worker 3, sleep for 7.031 sec to decorrelate experience collection [2024-03-29 12:57:32,940][01721] Worker 22, sleep for 51.562 sec to decorrelate experience collection [2024-03-29 12:57:32,940][00947] Worker 17, sleep for 39.844 sec to decorrelate experience collection [2024-03-29 12:57:32,941][01350] Worker 12, sleep for 28.125 sec to decorrelate experience collection [2024-03-29 12:57:32,953][02681] Worker 38, sleep for 89.062 sec to decorrelate experience collection [2024-03-29 12:57:32,958][02724] Worker 36, sleep for 84.375 sec to decorrelate experience collection [2024-03-29 12:57:32,959][01328] Worker 8, sleep for 18.750 sec to decorrelate experience collection [2024-03-29 12:57:32,966][00883] Worker 15, sleep for 35.156 sec to decorrelate experience collection [2024-03-29 12:57:32,968][03063] Worker 49, sleep for 114.844 sec to decorrelate experience collection [2024-03-29 12:57:32,969][03898] Worker 58, sleep for 135.938 sec to decorrelate experience collection [2024-03-29 12:57:32,983][01915] Worker 30, sleep for 70.312 sec to decorrelate experience collection [2024-03-29 12:57:32,983][00500] Worker 7, sleep for 16.406 sec to decorrelate experience collection [2024-03-29 12:57:32,994][03133] Worker 53, sleep for 124.219 sec to decorrelate experience collection [2024-03-29 12:57:33,003][02747] Worker 62, sleep for 145.312 sec to decorrelate experience collection [2024-03-29 12:57:33,007][03643] Worker 60, sleep for 140.625 sec to decorrelate experience collection [2024-03-29 12:57:33,013][02314] Worker 41, sleep for 96.094 sec to decorrelate experience collection [2024-03-29 12:57:33,014][03962] Worker 48, sleep for 112.500 sec to decorrelate experience collection [2024-03-29 12:57:33,022][00675] Worker 11, sleep for 25.781 sec to decorrelate experience collection [2024-03-29 12:57:33,026][02680] Worker 44, sleep for 103.125 sec to decorrelate experience collection [2024-03-29 12:57:33,032][00564] Worker 9, sleep for 21.094 sec to decorrelate experience collection [2024-03-29 12:57:33,033][02682] Worker 42, sleep for 98.438 sec to decorrelate experience collection [2024-03-29 12:57:33,036][01182] Worker 29, sleep for 67.969 sec to decorrelate experience collection [2024-03-29 12:57:33,038][01656] Worker 16, sleep for 37.500 sec to decorrelate experience collection [2024-03-29 
12:57:33,047][01720] Worker 33, sleep for 77.344 sec to decorrelate experience collection [2024-03-29 12:57:33,065][01336] Worker 10, sleep for 23.438 sec to decorrelate experience collection [2024-03-29 12:57:33,070][03897] Worker 54, sleep for 126.562 sec to decorrelate experience collection [2024-03-29 12:57:33,074][00476] Signal inference workers to stop experience collection... [2024-03-29 12:57:33,079][01785] Worker 20, sleep for 46.875 sec to decorrelate experience collection [2024-03-29 12:57:33,079][01077] Worker 4, sleep for 9.375 sec to decorrelate experience collection [2024-03-29 12:57:33,093][01503] Worker 18, sleep for 42.188 sec to decorrelate experience collection [2024-03-29 12:57:33,098][00497] InferenceWorker_p0-w0: stopping experience collection [2024-03-29 12:57:33,103][01902] Worker 35, sleep for 82.031 sec to decorrelate experience collection [2024-03-29 12:57:33,103][02679] Worker 40, sleep for 93.750 sec to decorrelate experience collection [2024-03-29 12:57:33,104][03064] Worker 63, sleep for 147.656 sec to decorrelate experience collection [2024-03-29 12:57:33,110][01207] Worker 27, sleep for 63.281 sec to decorrelate experience collection [2024-03-29 12:57:33,110][01141] Worker 23, sleep for 53.906 sec to decorrelate experience collection [2024-03-29 12:57:33,122][03068] Worker 51, sleep for 119.531 sec to decorrelate experience collection [2024-03-29 12:57:33,122][03324] Worker 50, sleep for 117.188 sec to decorrelate experience collection [2024-03-29 12:57:33,839][00126] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 118079488. Throughput: 0: 34871.1. Samples: 523060. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:57:34,826][00476] Signal inference workers to resume experience collection... [2024-03-29 12:57:34,826][00497] InferenceWorker_p0-w0: resuming experience collection [2024-03-29 12:57:34,871][01076] Worker 25, sleep for 58.594 sec to decorrelate experience collection [2024-03-29 12:57:34,903][02678] Worker 61, sleep for 142.969 sec to decorrelate experience collection [2024-03-29 12:57:34,929][02105] Worker 28, sleep for 65.625 sec to decorrelate experience collection [2024-03-29 12:57:34,930][03067] Worker 45, sleep for 105.469 sec to decorrelate experience collection [2024-03-29 12:57:34,935][02288] Worker 39, sleep for 91.406 sec to decorrelate experience collection [2024-03-29 12:57:35,239][00496] Worker 1 awakens! [2024-03-29 12:57:35,376][00949] Worker 21, sleep for 49.219 sec to decorrelate experience collection [2024-03-29 12:57:35,410][01142] Worker 19, sleep for 44.531 sec to decorrelate experience collection [2024-03-29 12:57:37,190][00497] Updated weights for policy 0, policy_version 7217 (0.0014) [2024-03-29 12:57:37,631][00756] Worker 2 awakens! [2024-03-29 12:57:38,839][00126] Fps is (10 sec: 27853.0, 60 sec: 13926.5, 300 sec: 13926.5). Total num frames: 118358016. Throughput: 0: 32859.3. Samples: 657180. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-03-29 12:57:39,456][00497] Updated weights for policy 0, policy_version 7227 (0.0013) [2024-03-29 12:57:40,006][00498] Worker 3 awakens! [2024-03-29 12:57:42,502][01077] Worker 4 awakens! [2024-03-29 12:57:43,839][00126] Fps is (10 sec: 34406.3, 60 sec: 13762.7, 300 sec: 13762.7). Total num frames: 118423552. Throughput: 0: 27129.8. Samples: 678240. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-03-29 12:57:44,681][00499] Worker 5 awakens! [2024-03-29 12:57:47,053][01271] Worker 6 awakens! 
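
Editor's note: the per-worker sleep durations announced above follow a clear arithmetic pattern (worker 1 sleeps 2.344 s, worker 2 sleeps 4.688 s, ..., worker 63 sleeps 147.656 s), i.e. roughly worker_index x 2.344 s. The sketch below reproduces those numbers under the assumption that the constant is a maximum decorrelation window divided by the worker count; the function name and the 150 s figure are illustrative, not the framework's actual code.

```python
def decorrelation_sleep_seconds(worker_idx: int,
                                num_workers: int = 64,
                                max_decorrelation_sec: float = 150.0) -> float:
    """Stagger each rollout worker's start proportionally to its index."""
    if worker_idx == 0:
        return 0.0  # worker 0 starts collecting immediately
    return worker_idx * (max_decorrelation_sec / num_workers)


for idx in (1, 2, 13, 31, 63):
    delay = decorrelation_sleep_seconds(idx)
    # Reproduces the values logged above: 2.344, 4.688, 30.469, 72.656, 147.656 sec.
    print(f"Worker {idx}, sleep for {delay:.3f} sec to decorrelate experience collection")
```
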
[2024-03-29 12:57:48,839][00126] Fps is (10 sec: 11468.7, 60 sec: 13107.2, 300 sec: 13107.2). Total num frames: 118472704. Throughput: 0: 23285.4. Samples: 698560. Policy #0 lag: (min: 0.0, avg: 18.7, max: 21.0) [2024-03-29 12:57:49,396][00500] Worker 7 awakens! [2024-03-29 12:57:51,733][01328] Worker 8 awakens! [2024-03-29 12:57:53,839][00126] Fps is (10 sec: 8192.1, 60 sec: 12171.1, 300 sec: 12171.1). Total num frames: 118505472. Throughput: 0: 21428.7. Samples: 750000. Policy #0 lag: (min: 0.0, avg: 9.3, max: 25.0) [2024-03-29 12:57:54,226][00564] Worker 9 awakens! [2024-03-29 12:57:56,603][01336] Worker 10 awakens! [2024-03-29 12:57:57,495][00497] Updated weights for policy 0, policy_version 7237 (0.0012) [2024-03-29 12:57:58,821][00675] Worker 11 awakens! [2024-03-29 12:57:58,839][00126] Fps is (10 sec: 9830.6, 60 sec: 12288.1, 300 sec: 12288.1). Total num frames: 118571008. Throughput: 0: 20585.1. Samples: 823400. Policy #0 lag: (min: 0.0, avg: 9.3, max: 25.0) [2024-03-29 12:57:58,840][00126] Avg episode reward: [(0, '0.233')] [2024-03-29 12:58:01,167][01350] Worker 12 awakens! [2024-03-29 12:58:03,468][00565] Worker 13 awakens! [2024-03-29 12:58:03,839][00126] Fps is (10 sec: 19660.8, 60 sec: 13835.5, 300 sec: 13835.5). Total num frames: 118702080. Throughput: 0: 19660.6. Samples: 884720. Policy #0 lag: (min: 0.0, avg: 3.2, max: 7.0) [2024-03-29 12:58:03,839][00126] Avg episode reward: [(0, '0.291')] [2024-03-29 12:58:05,851][01431] Worker 14 awakens! [2024-03-29 12:58:06,143][00497] Updated weights for policy 0, policy_version 7247 (0.0013) [2024-03-29 12:58:08,223][00883] Worker 15 awakens! [2024-03-29 12:58:08,839][00126] Fps is (10 sec: 26214.2, 60 sec: 15073.3, 300 sec: 15073.3). Total num frames: 118833152. Throughput: 0: 22755.6. Samples: 1024000. Policy #0 lag: (min: 0.0, avg: 4.5, max: 9.0) [2024-03-29 12:58:08,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 12:58:10,638][01656] Worker 16 awakens! [2024-03-29 12:58:12,137][00497] Updated weights for policy 0, policy_version 7257 (0.0012) [2024-03-29 12:58:12,871][00947] Worker 17 awakens! [2024-03-29 12:58:13,839][00126] Fps is (10 sec: 26214.2, 60 sec: 16086.2, 300 sec: 16086.2). Total num frames: 118964224. Throughput: 0: 19095.6. Samples: 1194180. Policy #0 lag: (min: 0.0, avg: 4.5, max: 9.0) [2024-03-29 12:58:13,840][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 12:58:15,294][01503] Worker 18 awakens! [2024-03-29 12:58:17,973][00497] Updated weights for policy 0, policy_version 7267 (0.0011) [2024-03-29 12:58:18,839][00126] Fps is (10 sec: 26214.2, 60 sec: 16930.2, 300 sec: 16930.2). Total num frames: 119095296. Throughput: 0: 16682.2. Samples: 1273760. Policy #0 lag: (min: 1.0, avg: 7.3, max: 15.0) [2024-03-29 12:58:18,840][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 12:58:20,005][01785] Worker 20 awakens! [2024-03-29 12:58:20,042][01142] Worker 19 awakens! [2024-03-29 12:58:22,568][00497] Updated weights for policy 0, policy_version 7277 (0.0014) [2024-03-29 12:58:23,839][00126] Fps is (10 sec: 27852.6, 60 sec: 19387.7, 300 sec: 17896.4). Total num frames: 119242752. Throughput: 0: 17724.0. Samples: 1454760. Policy #0 lag: (min: 0.0, avg: 7.0, max: 15.0) [2024-03-29 12:58:23,840][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 12:58:24,577][01721] Worker 22 awakens! [2024-03-29 12:58:24,695][00949] Worker 21 awakens! [2024-03-29 12:58:27,023][00497] Updated weights for policy 0, policy_version 7287 (0.0013) [2024-03-29 12:58:27,046][01141] Worker 23 awakens! 
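
Editor's note: each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" report is a trailing-window average, total environment frames gained over the last 10, 60, and 300 seconds divided by the elapsed time in that window. The class below is an illustrative reconstruction, assuming only that the reporter keeps (timestamp, total_num_frames) samples; how the real reporter anchors the longer windows right after a restart may differ.

```python
from collections import deque
from typing import Deque, Tuple


class FpsReporter:
    """Illustrative reconstruction of the '(10 sec / 60 sec / 300 sec)' FPS figures."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        # (timestamp_seconds, total_num_frames) samples, oldest first.
        self.history: Deque[Tuple[float, int]] = deque()

    def record(self, now: float, total_frames: int) -> None:
        self.history.append((now, total_frames))
        # Keep only enough history to cover the largest window.
        while now - self.history[0][0] > max(self.windows):
            self.history.popleft()

    def fps(self, now: float, total_frames: int) -> dict:
        result = {}
        for window in self.windows:
            # Oldest recorded sample that still falls inside this window.
            past = next((s for s in self.history if now - s[0] <= window), None)
            if past is None or now <= past[0]:
                result[window] = 0.0
            else:
                result[window] = (total_frames - past[1]) / (now - past[0])
        return result


reporter = FpsReporter()
reporter.record(0.0, 118_079_488)    # run just resumed, no new frames yet
reporter.record(5.0, 118_079_488)
reporter.record(10.0, 118_358_016)
# The 10 s window gives ~27852.8 FPS, close to the 27853.0 reported above.
print(reporter.fps(10.0, 118_358_016))
```
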
[2024-03-29 12:58:28,839][00126] Fps is (10 sec: 36044.7, 60 sec: 22937.6, 300 sec: 19660.8). Total num frames: 119455744. Throughput: 0: 22146.6. Samples: 1674840. Policy #0 lag: (min: 1.0, avg: 31.0, max: 81.0) [2024-03-29 12:58:28,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 12:58:29,267][01844] Worker 24 awakens! [2024-03-29 12:58:30,855][00497] Updated weights for policy 0, policy_version 7297 (0.0013) [2024-03-29 12:58:33,565][01076] Worker 25 awakens! [2024-03-29 12:58:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 26760.5, 300 sec: 21408.4). Total num frames: 119685120. Throughput: 0: 24564.4. Samples: 1803960. Policy #0 lag: (min: 1.0, avg: 31.0, max: 81.0) [2024-03-29 12:58:33,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 12:58:33,932][01786] Worker 26 awakens! [2024-03-29 12:58:35,040][00497] Updated weights for policy 0, policy_version 7307 (0.0015) [2024-03-29 12:58:36,493][01207] Worker 27 awakens! [2024-03-29 12:58:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 24849.0, 300 sec: 22118.4). Total num frames: 119848960. Throughput: 0: 29056.8. Samples: 2057560. Policy #0 lag: (min: 0.0, avg: 37.9, max: 98.0) [2024-03-29 12:58:38,840][00126] Avg episode reward: [(0, '0.266')] [2024-03-29 12:58:39,345][00497] Updated weights for policy 0, policy_version 7317 (0.0013) [2024-03-29 12:58:39,530][00476] Signal inference workers to stop experience collection... (50 times) [2024-03-29 12:58:39,544][00497] InferenceWorker_p0-w0: stopping experience collection (50 times) [2024-03-29 12:58:39,745][00476] Signal inference workers to resume experience collection... (50 times) [2024-03-29 12:58:39,745][00497] InferenceWorker_p0-w0: resuming experience collection (50 times) [2024-03-29 12:58:40,654][02105] Worker 28 awakens! [2024-03-29 12:58:41,105][01182] Worker 29 awakens! [2024-03-29 12:58:43,279][00497] Updated weights for policy 0, policy_version 7327 (0.0017) [2024-03-29 12:58:43,396][01915] Worker 30 awakens! [2024-03-29 12:58:43,839][00126] Fps is (10 sec: 37683.1, 60 sec: 27306.6, 300 sec: 23323.1). Total num frames: 120061952. Throughput: 0: 32639.4. Samples: 2292180. Policy #0 lag: (min: 2.0, avg: 11.2, max: 20.0) [2024-03-29 12:58:43,840][00126] Avg episode reward: [(0, '0.279')] [2024-03-29 12:58:45,654][01465] Worker 31 awakens! [2024-03-29 12:58:47,844][00497] Updated weights for policy 0, policy_version 7337 (0.0021) [2024-03-29 12:58:48,021][03197] Worker 32 awakens! [2024-03-29 12:58:48,839][00126] Fps is (10 sec: 40960.2, 60 sec: 29764.3, 300 sec: 24211.9). Total num frames: 120258560. Throughput: 0: 33926.1. Samples: 2411400. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-03-29 12:58:48,841][00126] Avg episode reward: [(0, '0.273')] [2024-03-29 12:58:50,491][01720] Worker 33 awakens! [2024-03-29 12:58:51,757][00497] Updated weights for policy 0, policy_version 7347 (0.0021) [2024-03-29 12:58:52,726][03065] Worker 34 awakens! [2024-03-29 12:58:53,839][00126] Fps is (10 sec: 39322.0, 60 sec: 32494.9, 300 sec: 25007.2). Total num frames: 120455168. Throughput: 0: 35969.8. Samples: 2642640. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-03-29 12:58:53,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 12:58:55,234][01902] Worker 35 awakens! [2024-03-29 12:58:56,278][00497] Updated weights for policy 0, policy_version 7357 (0.0021) [2024-03-29 12:58:57,433][02724] Worker 36 awakens! [2024-03-29 12:58:58,839][00126] Fps is (10 sec: 42598.6, 60 sec: 35225.5, 300 sec: 26050.6). Total num frames: 120684544. Throughput: 0: 37241.3. 
Samples: 2870040. Policy #0 lag: (min: 0.0, avg: 11.6, max: 22.0) [2024-03-29 12:58:58,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 12:58:59,741][02169] Worker 37 awakens! [2024-03-29 12:58:59,945][00497] Updated weights for policy 0, policy_version 7367 (0.0020) [2024-03-29 12:59:02,021][02681] Worker 38 awakens! [2024-03-29 12:59:03,839][00126] Fps is (10 sec: 39321.6, 60 sec: 35771.7, 300 sec: 26370.5). Total num frames: 120848384. Throughput: 0: 38338.3. Samples: 2998980. Policy #0 lag: (min: 0.0, avg: 11.9, max: 23.0) [2024-03-29 12:59:03,840][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 12:59:03,917][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007377_120864768.pth... [2024-03-29 12:59:03,919][00497] Updated weights for policy 0, policy_version 7377 (0.0023) [2024-03-29 12:59:04,228][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000006906_113147904.pth [2024-03-29 12:59:06,430][02288] Worker 39 awakens! [2024-03-29 12:59:06,957][02679] Worker 40 awakens! [2024-03-29 12:59:07,846][00497] Updated weights for policy 0, policy_version 7387 (0.0026) [2024-03-29 12:59:08,839][00126] Fps is (10 sec: 37683.1, 60 sec: 37137.0, 300 sec: 27108.1). Total num frames: 121061376. Throughput: 0: 39645.8. Samples: 3238820. Policy #0 lag: (min: 2.0, avg: 13.5, max: 26.0) [2024-03-29 12:59:08,840][00126] Avg episode reward: [(0, '0.333')] [2024-03-29 12:59:09,207][02314] Worker 41 awakens! [2024-03-29 12:59:11,571][02682] Worker 42 awakens! [2024-03-29 12:59:12,448][00497] Updated weights for policy 0, policy_version 7397 (0.0021) [2024-03-29 12:59:13,808][02550] Worker 43 awakens! [2024-03-29 12:59:13,839][00126] Fps is (10 sec: 40959.7, 60 sec: 38229.2, 300 sec: 27639.1). Total num frames: 121257984. Throughput: 0: 39994.2. Samples: 3474580. Policy #0 lag: (min: 2.0, avg: 13.5, max: 26.0) [2024-03-29 12:59:13,840][00126] Avg episode reward: [(0, '0.254')] [2024-03-29 12:59:16,251][02680] Worker 44 awakens! [2024-03-29 12:59:17,161][00497] Updated weights for policy 0, policy_version 7407 (0.0019) [2024-03-29 12:59:17,899][00476] Signal inference workers to stop experience collection... (100 times) [2024-03-29 12:59:17,919][00497] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-03-29 12:59:18,109][00476] Signal inference workers to resume experience collection... (100 times) [2024-03-29 12:59:18,109][00497] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-03-29 12:59:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 39321.6, 300 sec: 28125.9). Total num frames: 121454592. Throughput: 0: 40171.6. Samples: 3611680. Policy #0 lag: (min: 1.0, avg: 14.3, max: 26.0) [2024-03-29 12:59:18,840][00126] Avg episode reward: [(0, '0.370')] [2024-03-29 12:59:19,934][00497] Updated weights for policy 0, policy_version 7417 (0.0018) [2024-03-29 12:59:20,499][03067] Worker 45 awakens! [2024-03-29 12:59:20,830][03452] Worker 46 awakens! [2024-03-29 12:59:20,933][00476] self.policy_id=0 batch has 56.25% of invalid samples [2024-03-29 12:59:23,154][03066] Worker 47 awakens! [2024-03-29 12:59:23,839][00126] Fps is (10 sec: 39321.5, 60 sec: 40140.7, 300 sec: 28573.7). Total num frames: 121651200. Throughput: 0: 39690.2. Samples: 3843620. 
Policy #0 lag: (min: 0.0, avg: 125.0, max: 211.0) [2024-03-29 12:59:23,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 12:59:24,259][00497] Updated weights for policy 0, policy_version 7427 (0.0020) [2024-03-29 12:59:25,615][03962] Worker 48 awakens! [2024-03-29 12:59:27,813][03063] Worker 49 awakens! [2024-03-29 12:59:28,839][00126] Fps is (10 sec: 37683.2, 60 sec: 39594.7, 300 sec: 28861.1). Total num frames: 121831424. Throughput: 0: 40418.8. Samples: 4111020. Policy #0 lag: (min: 1.0, avg: 52.8, max: 227.0) [2024-03-29 12:59:28,840][00126] Avg episode reward: [(0, '0.290')] [2024-03-29 12:59:28,847][00497] Updated weights for policy 0, policy_version 7437 (0.0019) [2024-03-29 12:59:30,373][03324] Worker 50 awakens! [2024-03-29 12:59:31,643][00497] Updated weights for policy 0, policy_version 7447 (0.0018) [2024-03-29 12:59:32,741][03068] Worker 51 awakens! [2024-03-29 12:59:33,839][00126] Fps is (10 sec: 42598.9, 60 sec: 39867.8, 300 sec: 29612.6). Total num frames: 122077184. Throughput: 0: 39756.5. Samples: 4200440. Policy #0 lag: (min: 1.0, avg: 52.8, max: 227.0) [2024-03-29 12:59:33,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 12:59:34,898][03770] Worker 52 awakens! [2024-03-29 12:59:35,755][00497] Updated weights for policy 0, policy_version 7457 (0.0021) [2024-03-29 12:59:37,313][03133] Worker 53 awakens! [2024-03-29 12:59:38,839][00126] Fps is (10 sec: 47513.1, 60 sec: 40960.0, 300 sec: 30193.4). Total num frames: 122306560. Throughput: 0: 40563.9. Samples: 4468020. Policy #0 lag: (min: 0.0, avg: 17.3, max: 34.0) [2024-03-29 12:59:38,840][00126] Avg episode reward: [(0, '0.262')] [2024-03-29 12:59:39,713][03897] Worker 54 awakens! [2024-03-29 12:59:40,631][00497] Updated weights for policy 0, policy_version 7467 (0.0020) [2024-03-29 12:59:41,928][03132] Worker 55 awakens! [2024-03-29 12:59:43,441][00497] Updated weights for policy 0, policy_version 7477 (0.0018) [2024-03-29 12:59:43,839][00126] Fps is (10 sec: 44236.9, 60 sec: 40960.1, 300 sec: 30621.2). Total num frames: 122519552. Throughput: 0: 41176.9. Samples: 4723000. Policy #0 lag: (min: 0.0, avg: 14.8, max: 35.0) [2024-03-29 12:59:43,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 12:59:44,249][03769] Worker 56 awakens! [2024-03-29 12:59:46,577][03388] Worker 57 awakens! [2024-03-29 12:59:47,417][00497] Updated weights for policy 0, policy_version 7487 (0.0022) [2024-03-29 12:59:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 40960.0, 300 sec: 30911.2). Total num frames: 122716160. Throughput: 0: 41136.4. Samples: 4850120. Policy #0 lag: (min: 0.0, avg: 18.7, max: 36.0) [2024-03-29 12:59:48,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 12:59:48,949][03898] Worker 58 awakens! [2024-03-29 12:59:51,116][00497] Updated weights for policy 0, policy_version 7497 (0.0019) [2024-03-29 12:59:51,289][02614] Worker 59 awakens! [2024-03-29 12:59:51,736][00476] Signal inference workers to stop experience collection... (150 times) [2024-03-29 12:59:51,779][00497] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-03-29 12:59:51,816][00476] Signal inference workers to resume experience collection... (150 times) [2024-03-29 12:59:51,819][00497] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-03-29 12:59:53,683][03643] Worker 60 awakens! [2024-03-29 12:59:53,839][00126] Fps is (10 sec: 37683.0, 60 sec: 40686.9, 300 sec: 31076.8). Total num frames: 122896384. Throughput: 0: 41400.0. Samples: 5101820. 
Policy #0 lag: (min: 0.0, avg: 18.7, max: 36.0) [2024-03-29 12:59:53,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 12:59:55,868][00497] Updated weights for policy 0, policy_version 7507 (0.0020) [2024-03-29 12:59:57,902][02678] Worker 61 awakens! [2024-03-29 12:59:58,320][02747] Worker 62 awakens! [2024-03-29 12:59:58,508][00497] Updated weights for policy 0, policy_version 7517 (0.0019) [2024-03-29 12:59:58,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41233.1, 300 sec: 31744.0). Total num frames: 123158528. Throughput: 0: 41614.8. Samples: 5347240. Policy #0 lag: (min: 0.0, avg: 15.8, max: 37.0) [2024-03-29 12:59:58,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 13:00:00,861][03064] Worker 63 awakens! [2024-03-29 13:00:02,786][00497] Updated weights for policy 0, policy_version 7527 (0.0020) [2024-03-29 13:00:03,839][00126] Fps is (10 sec: 45875.3, 60 sec: 41779.2, 300 sec: 31973.7). Total num frames: 123355136. Throughput: 0: 41873.8. Samples: 5496000. Policy #0 lag: (min: 2.0, avg: 21.1, max: 41.0) [2024-03-29 13:00:03,840][00126] Avg episode reward: [(0, '0.304')] [2024-03-29 13:00:06,290][00497] Updated weights for policy 0, policy_version 7537 (0.0019) [2024-03-29 13:00:08,839][00126] Fps is (10 sec: 36044.4, 60 sec: 40959.9, 300 sec: 31997.0). Total num frames: 123518976. Throughput: 0: 42320.4. Samples: 5748040. Policy #0 lag: (min: 2.0, avg: 21.1, max: 41.0) [2024-03-29 13:00:08,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 13:00:11,348][00497] Updated weights for policy 0, policy_version 7547 (0.0025) [2024-03-29 13:00:13,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 32674.4). Total num frames: 123797504. Throughput: 0: 41841.8. Samples: 5993900. Policy #0 lag: (min: 0.0, avg: 19.0, max: 39.0) [2024-03-29 13:00:13,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 13:00:13,990][00497] Updated weights for policy 0, policy_version 7557 (0.0028) [2024-03-29 13:00:18,340][00497] Updated weights for policy 0, policy_version 7567 (0.0020) [2024-03-29 13:00:18,839][00126] Fps is (10 sec: 47513.8, 60 sec: 42325.3, 300 sec: 32859.0). Total num frames: 123994112. Throughput: 0: 42745.7. Samples: 6124000. Policy #0 lag: (min: 1.0, avg: 20.5, max: 40.0) [2024-03-29 13:00:18,840][00126] Avg episode reward: [(0, '0.250')] [2024-03-29 13:00:22,084][00497] Updated weights for policy 0, policy_version 7577 (0.0027) [2024-03-29 13:00:23,839][00126] Fps is (10 sec: 37682.8, 60 sec: 42052.3, 300 sec: 32945.1). Total num frames: 124174336. Throughput: 0: 42193.8. Samples: 6366740. Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 13:00:23,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 13:00:26,039][00476] Signal inference workers to stop experience collection... (200 times) [2024-03-29 13:00:26,076][00497] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-03-29 13:00:26,253][00476] Signal inference workers to resume experience collection... (200 times) [2024-03-29 13:00:26,254][00497] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-03-29 13:00:27,140][00497] Updated weights for policy 0, policy_version 7587 (0.0035) [2024-03-29 13:00:28,839][00126] Fps is (10 sec: 39322.2, 60 sec: 42598.5, 300 sec: 33199.2). Total num frames: 124387328. Throughput: 0: 42442.3. Samples: 6632900. 
Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 13:00:28,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:00:29,912][00497] Updated weights for policy 0, policy_version 7597 (0.0023) [2024-03-29 13:00:33,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 33440.2). Total num frames: 124600320. Throughput: 0: 41895.2. Samples: 6735400. Policy #0 lag: (min: 2.0, avg: 24.2, max: 44.0) [2024-03-29 13:00:33,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:00:34,385][00497] Updated weights for policy 0, policy_version 7607 (0.0026) [2024-03-29 13:00:37,931][00497] Updated weights for policy 0, policy_version 7617 (0.0020) [2024-03-29 13:00:38,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.3, 300 sec: 33669.1). Total num frames: 124813312. Throughput: 0: 41911.1. Samples: 6987820. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 13:00:38,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:00:43,094][00497] Updated weights for policy 0, policy_version 7627 (0.0028) [2024-03-29 13:00:43,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.0, 300 sec: 33727.1). Total num frames: 124993536. Throughput: 0: 42470.2. Samples: 7258400. Policy #0 lag: (min: 2.0, avg: 19.2, max: 42.0) [2024-03-29 13:00:43,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:00:45,887][00497] Updated weights for policy 0, policy_version 7637 (0.0021) [2024-03-29 13:00:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.3, 300 sec: 34016.3). Total num frames: 125222912. Throughput: 0: 41119.1. Samples: 7346360. Policy #0 lag: (min: 2.0, avg: 19.2, max: 42.0) [2024-03-29 13:00:48,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:00:50,144][00497] Updated weights for policy 0, policy_version 7647 (0.0022) [2024-03-29 13:00:53,500][00497] Updated weights for policy 0, policy_version 7657 (0.0022) [2024-03-29 13:00:53,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 34292.1). Total num frames: 125452288. Throughput: 0: 41503.2. Samples: 7615680. Policy #0 lag: (min: 1.0, avg: 22.3, max: 42.0) [2024-03-29 13:00:53,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 13:00:58,015][00476] Signal inference workers to stop experience collection... (250 times) [2024-03-29 13:00:58,016][00476] Signal inference workers to resume experience collection... (250 times) [2024-03-29 13:00:58,057][00497] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-03-29 13:00:58,057][00497] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-03-29 13:00:58,550][00497] Updated weights for policy 0, policy_version 7667 (0.0024) [2024-03-29 13:00:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 34332.0). Total num frames: 125632512. Throughput: 0: 42397.8. Samples: 7901800. Policy #0 lag: (min: 0.0, avg: 17.0, max: 41.0) [2024-03-29 13:00:58,840][00126] Avg episode reward: [(0, '0.318')] [2024-03-29 13:01:01,195][00497] Updated weights for policy 0, policy_version 7677 (0.0027) [2024-03-29 13:01:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.2, 300 sec: 34588.5). Total num frames: 125861888. Throughput: 0: 41239.6. Samples: 7979780. Policy #0 lag: (min: 0.0, avg: 17.0, max: 41.0) [2024-03-29 13:01:03,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:01:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007682_125861888.pth... 
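
Editor's note: the checkpoint entries follow a save-then-prune pattern: each save writes checkpoint_{policy_version:09d}_{env_frames}.pth and the oldest remaining file is removed right afterwards, which in this run leaves two checkpoints on disk. A rough sketch of that rotation under a keep-latest-N assumption; the helper name and the byte-writing stand-in for torch.save are illustrative.

```python
import os
import re
from pathlib import Path

# Matches names like checkpoint_000007682_125861888.pth seen in the log above.
CKPT_RE = re.compile(r"checkpoint_(\d{9})_(\d+)\.pth$")


def save_and_rotate(ckpt_dir: Path, policy_version: int, env_frames: int,
                    state_bytes: bytes, keep_last: int = 2) -> Path:
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    path.write_bytes(state_bytes)  # stand-in for torch.save(checkpoint, path)

    # Zero-padded versions sort lexicographically; drop everything but the newest N.
    existing = sorted(p for p in ckpt_dir.iterdir() if CKPT_RE.search(p.name))
    for old in existing[:-keep_last]:
        old.unlink()
    return path


d = Path("checkpoint_p0_demo")
save_and_rotate(d, 7207, 118_079_488, b"...")
save_and_rotate(d, 7682, 125_861_888, b"...")
save_and_rotate(d, 8299, 135_970_816, b"...")  # after this save, version 7207 is pruned
print(sorted(os.listdir(d)))
```
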
[2024-03-29 13:01:04,169][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007207_118079488.pth [2024-03-29 13:01:05,638][00497] Updated weights for policy 0, policy_version 7687 (0.0031) [2024-03-29 13:01:08,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42598.4, 300 sec: 34762.6). Total num frames: 126074880. Throughput: 0: 41927.6. Samples: 8253480. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 13:01:08,840][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 13:01:09,247][00497] Updated weights for policy 0, policy_version 7697 (0.0022) [2024-03-29 13:01:13,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40686.9, 300 sec: 34720.2). Total num frames: 126238720. Throughput: 0: 42014.6. Samples: 8523560. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 13:01:13,840][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 13:01:14,327][00497] Updated weights for policy 0, policy_version 7707 (0.0018) [2024-03-29 13:01:16,925][00497] Updated weights for policy 0, policy_version 7717 (0.0024) [2024-03-29 13:01:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 35020.8). Total num frames: 126484480. Throughput: 0: 42009.7. Samples: 8625840. Policy #0 lag: (min: 1.0, avg: 22.8, max: 41.0) [2024-03-29 13:01:18,840][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 13:01:21,185][00497] Updated weights for policy 0, policy_version 7727 (0.0024) [2024-03-29 13:01:23,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42052.2, 300 sec: 35175.4). Total num frames: 126697472. Throughput: 0: 41876.3. Samples: 8872260. Policy #0 lag: (min: 1.0, avg: 22.8, max: 41.0) [2024-03-29 13:01:23,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:01:24,895][00497] Updated weights for policy 0, policy_version 7737 (0.0029) [2024-03-29 13:01:28,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41233.0, 300 sec: 35127.3). Total num frames: 126861312. Throughput: 0: 41963.1. Samples: 9146740. Policy #0 lag: (min: 0.0, avg: 22.8, max: 41.0) [2024-03-29 13:01:28,842][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 13:01:29,791][00476] Signal inference workers to stop experience collection... (300 times) [2024-03-29 13:01:29,872][00497] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-03-29 13:01:29,955][00476] Signal inference workers to resume experience collection... (300 times) [2024-03-29 13:01:29,955][00497] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-03-29 13:01:29,959][00497] Updated weights for policy 0, policy_version 7747 (0.0021) [2024-03-29 13:01:32,713][00497] Updated weights for policy 0, policy_version 7757 (0.0034) [2024-03-29 13:01:33,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 35530.8). Total num frames: 127139840. Throughput: 0: 42675.9. Samples: 9266780. Policy #0 lag: (min: 2.0, avg: 22.4, max: 44.0) [2024-03-29 13:01:33,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 13:01:36,667][00497] Updated weights for policy 0, policy_version 7767 (0.0024) [2024-03-29 13:01:38,839][00126] Fps is (10 sec: 45875.5, 60 sec: 41779.2, 300 sec: 35540.7). Total num frames: 127320064. Throughput: 0: 41796.5. Samples: 9496520. Policy #0 lag: (min: 2.0, avg: 22.4, max: 44.0) [2024-03-29 13:01:38,840][00126] Avg episode reward: [(0, '0.280')] [2024-03-29 13:01:40,377][00497] Updated weights for policy 0, policy_version 7777 (0.0020) [2024-03-29 13:01:43,839][00126] Fps is (10 sec: 34406.7, 60 sec: 41506.1, 300 sec: 35488.4). 
Total num frames: 127483904. Throughput: 0: 41545.8. Samples: 9771360. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:01:43,840][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 13:01:45,700][00497] Updated weights for policy 0, policy_version 7787 (0.0023) [2024-03-29 13:01:48,420][00497] Updated weights for policy 0, policy_version 7797 (0.0024) [2024-03-29 13:01:48,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.3, 300 sec: 35862.8). Total num frames: 127762432. Throughput: 0: 42744.4. Samples: 9903280. Policy #0 lag: (min: 2.0, avg: 18.7, max: 42.0) [2024-03-29 13:01:48,841][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 13:01:52,230][00497] Updated weights for policy 0, policy_version 7807 (0.0020) [2024-03-29 13:01:53,839][00126] Fps is (10 sec: 47513.6, 60 sec: 41779.2, 300 sec: 35925.7). Total num frames: 127959040. Throughput: 0: 41632.5. Samples: 10126940. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 13:01:53,840][00126] Avg episode reward: [(0, '0.281')] [2024-03-29 13:01:55,905][00497] Updated weights for policy 0, policy_version 7817 (0.0023) [2024-03-29 13:01:58,839][00126] Fps is (10 sec: 36044.9, 60 sec: 41506.1, 300 sec: 35869.3). Total num frames: 128122880. Throughput: 0: 41820.0. Samples: 10405460. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 13:01:58,841][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 13:02:01,278][00497] Updated weights for policy 0, policy_version 7827 (0.0023) [2024-03-29 13:02:02,082][00476] Signal inference workers to stop experience collection... (350 times) [2024-03-29 13:02:02,122][00497] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-03-29 13:02:02,307][00476] Signal inference workers to resume experience collection... (350 times) [2024-03-29 13:02:02,307][00497] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-03-29 13:02:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 36159.8). Total num frames: 128385024. Throughput: 0: 42613.9. Samples: 10543460. Policy #0 lag: (min: 0.0, avg: 17.8, max: 42.0) [2024-03-29 13:02:03,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 13:02:04,004][00497] Updated weights for policy 0, policy_version 7837 (0.0034) [2024-03-29 13:02:07,808][00497] Updated weights for policy 0, policy_version 7847 (0.0029) [2024-03-29 13:02:08,839][00126] Fps is (10 sec: 47513.2, 60 sec: 42052.2, 300 sec: 36270.8). Total num frames: 128598016. Throughput: 0: 42214.3. Samples: 10771900. Policy #0 lag: (min: 3.0, avg: 23.0, max: 43.0) [2024-03-29 13:02:08,840][00126] Avg episode reward: [(0, '0.268')] [2024-03-29 13:02:11,731][00497] Updated weights for policy 0, policy_version 7857 (0.0020) [2024-03-29 13:02:13,839][00126] Fps is (10 sec: 37683.2, 60 sec: 42052.3, 300 sec: 36211.4). Total num frames: 128761856. Throughput: 0: 41463.2. Samples: 11012580. Policy #0 lag: (min: 3.0, avg: 23.0, max: 43.0) [2024-03-29 13:02:13,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 13:02:17,056][00497] Updated weights for policy 0, policy_version 7867 (0.0026) [2024-03-29 13:02:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 36988.9). Total num frames: 128991232. Throughput: 0: 42090.2. Samples: 11160840. 
Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 13:02:18,840][00126] Avg episode reward: [(0, '0.247')] [2024-03-29 13:02:19,821][00497] Updated weights for policy 0, policy_version 7877 (0.0019) [2024-03-29 13:02:23,621][00497] Updated weights for policy 0, policy_version 7887 (0.0016) [2024-03-29 13:02:23,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42052.4, 300 sec: 37766.5). Total num frames: 129220608. Throughput: 0: 42057.3. Samples: 11389100. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:02:23,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:02:27,660][00497] Updated weights for policy 0, policy_version 7897 (0.0018) [2024-03-29 13:02:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 38377.4). Total num frames: 129400832. Throughput: 0: 41166.1. Samples: 11623840. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:02:28,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 13:02:32,871][00497] Updated weights for policy 0, policy_version 7907 (0.0031) [2024-03-29 13:02:33,839][00126] Fps is (10 sec: 37683.2, 60 sec: 40960.1, 300 sec: 38099.7). Total num frames: 129597440. Throughput: 0: 41730.3. Samples: 11781140. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 13:02:33,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 13:02:34,455][00476] Signal inference workers to stop experience collection... (400 times) [2024-03-29 13:02:34,492][00497] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-03-29 13:02:34,669][00476] Signal inference workers to resume experience collection... (400 times) [2024-03-29 13:02:34,669][00497] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-03-29 13:02:35,733][00497] Updated weights for policy 0, policy_version 7917 (0.0019) [2024-03-29 13:02:38,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42052.2, 300 sec: 38710.7). Total num frames: 129843200. Throughput: 0: 41751.9. Samples: 12005780. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 13:02:38,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 13:02:39,274][00497] Updated weights for policy 0, policy_version 7927 (0.0033) [2024-03-29 13:02:43,091][00497] Updated weights for policy 0, policy_version 7937 (0.0024) [2024-03-29 13:02:43,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42871.4, 300 sec: 39266.1). Total num frames: 130056192. Throughput: 0: 41156.9. Samples: 12257520. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 13:02:43,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 13:02:48,447][00497] Updated weights for policy 0, policy_version 7947 (0.0019) [2024-03-29 13:02:48,839][00126] Fps is (10 sec: 37683.9, 60 sec: 40960.1, 300 sec: 39710.4). Total num frames: 130220032. Throughput: 0: 41565.9. Samples: 12413920. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 13:02:48,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 13:02:51,268][00497] Updated weights for policy 0, policy_version 7957 (0.0021) [2024-03-29 13:02:53,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 40321.3). Total num frames: 130465792. Throughput: 0: 41487.7. Samples: 12638840. 
Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 13:02:53,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:02:54,707][00497] Updated weights for policy 0, policy_version 7967 (0.0026) [2024-03-29 13:02:58,586][00497] Updated weights for policy 0, policy_version 7977 (0.0030) [2024-03-29 13:02:58,839][00126] Fps is (10 sec: 47513.2, 60 sec: 42871.5, 300 sec: 40654.5). Total num frames: 130695168. Throughput: 0: 41953.8. Samples: 12900500. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 13:02:58,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:03:03,609][00497] Updated weights for policy 0, policy_version 7987 (0.0025) [2024-03-29 13:03:03,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.1, 300 sec: 40765.6). Total num frames: 130859008. Throughput: 0: 41968.1. Samples: 13049400. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 13:03:03,840][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 13:03:04,128][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007989_130891776.pth... [2024-03-29 13:03:04,440][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007377_120864768.pth [2024-03-29 13:03:06,035][00476] Signal inference workers to stop experience collection... (450 times) [2024-03-29 13:03:06,035][00476] Signal inference workers to resume experience collection... (450 times) [2024-03-29 13:03:06,057][00497] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-03-29 13:03:06,058][00497] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-03-29 13:03:06,588][00497] Updated weights for policy 0, policy_version 7997 (0.0019) [2024-03-29 13:03:08,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 41209.9). Total num frames: 131121152. Throughput: 0: 41961.7. Samples: 13277380. Policy #0 lag: (min: 1.0, avg: 20.8, max: 43.0) [2024-03-29 13:03:08,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 13:03:10,120][00497] Updated weights for policy 0, policy_version 8007 (0.0026) [2024-03-29 13:03:13,839][00126] Fps is (10 sec: 47513.7, 60 sec: 42871.5, 300 sec: 41487.6). Total num frames: 131334144. Throughput: 0: 42676.6. Samples: 13544280. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 13:03:13,840][00126] Avg episode reward: [(0, '0.277')] [2024-03-29 13:03:13,953][00497] Updated weights for policy 0, policy_version 8017 (0.0026) [2024-03-29 13:03:18,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41779.3, 300 sec: 41543.2). Total num frames: 131497984. Throughput: 0: 42215.2. Samples: 13680820. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 13:03:18,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 13:03:19,158][00497] Updated weights for policy 0, policy_version 8027 (0.0025) [2024-03-29 13:03:21,892][00497] Updated weights for policy 0, policy_version 8037 (0.0025) [2024-03-29 13:03:23,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 41654.3). Total num frames: 131743744. Throughput: 0: 42412.5. Samples: 13914340. Policy #0 lag: (min: 1.0, avg: 18.8, max: 42.0) [2024-03-29 13:03:23,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 13:03:25,540][00497] Updated weights for policy 0, policy_version 8047 (0.0031) [2024-03-29 13:03:28,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.5, 300 sec: 41598.7). Total num frames: 131956736. Throughput: 0: 42648.1. Samples: 14176680. 
Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:03:28,840][00126] Avg episode reward: [(0, '0.269')] [2024-03-29 13:03:29,536][00497] Updated weights for policy 0, policy_version 8057 (0.0034) [2024-03-29 13:03:33,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42325.3, 300 sec: 41654.2). Total num frames: 132136960. Throughput: 0: 42304.8. Samples: 14317640. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:03:33,841][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 13:03:34,651][00497] Updated weights for policy 0, policy_version 8067 (0.0019) [2024-03-29 13:03:37,484][00497] Updated weights for policy 0, policy_version 8077 (0.0028) [2024-03-29 13:03:37,648][00476] Signal inference workers to stop experience collection... (500 times) [2024-03-29 13:03:37,683][00497] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-03-29 13:03:37,828][00476] Signal inference workers to resume experience collection... (500 times) [2024-03-29 13:03:37,829][00497] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-03-29 13:03:38,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 132382720. Throughput: 0: 42498.6. Samples: 14551280. Policy #0 lag: (min: 0.0, avg: 17.0, max: 41.0) [2024-03-29 13:03:38,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 13:03:41,223][00497] Updated weights for policy 0, policy_version 8087 (0.0020) [2024-03-29 13:03:43,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 132595712. Throughput: 0: 42286.1. Samples: 14803380. Policy #0 lag: (min: 1.0, avg: 22.9, max: 42.0) [2024-03-29 13:03:43,840][00126] Avg episode reward: [(0, '0.305')] [2024-03-29 13:03:45,044][00497] Updated weights for policy 0, policy_version 8097 (0.0017) [2024-03-29 13:03:48,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42598.2, 300 sec: 41765.3). Total num frames: 132775936. Throughput: 0: 42030.6. Samples: 14940780. Policy #0 lag: (min: 1.0, avg: 18.8, max: 42.0) [2024-03-29 13:03:48,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:03:50,145][00497] Updated weights for policy 0, policy_version 8107 (0.0026) [2024-03-29 13:03:52,980][00497] Updated weights for policy 0, policy_version 8117 (0.0029) [2024-03-29 13:03:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.3, 300 sec: 41820.8). Total num frames: 133021696. Throughput: 0: 42661.3. Samples: 15197140. Policy #0 lag: (min: 1.0, avg: 18.8, max: 42.0) [2024-03-29 13:03:53,841][00126] Avg episode reward: [(0, '0.273')] [2024-03-29 13:03:56,476][00497] Updated weights for policy 0, policy_version 8127 (0.0018) [2024-03-29 13:03:58,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 133234688. Throughput: 0: 42264.4. Samples: 15446180. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 13:03:58,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 13:04:00,330][00497] Updated weights for policy 0, policy_version 8137 (0.0024) [2024-03-29 13:04:03,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 133414912. Throughput: 0: 42105.3. Samples: 15575560. 
Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:04:03,841][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 13:04:05,414][00497] Updated weights for policy 0, policy_version 8147 (0.0022) [2024-03-29 13:04:08,203][00497] Updated weights for policy 0, policy_version 8157 (0.0019) [2024-03-29 13:04:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 133660672. Throughput: 0: 42858.2. Samples: 15842960. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:04:08,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:04:11,992][00497] Updated weights for policy 0, policy_version 8167 (0.0020) [2024-03-29 13:04:13,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 133873664. Throughput: 0: 42369.3. Samples: 16083300. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 13:04:13,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 13:04:15,848][00497] Updated weights for policy 0, policy_version 8177 (0.0029) [2024-03-29 13:04:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 134053888. Throughput: 0: 42138.3. Samples: 16213860. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 13:04:18,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:04:19,177][00476] Signal inference workers to stop experience collection... (550 times) [2024-03-29 13:04:19,253][00497] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-03-29 13:04:19,263][00476] Signal inference workers to resume experience collection... (550 times) [2024-03-29 13:04:19,285][00497] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-03-29 13:04:20,984][00497] Updated weights for policy 0, policy_version 8187 (0.0023) [2024-03-29 13:04:23,754][00497] Updated weights for policy 0, policy_version 8197 (0.0021) [2024-03-29 13:04:23,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 42265.1). Total num frames: 134299648. Throughput: 0: 42874.1. Samples: 16480620. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 13:04:23,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 13:04:27,561][00497] Updated weights for policy 0, policy_version 8207 (0.0025) [2024-03-29 13:04:28,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 134512640. Throughput: 0: 42443.2. Samples: 16713320. Policy #0 lag: (min: 0.0, avg: 22.5, max: 41.0) [2024-03-29 13:04:28,840][00126] Avg episode reward: [(0, '0.277')] [2024-03-29 13:04:31,448][00497] Updated weights for policy 0, policy_version 8217 (0.0023) [2024-03-29 13:04:33,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 134692864. Throughput: 0: 42362.7. Samples: 16847100. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 13:04:33,842][00126] Avg episode reward: [(0, '0.230')] [2024-03-29 13:04:36,513][00497] Updated weights for policy 0, policy_version 8227 (0.0017) [2024-03-29 13:04:38,839][00126] Fps is (10 sec: 40959.4, 60 sec: 42325.2, 300 sec: 42043.0). Total num frames: 134922240. Throughput: 0: 42703.1. Samples: 17118780. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 13:04:38,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 13:04:39,030][00476] Saving new best policy, reward=0.455! 
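
Editor's note: the "Saving new best policy, reward=0.455!" entry suggests the learner keeps the best average episode reward seen so far and writes a separate "best" checkpoint only when that value is exceeded; earlier averages in this section (e.g. 0.410) do not trigger it, presumably because the previous best was restored when the run resumed. A minimal sketch of that bookkeeping, under those assumptions; the class and the seeded 0.410 threshold are hypothetical.

```python
class BestPolicyTracker:
    """Hypothetical sketch of best-policy tracking consistent with the log above."""

    def __init__(self, restored_best: float = float("-inf")):
        self.best_reward = restored_best

    def update(self, avg_episode_reward: float) -> bool:
        """Return True (and remember the reward) only when a new maximum is seen."""
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            return True
        return False


# Assumed threshold carried over from the resumed run; averages below are from the reports above.
tracker = BestPolicyTracker(restored_best=0.410)
for reward in (0.230, 0.410, 0.455, 0.398):
    if tracker.update(reward):
        print(f"Saving new best policy, reward={reward:.3f}!")  # fires once, at 0.455
```
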
[2024-03-29 13:04:39,604][00497] Updated weights for policy 0, policy_version 8237 (0.0020) [2024-03-29 13:04:43,374][00497] Updated weights for policy 0, policy_version 8247 (0.0029) [2024-03-29 13:04:43,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 135135232. Throughput: 0: 42155.4. Samples: 17343180. Policy #0 lag: (min: 1.0, avg: 22.5, max: 44.0) [2024-03-29 13:04:43,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 13:04:47,049][00497] Updated weights for policy 0, policy_version 8257 (0.0028) [2024-03-29 13:04:48,841][00126] Fps is (10 sec: 39314.0, 60 sec: 42324.0, 300 sec: 42098.3). Total num frames: 135315456. Throughput: 0: 42242.1. Samples: 17476540. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 13:04:48,842][00126] Avg episode reward: [(0, '0.305')] [2024-03-29 13:04:51,638][00476] Signal inference workers to stop experience collection... (600 times) [2024-03-29 13:04:51,751][00497] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-03-29 13:04:51,837][00476] Signal inference workers to resume experience collection... (600 times) [2024-03-29 13:04:51,837][00497] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-03-29 13:04:52,141][00497] Updated weights for policy 0, policy_version 8267 (0.0022) [2024-03-29 13:04:53,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 135544832. Throughput: 0: 42478.6. Samples: 17754500. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 13:04:53,840][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 13:04:54,884][00497] Updated weights for policy 0, policy_version 8277 (0.0019) [2024-03-29 13:04:58,550][00497] Updated weights for policy 0, policy_version 8287 (0.0022) [2024-03-29 13:04:58,839][00126] Fps is (10 sec: 45884.5, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 135774208. Throughput: 0: 42228.4. Samples: 17983580. Policy #0 lag: (min: 2.0, avg: 21.2, max: 43.0) [2024-03-29 13:04:58,840][00126] Avg episode reward: [(0, '0.260')] [2024-03-29 13:05:02,360][00497] Updated weights for policy 0, policy_version 8297 (0.0018) [2024-03-29 13:05:03,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 135970816. Throughput: 0: 42303.8. Samples: 18117540. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:05:03,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:05:03,858][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000008299_135970816.pth... [2024-03-29 13:05:04,192][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007682_125861888.pth [2024-03-29 13:05:07,383][00497] Updated weights for policy 0, policy_version 8307 (0.0021) [2024-03-29 13:05:08,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 136183808. Throughput: 0: 42717.9. Samples: 18402920. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:05:08,840][00126] Avg episode reward: [(0, '0.265')] [2024-03-29 13:05:10,101][00497] Updated weights for policy 0, policy_version 8317 (0.0023) [2024-03-29 13:05:13,832][00497] Updated weights for policy 0, policy_version 8327 (0.0019) [2024-03-29 13:05:13,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 136429568. Throughput: 0: 42502.6. Samples: 18625940. 
Policy #0 lag: (min: 2.0, avg: 19.8, max: 41.0) [2024-03-29 13:05:13,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:05:17,710][00497] Updated weights for policy 0, policy_version 8337 (0.0020) [2024-03-29 13:05:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 136609792. Throughput: 0: 42370.0. Samples: 18753740. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 13:05:18,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 13:05:22,907][00497] Updated weights for policy 0, policy_version 8347 (0.0022) [2024-03-29 13:05:23,179][00476] Signal inference workers to stop experience collection... (650 times) [2024-03-29 13:05:23,218][00497] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-03-29 13:05:23,409][00476] Signal inference workers to resume experience collection... (650 times) [2024-03-29 13:05:23,409][00497] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-03-29 13:05:23,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 136806400. Throughput: 0: 42568.0. Samples: 19034340. Policy #0 lag: (min: 2.0, avg: 18.1, max: 42.0) [2024-03-29 13:05:23,840][00126] Avg episode reward: [(0, '0.329')] [2024-03-29 13:05:25,686][00497] Updated weights for policy 0, policy_version 8357 (0.0027) [2024-03-29 13:05:28,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 137035776. Throughput: 0: 42390.3. Samples: 19250740. Policy #0 lag: (min: 2.0, avg: 18.1, max: 42.0) [2024-03-29 13:05:28,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 13:05:29,847][00497] Updated weights for policy 0, policy_version 8368 (0.0027) [2024-03-29 13:05:33,690][00497] Updated weights for policy 0, policy_version 8378 (0.0023) [2024-03-29 13:05:33,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42871.5, 300 sec: 42209.6). Total num frames: 137265152. Throughput: 0: 42376.1. Samples: 19383380. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 13:05:33,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 13:05:38,792][00497] Updated weights for policy 0, policy_version 8388 (0.0026) [2024-03-29 13:05:38,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 137428992. Throughput: 0: 42312.8. Samples: 19658580. Policy #0 lag: (min: 0.0, avg: 16.7, max: 43.0) [2024-03-29 13:05:38,840][00126] Avg episode reward: [(0, '0.307')] [2024-03-29 13:05:41,709][00497] Updated weights for policy 0, policy_version 8398 (0.0029) [2024-03-29 13:05:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 137674752. Throughput: 0: 42190.3. Samples: 19882140. Policy #0 lag: (min: 0.0, avg: 16.7, max: 43.0) [2024-03-29 13:05:43,840][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 13:05:45,643][00497] Updated weights for policy 0, policy_version 8408 (0.0019) [2024-03-29 13:05:48,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42872.9, 300 sec: 42154.1). Total num frames: 137887744. Throughput: 0: 42037.4. Samples: 20009220. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 13:05:48,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:05:49,349][00497] Updated weights for policy 0, policy_version 8418 (0.0026) [2024-03-29 13:05:53,839][00126] Fps is (10 sec: 36044.8, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 138035200. Throughput: 0: 41754.6. Samples: 20281880. 
Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 13:05:53,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 13:05:54,503][00497] Updated weights for policy 0, policy_version 8428 (0.0031) [2024-03-29 13:05:55,034][00476] Signal inference workers to stop experience collection... (700 times) [2024-03-29 13:05:55,054][00497] InferenceWorker_p0-w0: stopping experience collection (700 times) [2024-03-29 13:05:55,244][00476] Signal inference workers to resume experience collection... (700 times) [2024-03-29 13:05:55,244][00497] InferenceWorker_p0-w0: resuming experience collection (700 times) [2024-03-29 13:05:57,337][00497] Updated weights for policy 0, policy_version 8438 (0.0029) [2024-03-29 13:05:58,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 138313728. Throughput: 0: 41983.6. Samples: 20515200. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 13:05:58,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:06:00,958][00497] Updated weights for policy 0, policy_version 8448 (0.0020) [2024-03-29 13:06:03,839][00126] Fps is (10 sec: 49151.4, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 138526720. Throughput: 0: 42011.4. Samples: 20644260. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 13:06:03,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 13:06:04,764][00497] Updated weights for policy 0, policy_version 8458 (0.0020) [2024-03-29 13:06:08,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 138690560. Throughput: 0: 42006.8. Samples: 20924640. Policy #0 lag: (min: 0.0, avg: 18.2, max: 40.0) [2024-03-29 13:06:08,841][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 13:06:09,654][00497] Updated weights for policy 0, policy_version 8468 (0.0031) [2024-03-29 13:06:12,581][00497] Updated weights for policy 0, policy_version 8478 (0.0029) [2024-03-29 13:06:13,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 138952704. Throughput: 0: 42353.8. Samples: 21156660. Policy #0 lag: (min: 0.0, avg: 18.2, max: 40.0) [2024-03-29 13:06:13,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 13:06:16,126][00497] Updated weights for policy 0, policy_version 8488 (0.0019) [2024-03-29 13:06:18,839][00126] Fps is (10 sec: 47513.0, 60 sec: 42598.3, 300 sec: 42265.2). Total num frames: 139165696. Throughput: 0: 42232.8. Samples: 21283860. Policy #0 lag: (min: 1.0, avg: 22.4, max: 41.0) [2024-03-29 13:06:18,840][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 13:06:20,230][00497] Updated weights for policy 0, policy_version 8498 (0.0029) [2024-03-29 13:06:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 139345920. Throughput: 0: 42227.2. Samples: 21558800. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 13:06:23,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 13:06:25,190][00497] Updated weights for policy 0, policy_version 8508 (0.0024) [2024-03-29 13:06:27,113][00476] Signal inference workers to stop experience collection... (750 times) [2024-03-29 13:06:27,113][00476] Signal inference workers to resume experience collection... 
(750 times) [2024-03-29 13:06:27,155][00497] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-03-29 13:06:27,155][00497] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-03-29 13:06:28,082][00497] Updated weights for policy 0, policy_version 8518 (0.0025) [2024-03-29 13:06:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 139591680. Throughput: 0: 42586.6. Samples: 21798540. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 13:06:28,841][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:06:31,897][00497] Updated weights for policy 0, policy_version 8528 (0.0024) [2024-03-29 13:06:33,839][00126] Fps is (10 sec: 45874.4, 60 sec: 42325.2, 300 sec: 42320.7). Total num frames: 139804672. Throughput: 0: 42564.4. Samples: 21924620. Policy #0 lag: (min: 1.0, avg: 23.1, max: 42.0) [2024-03-29 13:06:33,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:06:35,633][00497] Updated weights for policy 0, policy_version 8538 (0.0023) [2024-03-29 13:06:38,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42598.5, 300 sec: 42376.2). Total num frames: 139984896. Throughput: 0: 42530.2. Samples: 22195740. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 13:06:38,841][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 13:06:40,671][00497] Updated weights for policy 0, policy_version 8548 (0.0027) [2024-03-29 13:06:43,514][00497] Updated weights for policy 0, policy_version 8558 (0.0027) [2024-03-29 13:06:43,839][00126] Fps is (10 sec: 40960.8, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 140214272. Throughput: 0: 42685.3. Samples: 22436040. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 13:06:43,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:06:47,848][00497] Updated weights for policy 0, policy_version 8568 (0.0021) [2024-03-29 13:06:48,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.5, 300 sec: 42265.2). Total num frames: 140427264. Throughput: 0: 42506.0. Samples: 22557020. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:06:48,840][00126] Avg episode reward: [(0, '0.315')] [2024-03-29 13:06:51,370][00497] Updated weights for policy 0, policy_version 8578 (0.0023) [2024-03-29 13:06:53,839][00126] Fps is (10 sec: 39321.0, 60 sec: 42871.4, 300 sec: 42320.7). Total num frames: 140607488. Throughput: 0: 41876.7. Samples: 22809100. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 13:06:53,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 13:06:56,318][00497] Updated weights for policy 0, policy_version 8588 (0.0022) [2024-03-29 13:06:58,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 140820480. Throughput: 0: 42350.6. Samples: 23062440. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 13:06:58,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 13:06:58,851][00476] Signal inference workers to stop experience collection... (800 times) [2024-03-29 13:06:58,906][00497] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-03-29 13:06:58,941][00476] Signal inference workers to resume experience collection... 
(800 times) [2024-03-29 13:06:58,944][00497] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-03-29 13:06:59,484][00497] Updated weights for policy 0, policy_version 8598 (0.0035) [2024-03-29 13:07:03,705][00497] Updated weights for policy 0, policy_version 8608 (0.0027) [2024-03-29 13:07:03,840][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 141033472. Throughput: 0: 42006.6. Samples: 23174160. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 13:07:03,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 13:07:04,214][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000008610_141066240.pth... [2024-03-29 13:07:04,537][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007989_130891776.pth [2024-03-29 13:07:07,529][00497] Updated weights for policy 0, policy_version 8618 (0.0024) [2024-03-29 13:07:08,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 141230080. Throughput: 0: 41305.4. Samples: 23417540. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 13:07:08,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 13:07:12,268][00497] Updated weights for policy 0, policy_version 8628 (0.0025) [2024-03-29 13:07:13,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 141443072. Throughput: 0: 41765.3. Samples: 23677980. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 13:07:13,842][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 13:07:15,422][00497] Updated weights for policy 0, policy_version 8638 (0.0028) [2024-03-29 13:07:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.3, 300 sec: 42209.6). Total num frames: 141672448. Throughput: 0: 41623.7. Samples: 23797680. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 13:07:18,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 13:07:19,209][00497] Updated weights for policy 0, policy_version 8648 (0.0026) [2024-03-29 13:07:23,036][00497] Updated weights for policy 0, policy_version 8658 (0.0024) [2024-03-29 13:07:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 141852672. Throughput: 0: 41345.8. Samples: 24056300. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 13:07:23,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 13:07:27,630][00497] Updated weights for policy 0, policy_version 8668 (0.0018) [2024-03-29 13:07:28,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.1, 300 sec: 42320.7). Total num frames: 142082048. Throughput: 0: 41903.5. Samples: 24321700. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 13:07:28,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:07:29,994][00476] Signal inference workers to stop experience collection... (850 times) [2024-03-29 13:07:30,014][00497] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-03-29 13:07:30,207][00476] Signal inference workers to resume experience collection... (850 times) [2024-03-29 13:07:30,208][00497] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-03-29 13:07:30,811][00497] Updated weights for policy 0, policy_version 8678 (0.0029) [2024-03-29 13:07:33,839][00126] Fps is (10 sec: 45875.4, 60 sec: 41779.3, 300 sec: 42265.2). Total num frames: 142311424. Throughput: 0: 41688.8. Samples: 24433020. 
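The checkpoint file names above encode both counters: checkpoint_000008610_141066240.pth is policy_version 8610 at 141,066,240 total frames. In this run the two track each other at exactly 16,384 frames per policy version (8610 × 16384 = 141,066,240, and likewise 8299 × 16384 = 135,970,816 for the previous save), consistent with one weight update per 16,384-frame batch. A small sketch that relies only on that naming convention; the helper name is illustrative:

    import re

    CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

    def parse_checkpoint(name):
        """Decode (policy_version, total_frames) from a checkpoint file name."""
        version, frames = map(int, CKPT_RE.search(name).groups())
        return version, frames

    version, frames = parse_checkpoint("checkpoint_000008610_141066240.pth")
    assert (version, frames) == (8610, 141066240)
    assert frames == version * 16384  # 16,384 frames consumed per policy version in this run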
Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 13:07:33,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:07:34,894][00497] Updated weights for policy 0, policy_version 8688 (0.0022) [2024-03-29 13:07:38,582][00497] Updated weights for policy 0, policy_version 8698 (0.0022) [2024-03-29 13:07:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 142508032. Throughput: 0: 41617.8. Samples: 24681900. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 13:07:38,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 13:07:43,219][00497] Updated weights for policy 0, policy_version 8708 (0.0026) [2024-03-29 13:07:43,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 42320.7). Total num frames: 142704640. Throughput: 0: 42261.9. Samples: 24964220. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 13:07:43,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:07:46,435][00497] Updated weights for policy 0, policy_version 8718 (0.0021) [2024-03-29 13:07:48,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 142950400. Throughput: 0: 42008.6. Samples: 25064540. Policy #0 lag: (min: 0.0, avg: 20.6, max: 45.0) [2024-03-29 13:07:48,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 13:07:50,594][00497] Updated weights for policy 0, policy_version 8728 (0.0023) [2024-03-29 13:07:53,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 143147008. Throughput: 0: 42073.8. Samples: 25310860. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 13:07:53,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:07:54,118][00497] Updated weights for policy 0, policy_version 8738 (0.0025) [2024-03-29 13:07:58,722][00497] Updated weights for policy 0, policy_version 8748 (0.0028) [2024-03-29 13:07:58,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41779.3, 300 sec: 42265.2). Total num frames: 143327232. Throughput: 0: 42600.6. Samples: 25595000. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 13:07:58,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 13:08:01,968][00497] Updated weights for policy 0, policy_version 8758 (0.0021) [2024-03-29 13:08:02,593][00476] Signal inference workers to stop experience collection... (900 times) [2024-03-29 13:08:02,671][00476] Signal inference workers to resume experience collection... (900 times) [2024-03-29 13:08:02,668][00497] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-03-29 13:08:02,705][00497] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-03-29 13:08:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.5, 300 sec: 42209.6). Total num frames: 143572992. Throughput: 0: 42335.5. Samples: 25702780. Policy #0 lag: (min: 0.0, avg: 21.1, max: 43.0) [2024-03-29 13:08:03,840][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 13:08:06,541][00497] Updated weights for policy 0, policy_version 8769 (0.0022) [2024-03-29 13:08:08,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 143769600. Throughput: 0: 42060.5. Samples: 25949020. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 13:08:08,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 13:08:10,263][00497] Updated weights for policy 0, policy_version 8779 (0.0028) [2024-03-29 13:08:13,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41779.3, 300 sec: 42209.6). Total num frames: 143949824. 
Throughput: 0: 42362.8. Samples: 26228020. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 13:08:13,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 13:08:14,488][00497] Updated weights for policy 0, policy_version 8789 (0.0028) [2024-03-29 13:08:17,979][00497] Updated weights for policy 0, policy_version 8799 (0.0021) [2024-03-29 13:08:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 144195584. Throughput: 0: 42292.5. Samples: 26336180. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:08:18,840][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 13:08:22,035][00497] Updated weights for policy 0, policy_version 8809 (0.0022) [2024-03-29 13:08:23,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 144408576. Throughput: 0: 42345.5. Samples: 26587440. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 13:08:23,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:08:25,729][00497] Updated weights for policy 0, policy_version 8819 (0.0025) [2024-03-29 13:08:28,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.3, 300 sec: 42209.6). Total num frames: 144588800. Throughput: 0: 42217.0. Samples: 26863980. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 13:08:28,840][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 13:08:29,948][00497] Updated weights for policy 0, policy_version 8829 (0.0031) [2024-03-29 13:08:33,533][00497] Updated weights for policy 0, policy_version 8839 (0.0020) [2024-03-29 13:08:33,544][00476] Signal inference workers to stop experience collection... (950 times) [2024-03-29 13:08:33,545][00476] Signal inference workers to resume experience collection... (950 times) [2024-03-29 13:08:33,583][00497] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-03-29 13:08:33,583][00497] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-03-29 13:08:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 144834560. Throughput: 0: 42348.9. Samples: 26970240. Policy #0 lag: (min: 2.0, avg: 20.9, max: 44.0) [2024-03-29 13:08:33,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 13:08:37,307][00497] Updated weights for policy 0, policy_version 8849 (0.0033) [2024-03-29 13:08:38,839][00126] Fps is (10 sec: 45874.3, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 145047552. Throughput: 0: 42477.7. Samples: 27222360. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 13:08:38,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:08:41,473][00497] Updated weights for policy 0, policy_version 8859 (0.0020) [2024-03-29 13:08:43,839][00126] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 42098.6). Total num frames: 145195008. Throughput: 0: 41838.1. Samples: 27477720. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 13:08:43,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 13:08:45,617][00497] Updated weights for policy 0, policy_version 8869 (0.0032) [2024-03-29 13:08:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 145457152. Throughput: 0: 42049.7. Samples: 27595020. 
Policy #0 lag: (min: 2.0, avg: 20.0, max: 43.0) [2024-03-29 13:08:48,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:08:49,445][00497] Updated weights for policy 0, policy_version 8879 (0.0022) [2024-03-29 13:08:53,131][00497] Updated weights for policy 0, policy_version 8889 (0.0026) [2024-03-29 13:08:53,839][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 145653760. Throughput: 0: 42024.8. Samples: 27840140. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:08:53,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 13:08:56,982][00497] Updated weights for policy 0, policy_version 8899 (0.0018) [2024-03-29 13:08:58,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 145850368. Throughput: 0: 41672.4. Samples: 28103280. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:08:58,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:09:01,382][00497] Updated weights for policy 0, policy_version 8909 (0.0028) [2024-03-29 13:09:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 146079744. Throughput: 0: 42047.0. Samples: 28228300. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 13:09:03,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 13:09:03,987][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000008917_146096128.pth... [2024-03-29 13:09:04,305][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000008299_135970816.pth [2024-03-29 13:09:04,982][00497] Updated weights for policy 0, policy_version 8919 (0.0019) [2024-03-29 13:09:08,456][00476] Signal inference workers to stop experience collection... (1000 times) [2024-03-29 13:09:08,532][00497] InferenceWorker_p0-w0: stopping experience collection (1000 times) [2024-03-29 13:09:08,543][00476] Signal inference workers to resume experience collection... (1000 times) [2024-03-29 13:09:08,560][00497] InferenceWorker_p0-w0: resuming experience collection (1000 times) [2024-03-29 13:09:08,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 146276352. Throughput: 0: 41658.1. Samples: 28462060. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:09:08,840][00126] Avg episode reward: [(0, '0.298')] [2024-03-29 13:09:08,852][00497] Updated weights for policy 0, policy_version 8929 (0.0027) [2024-03-29 13:09:12,723][00497] Updated weights for policy 0, policy_version 8939 (0.0022) [2024-03-29 13:09:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 146489344. Throughput: 0: 41542.5. Samples: 28733400. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:09:13,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:09:16,824][00497] Updated weights for policy 0, policy_version 8949 (0.0019) [2024-03-29 13:09:18,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 146702336. Throughput: 0: 42164.9. Samples: 28867660. Policy #0 lag: (min: 2.0, avg: 18.9, max: 42.0) [2024-03-29 13:09:18,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:09:20,274][00497] Updated weights for policy 0, policy_version 8959 (0.0018) [2024-03-29 13:09:23,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.1, 300 sec: 42098.5). Total num frames: 146931712. Throughput: 0: 41804.0. Samples: 29103540. 
Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:09:23,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:09:24,103][00497] Updated weights for policy 0, policy_version 8969 (0.0029) [2024-03-29 13:09:28,227][00497] Updated weights for policy 0, policy_version 8979 (0.0018) [2024-03-29 13:09:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 147128320. Throughput: 0: 42104.5. Samples: 29372420. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:09:28,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 13:09:32,133][00497] Updated weights for policy 0, policy_version 8989 (0.0025) [2024-03-29 13:09:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 42098.6). Total num frames: 147341312. Throughput: 0: 42701.8. Samples: 29516600. Policy #0 lag: (min: 1.0, avg: 18.5, max: 42.0) [2024-03-29 13:09:33,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:09:35,614][00497] Updated weights for policy 0, policy_version 8999 (0.0021) [2024-03-29 13:09:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 147570688. Throughput: 0: 42322.7. Samples: 29744660. Policy #0 lag: (min: 1.0, avg: 18.5, max: 42.0) [2024-03-29 13:09:38,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 13:09:39,273][00497] Updated weights for policy 0, policy_version 9009 (0.0019) [2024-03-29 13:09:40,344][00476] Signal inference workers to stop experience collection... (1050 times) [2024-03-29 13:09:40,351][00476] Signal inference workers to resume experience collection... (1050 times) [2024-03-29 13:09:40,369][00497] InferenceWorker_p0-w0: stopping experience collection (1050 times) [2024-03-29 13:09:40,391][00497] InferenceWorker_p0-w0: resuming experience collection (1050 times) [2024-03-29 13:09:43,584][00497] Updated weights for policy 0, policy_version 9019 (0.0023) [2024-03-29 13:09:43,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42871.5, 300 sec: 42209.9). Total num frames: 147767296. Throughput: 0: 42308.0. Samples: 30007140. Policy #0 lag: (min: 1.0, avg: 21.6, max: 40.0) [2024-03-29 13:09:43,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 13:09:47,579][00497] Updated weights for policy 0, policy_version 9029 (0.0029) [2024-03-29 13:09:48,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 147996672. Throughput: 0: 42825.7. Samples: 30155460. Policy #0 lag: (min: 1.0, avg: 17.8, max: 42.0) [2024-03-29 13:09:48,841][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 13:09:51,208][00497] Updated weights for policy 0, policy_version 9039 (0.0025) [2024-03-29 13:09:53,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 148193280. Throughput: 0: 42528.5. Samples: 30375840. Policy #0 lag: (min: 1.0, avg: 17.8, max: 42.0) [2024-03-29 13:09:53,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:09:54,858][00497] Updated weights for policy 0, policy_version 9049 (0.0029) [2024-03-29 13:09:58,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 148406272. Throughput: 0: 42219.1. Samples: 30633260. 
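The paired "Signal inference workers to stop/resume experience collection" messages (here for the 1050th time) read like back-pressure: collection is paused when the learner has fallen too far behind the experience being produced, and resumed once it catches up. The exact trigger is not visible in the log, so the following is only a rough illustration of such a gate, using threads where the framework uses separate processes, with invented names throughout:

    import threading

    class CollectionGate:
        """Cooperative pause/resume switch for experience collection workers.

        Simplified stand-in for the stop/resume signals in the log; the real trigger
        (queue depth, policy lag, ...) is an assumption here, not taken from the log.
        """

        def __init__(self):
            self._running = threading.Event()
            self._running.set()
            self.stop_count = 0  # the log prints this running total, e.g. "(1050 times)"

        def stop_collection(self):
            self.stop_count += 1
            self._running.clear()

        def resume_collection(self):
            self._running.set()

        def wait_until_allowed(self, timeout=None):
            # rollout/inference workers call this before producing more experience
            return self._running.wait(timeout)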
Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 13:09:58,841][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 13:09:59,084][00497] Updated weights for policy 0, policy_version 9059 (0.0026) [2024-03-29 13:10:03,020][00497] Updated weights for policy 0, policy_version 9069 (0.0020) [2024-03-29 13:10:03,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 148635648. Throughput: 0: 42554.7. Samples: 30782620. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 13:10:03,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:10:06,463][00497] Updated weights for policy 0, policy_version 9079 (0.0019) [2024-03-29 13:10:08,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42871.5, 300 sec: 42098.5). Total num frames: 148848640. Throughput: 0: 42680.5. Samples: 31024160. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 13:10:08,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 13:10:10,603][00497] Updated weights for policy 0, policy_version 9089 (0.0023) [2024-03-29 13:10:12,465][00476] Signal inference workers to stop experience collection... (1100 times) [2024-03-29 13:10:12,506][00497] InferenceWorker_p0-w0: stopping experience collection (1100 times) [2024-03-29 13:10:12,687][00476] Signal inference workers to resume experience collection... (1100 times) [2024-03-29 13:10:12,688][00497] InferenceWorker_p0-w0: resuming experience collection (1100 times) [2024-03-29 13:10:13,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42871.5, 300 sec: 42209.6). Total num frames: 149061632. Throughput: 0: 42195.1. Samples: 31271200. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 13:10:13,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:10:14,751][00497] Updated weights for policy 0, policy_version 9099 (0.0030) [2024-03-29 13:10:18,420][00497] Updated weights for policy 0, policy_version 9109 (0.0024) [2024-03-29 13:10:18,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 149258240. Throughput: 0: 42190.2. Samples: 31415160. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 13:10:18,841][00126] Avg episode reward: [(0, '0.220')] [2024-03-29 13:10:22,094][00497] Updated weights for policy 0, policy_version 9119 (0.0020) [2024-03-29 13:10:23,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42871.5, 300 sec: 42265.2). Total num frames: 149504000. Throughput: 0: 42459.0. Samples: 31655320. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 13:10:23,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:10:26,326][00497] Updated weights for policy 0, policy_version 9129 (0.0025) [2024-03-29 13:10:28,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 149684224. Throughput: 0: 42170.2. Samples: 31904800. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 13:10:28,841][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 13:10:30,580][00497] Updated weights for policy 0, policy_version 9139 (0.0020) [2024-03-29 13:10:33,839][00126] Fps is (10 sec: 36044.7, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 149864448. Throughput: 0: 41783.1. Samples: 32035700. 
Policy #0 lag: (min: 0.0, avg: 17.5, max: 41.0) [2024-03-29 13:10:33,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:10:34,167][00497] Updated weights for policy 0, policy_version 9149 (0.0024) [2024-03-29 13:10:37,615][00497] Updated weights for policy 0, policy_version 9159 (0.0024) [2024-03-29 13:10:38,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 150110208. Throughput: 0: 42391.6. Samples: 32283460. Policy #0 lag: (min: 0.0, avg: 17.5, max: 41.0) [2024-03-29 13:10:38,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:10:42,057][00497] Updated weights for policy 0, policy_version 9169 (0.0023) [2024-03-29 13:10:42,741][00476] Signal inference workers to stop experience collection... (1150 times) [2024-03-29 13:10:42,775][00497] InferenceWorker_p0-w0: stopping experience collection (1150 times) [2024-03-29 13:10:42,929][00476] Signal inference workers to resume experience collection... (1150 times) [2024-03-29 13:10:42,930][00497] InferenceWorker_p0-w0: resuming experience collection (1150 times) [2024-03-29 13:10:43,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 150306816. Throughput: 0: 42151.0. Samples: 32530060. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:10:43,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 13:10:46,005][00497] Updated weights for policy 0, policy_version 9179 (0.0034) [2024-03-29 13:10:48,839][00126] Fps is (10 sec: 36044.6, 60 sec: 41233.1, 300 sec: 42154.1). Total num frames: 150470656. Throughput: 0: 41692.3. Samples: 32658780. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 13:10:48,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 13:10:50,066][00497] Updated weights for policy 0, policy_version 9189 (0.0019) [2024-03-29 13:10:53,338][00497] Updated weights for policy 0, policy_version 9199 (0.0022) [2024-03-29 13:10:53,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 150749184. Throughput: 0: 41898.7. Samples: 32909600. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 13:10:53,840][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 13:10:57,681][00497] Updated weights for policy 0, policy_version 9209 (0.0019) [2024-03-29 13:10:58,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 150913024. Throughput: 0: 41831.0. Samples: 33153600. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 13:10:58,840][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 13:11:01,383][00497] Updated weights for policy 0, policy_version 9219 (0.0021) [2024-03-29 13:11:03,839][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.0, 300 sec: 42098.5). Total num frames: 151109632. Throughput: 0: 41387.7. Samples: 33277600. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 13:11:03,840][00126] Avg episode reward: [(0, '0.316')] [2024-03-29 13:11:04,158][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000009224_151126016.pth... [2024-03-29 13:11:04,469][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000008610_141066240.pth [2024-03-29 13:11:05,757][00497] Updated weights for policy 0, policy_version 9229 (0.0024) [2024-03-29 13:11:08,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 151339008. Throughput: 0: 41828.5. Samples: 33537600. 
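Each "Saving .../checkpoint_....pth" above is followed by a "Removing" of an older snapshot, so only a few recent checkpoints stay on disk (the separate "best policy" file, saved later in this log, is kept independently). A keep-the-last-N rotation in that spirit; the function is a sketch, and the trainer's actual retention policy may differ:

    from pathlib import Path

    def rotate_checkpoints(ckpt_dir, keep=2):
        """Delete all but the `keep` newest checkpoint files in ckpt_dir.

        The zero-padded policy version in the name makes a plain lexicographic
        sort order the files oldest to newest.
        """
        ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
        removed = ckpts[:-keep] if keep > 0 else list(ckpts)
        for old in removed:
            old.unlink()
        return removed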
Policy #0 lag: (min: 0.0, avg: 17.8, max: 42.0) [2024-03-29 13:11:08,840][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 13:11:09,197][00497] Updated weights for policy 0, policy_version 9239 (0.0033) [2024-03-29 13:11:13,321][00476] Signal inference workers to stop experience collection... (1200 times) [2024-03-29 13:11:13,353][00497] InferenceWorker_p0-w0: stopping experience collection (1200 times) [2024-03-29 13:11:13,501][00476] Signal inference workers to resume experience collection... (1200 times) [2024-03-29 13:11:13,502][00497] InferenceWorker_p0-w0: resuming experience collection (1200 times) [2024-03-29 13:11:13,505][00497] Updated weights for policy 0, policy_version 9249 (0.0028) [2024-03-29 13:11:13,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 151552000. Throughput: 0: 41534.6. Samples: 33773860. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 13:11:13,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:11:17,351][00497] Updated weights for policy 0, policy_version 9259 (0.0018) [2024-03-29 13:11:18,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.2, 300 sec: 41987.5). Total num frames: 151732224. Throughput: 0: 41225.0. Samples: 33890820. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 13:11:18,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:11:21,621][00497] Updated weights for policy 0, policy_version 9269 (0.0022) [2024-03-29 13:11:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 151977984. Throughput: 0: 41750.2. Samples: 34162220. Policy #0 lag: (min: 1.0, avg: 18.0, max: 41.0) [2024-03-29 13:11:23,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 13:11:24,783][00497] Updated weights for policy 0, policy_version 9279 (0.0019) [2024-03-29 13:11:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.1, 300 sec: 41932.0). Total num frames: 152174592. Throughput: 0: 41551.7. Samples: 34399880. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:11:28,840][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 13:11:29,145][00497] Updated weights for policy 0, policy_version 9289 (0.0027) [2024-03-29 13:11:32,957][00497] Updated weights for policy 0, policy_version 9299 (0.0019) [2024-03-29 13:11:33,839][00126] Fps is (10 sec: 37683.8, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 152354816. Throughput: 0: 41385.0. Samples: 34521100. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:11:33,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:11:37,265][00497] Updated weights for policy 0, policy_version 9309 (0.0025) [2024-03-29 13:11:38,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 152600576. Throughput: 0: 41954.6. Samples: 34797560. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 13:11:38,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 13:11:40,378][00497] Updated weights for policy 0, policy_version 9319 (0.0026) [2024-03-29 13:11:43,839][00126] Fps is (10 sec: 47513.4, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 152829952. Throughput: 0: 41730.3. Samples: 35031460. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:11:43,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:11:44,776][00497] Updated weights for policy 0, policy_version 9329 (0.0023) [2024-03-29 13:11:44,777][00476] Signal inference workers to stop experience collection... 
(1250 times) [2024-03-29 13:11:44,778][00476] Signal inference workers to resume experience collection... (1250 times) [2024-03-29 13:11:44,820][00497] InferenceWorker_p0-w0: stopping experience collection (1250 times) [2024-03-29 13:11:44,821][00497] InferenceWorker_p0-w0: resuming experience collection (1250 times) [2024-03-29 13:11:48,565][00497] Updated weights for policy 0, policy_version 9339 (0.0029) [2024-03-29 13:11:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 153010176. Throughput: 0: 41859.1. Samples: 35161260. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:11:48,840][00126] Avg episode reward: [(0, '0.255')] [2024-03-29 13:11:52,919][00497] Updated weights for policy 0, policy_version 9349 (0.0028) [2024-03-29 13:11:53,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40960.0, 300 sec: 41987.5). Total num frames: 153206784. Throughput: 0: 42081.8. Samples: 35431280. Policy #0 lag: (min: 1.0, avg: 18.3, max: 42.0) [2024-03-29 13:11:53,840][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 13:11:56,036][00497] Updated weights for policy 0, policy_version 9359 (0.0028) [2024-03-29 13:11:58,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 153452544. Throughput: 0: 41798.3. Samples: 35654780. Policy #0 lag: (min: 1.0, avg: 18.3, max: 42.0) [2024-03-29 13:11:58,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 13:12:00,409][00497] Updated weights for policy 0, policy_version 9369 (0.0019) [2024-03-29 13:12:03,839][00126] Fps is (10 sec: 44236.0, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 153649152. Throughput: 0: 42356.3. Samples: 35796860. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:12:03,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:12:03,959][00497] Updated weights for policy 0, policy_version 9379 (0.0017) [2024-03-29 13:12:08,362][00497] Updated weights for policy 0, policy_version 9389 (0.0024) [2024-03-29 13:12:08,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 153845760. Throughput: 0: 42140.0. Samples: 36058520. Policy #0 lag: (min: 0.0, avg: 19.1, max: 42.0) [2024-03-29 13:12:08,840][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 13:12:11,603][00497] Updated weights for policy 0, policy_version 9399 (0.0027) [2024-03-29 13:12:13,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.4, 300 sec: 42098.5). Total num frames: 154091520. Throughput: 0: 41839.5. Samples: 36282660. Policy #0 lag: (min: 0.0, avg: 19.1, max: 42.0) [2024-03-29 13:12:13,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:12:16,143][00497] Updated weights for policy 0, policy_version 9409 (0.0019) [2024-03-29 13:12:16,730][00476] Signal inference workers to stop experience collection... (1300 times) [2024-03-29 13:12:16,749][00497] InferenceWorker_p0-w0: stopping experience collection (1300 times) [2024-03-29 13:12:16,939][00476] Signal inference workers to resume experience collection... (1300 times) [2024-03-29 13:12:16,940][00497] InferenceWorker_p0-w0: resuming experience collection (1300 times) [2024-03-29 13:12:18,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 154288128. Throughput: 0: 42477.7. Samples: 36432600. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 13:12:18,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 13:12:19,662][00497] Updated weights for policy 0, policy_version 9419 (0.0023) [2024-03-29 13:12:23,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 154468352. Throughput: 0: 41990.7. Samples: 36687140. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 13:12:23,840][00126] Avg episode reward: [(0, '0.318')] [2024-03-29 13:12:24,048][00497] Updated weights for policy 0, policy_version 9429 (0.0018) [2024-03-29 13:12:27,022][00497] Updated weights for policy 0, policy_version 9439 (0.0023) [2024-03-29 13:12:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 154730496. Throughput: 0: 41896.0. Samples: 36916780. Policy #0 lag: (min: 1.0, avg: 19.4, max: 42.0) [2024-03-29 13:12:28,840][00126] Avg episode reward: [(0, '0.270')] [2024-03-29 13:12:31,544][00497] Updated weights for policy 0, policy_version 9449 (0.0017) [2024-03-29 13:12:33,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42871.3, 300 sec: 42098.5). Total num frames: 154927104. Throughput: 0: 42132.4. Samples: 37057220. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 13:12:33,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 13:12:35,491][00497] Updated weights for policy 0, policy_version 9459 (0.0024) [2024-03-29 13:12:38,839][00126] Fps is (10 sec: 37682.6, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 155107328. Throughput: 0: 42123.8. Samples: 37326860. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 13:12:38,842][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 13:12:39,643][00497] Updated weights for policy 0, policy_version 9469 (0.0023) [2024-03-29 13:12:42,557][00497] Updated weights for policy 0, policy_version 9479 (0.0022) [2024-03-29 13:12:43,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 155353088. Throughput: 0: 42089.3. Samples: 37548800. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 13:12:43,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 13:12:47,020][00497] Updated weights for policy 0, policy_version 9489 (0.0022) [2024-03-29 13:12:47,588][00476] Signal inference workers to stop experience collection... (1350 times) [2024-03-29 13:12:47,697][00497] InferenceWorker_p0-w0: stopping experience collection (1350 times) [2024-03-29 13:12:47,835][00476] Signal inference workers to resume experience collection... (1350 times) [2024-03-29 13:12:47,835][00497] InferenceWorker_p0-w0: resuming experience collection (1350 times) [2024-03-29 13:12:48,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 155549696. Throughput: 0: 41907.3. Samples: 37682680. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 13:12:48,840][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 13:12:51,223][00497] Updated weights for policy 0, policy_version 9499 (0.0023) [2024-03-29 13:12:53,839][00126] Fps is (10 sec: 37682.8, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 155729920. Throughput: 0: 41924.4. Samples: 37945120. 
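The "Policy #0 lag: (min, avg, max)" triples report, per training batch, how many policy versions old the weights that generated each sample are relative to the learner's current policy_version; through this stretch the average sits around 20 versions with maxima in the low 40s. A sketch of that statistic (the framework's exact definition may differ slightly, and the example numbers are invented):

    def lag_stats(current_version, sample_versions):
        """(min, avg, max) staleness of a batch, measured in policy versions.

        sample_versions holds the policy_version that produced each transition.
        """
        lags = [current_version - v for v in sample_versions]
        return min(lags), sum(lags) / len(lags), max(lags)

    # e.g. a learner at version 9419 training on samples from versions 9419, 9400, 9391, 9377
    print(lag_stats(9419, [9419, 9400, 9391, 9377]))  # -> (0, 22.25, 42)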
Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 13:12:53,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:12:55,433][00497] Updated weights for policy 0, policy_version 9509 (0.0022) [2024-03-29 13:12:58,406][00497] Updated weights for policy 0, policy_version 9519 (0.0031) [2024-03-29 13:12:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 155975680. Throughput: 0: 42248.9. Samples: 38183860. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 13:12:58,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 13:13:02,774][00497] Updated weights for policy 0, policy_version 9529 (0.0022) [2024-03-29 13:13:03,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 156172288. Throughput: 0: 41731.5. Samples: 38310520. Policy #0 lag: (min: 1.0, avg: 20.5, max: 40.0) [2024-03-29 13:13:03,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:13:04,307][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000009534_156205056.pth... [2024-03-29 13:13:04,641][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000008917_146096128.pth [2024-03-29 13:13:06,826][00497] Updated weights for policy 0, policy_version 9539 (0.0027) [2024-03-29 13:13:08,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 156352512. Throughput: 0: 41741.8. Samples: 38565520. Policy #0 lag: (min: 1.0, avg: 20.5, max: 40.0) [2024-03-29 13:13:08,840][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 13:13:11,044][00497] Updated weights for policy 0, policy_version 9549 (0.0035) [2024-03-29 13:13:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 156598272. Throughput: 0: 42219.1. Samples: 38816640. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 13:13:13,841][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 13:13:14,076][00497] Updated weights for policy 0, policy_version 9559 (0.0029) [2024-03-29 13:13:18,577][00497] Updated weights for policy 0, policy_version 9569 (0.0024) [2024-03-29 13:13:18,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 156778496. Throughput: 0: 41599.1. Samples: 38929180. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 13:13:18,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:13:22,418][00497] Updated weights for policy 0, policy_version 9579 (0.0022) [2024-03-29 13:13:23,325][00476] Signal inference workers to stop experience collection... (1400 times) [2024-03-29 13:13:23,363][00497] InferenceWorker_p0-w0: stopping experience collection (1400 times) [2024-03-29 13:13:23,518][00476] Signal inference workers to resume experience collection... (1400 times) [2024-03-29 13:13:23,518][00497] InferenceWorker_p0-w0: resuming experience collection (1400 times) [2024-03-29 13:13:23,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 156975104. Throughput: 0: 41420.6. Samples: 39190780. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 13:13:23,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:13:26,624][00497] Updated weights for policy 0, policy_version 9589 (0.0026) [2024-03-29 13:13:28,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 157220864. Throughput: 0: 42320.0. Samples: 39453200. 
Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 13:13:28,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 13:13:29,811][00497] Updated weights for policy 0, policy_version 9599 (0.0030) [2024-03-29 13:13:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 157417472. Throughput: 0: 41643.5. Samples: 39556640. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 13:13:33,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 13:13:33,858][00476] Saving new best policy, reward=0.486! [2024-03-29 13:13:34,458][00497] Updated weights for policy 0, policy_version 9609 (0.0024) [2024-03-29 13:13:38,334][00497] Updated weights for policy 0, policy_version 9619 (0.0034) [2024-03-29 13:13:38,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 157597696. Throughput: 0: 41441.0. Samples: 39809960. Policy #0 lag: (min: 2.0, avg: 20.4, max: 41.0) [2024-03-29 13:13:38,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:13:42,478][00497] Updated weights for policy 0, policy_version 9629 (0.0031) [2024-03-29 13:13:43,839][00126] Fps is (10 sec: 39322.0, 60 sec: 40960.1, 300 sec: 41876.4). Total num frames: 157810688. Throughput: 0: 42030.7. Samples: 40075240. Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 13:13:43,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 13:13:45,726][00497] Updated weights for policy 0, policy_version 9639 (0.0027) [2024-03-29 13:13:48,839][00126] Fps is (10 sec: 45875.2, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 158056448. Throughput: 0: 41466.7. Samples: 40176520. Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 13:13:48,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 13:13:50,040][00497] Updated weights for policy 0, policy_version 9649 (0.0028) [2024-03-29 13:13:53,839][00126] Fps is (10 sec: 42597.6, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 158236672. Throughput: 0: 41228.4. Samples: 40420800. Policy #0 lag: (min: 1.0, avg: 21.0, max: 43.0) [2024-03-29 13:13:53,840][00126] Avg episode reward: [(0, '0.269')] [2024-03-29 13:13:54,297][00497] Updated weights for policy 0, policy_version 9659 (0.0022) [2024-03-29 13:13:55,168][00476] Signal inference workers to stop experience collection... (1450 times) [2024-03-29 13:13:55,211][00497] InferenceWorker_p0-w0: stopping experience collection (1450 times) [2024-03-29 13:13:55,247][00476] Signal inference workers to resume experience collection... (1450 times) [2024-03-29 13:13:55,254][00497] InferenceWorker_p0-w0: resuming experience collection (1450 times) [2024-03-29 13:13:58,355][00497] Updated weights for policy 0, policy_version 9669 (0.0030) [2024-03-29 13:13:58,839][00126] Fps is (10 sec: 36044.9, 60 sec: 40686.9, 300 sec: 41820.9). Total num frames: 158416896. Throughput: 0: 41772.5. Samples: 40696400. Policy #0 lag: (min: 1.0, avg: 21.0, max: 43.0) [2024-03-29 13:13:58,840][00126] Avg episode reward: [(0, '0.319')] [2024-03-29 13:14:01,311][00497] Updated weights for policy 0, policy_version 9679 (0.0024) [2024-03-29 13:14:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 158679040. Throughput: 0: 41756.4. Samples: 40808220. 
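"Saving new best policy, reward=0.486!" above fires because 0.486 is the highest average episode reward seen so far in the run, and it writes an extra snapshot alongside the rotating checkpoints. A minimal version of that bookkeeping; the class, the torch.save call, and the file name are illustrative assumptions, not the trainer's actual API:

    import torch

    class BestPolicySaver:
        """Keep a separate snapshot whenever the average episode reward reaches a new high."""

        def __init__(self, save_dir):
            self.save_dir = save_dir
            self.best_reward = float("-inf")

        def maybe_save(self, model, avg_reward):
            if avg_reward <= self.best_reward:
                return False
            self.best_reward = avg_reward
            # file name is an assumption; only the "save on new best" logic mirrors the log
            torch.save(model.state_dict(), f"{self.save_dir}/best_{avg_reward:.3f}.pth")
            return True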
Policy #0 lag: (min: 1.0, avg: 18.3, max: 41.0) [2024-03-29 13:14:03,840][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 13:14:05,450][00497] Updated weights for policy 0, policy_version 9689 (0.0023) [2024-03-29 13:14:08,839][00126] Fps is (10 sec: 47513.0, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 158892032. Throughput: 0: 41691.0. Samples: 41066880. Policy #0 lag: (min: 2.0, avg: 22.5, max: 43.0) [2024-03-29 13:14:08,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 13:14:09,672][00497] Updated weights for policy 0, policy_version 9699 (0.0024) [2024-03-29 13:14:13,660][00497] Updated weights for policy 0, policy_version 9709 (0.0034) [2024-03-29 13:14:13,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 159072256. Throughput: 0: 41689.8. Samples: 41329240. Policy #0 lag: (min: 2.0, avg: 22.5, max: 43.0) [2024-03-29 13:14:13,840][00126] Avg episode reward: [(0, '0.416')] [2024-03-29 13:14:16,836][00497] Updated weights for policy 0, policy_version 9719 (0.0022) [2024-03-29 13:14:18,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 159301632. Throughput: 0: 42028.0. Samples: 41447900. Policy #0 lag: (min: 1.0, avg: 19.8, max: 44.0) [2024-03-29 13:14:18,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 13:14:21,106][00497] Updated weights for policy 0, policy_version 9729 (0.0025) [2024-03-29 13:14:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 159514624. Throughput: 0: 42024.0. Samples: 41701040. Policy #0 lag: (min: 1.0, avg: 22.9, max: 43.0) [2024-03-29 13:14:23,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 13:14:25,219][00497] Updated weights for policy 0, policy_version 9739 (0.0031) [2024-03-29 13:14:28,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 159711232. Throughput: 0: 41850.1. Samples: 41958500. Policy #0 lag: (min: 1.0, avg: 22.9, max: 43.0) [2024-03-29 13:14:28,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:14:28,996][00497] Updated weights for policy 0, policy_version 9749 (0.0024) [2024-03-29 13:14:30,058][00476] Signal inference workers to stop experience collection... (1500 times) [2024-03-29 13:14:30,092][00497] InferenceWorker_p0-w0: stopping experience collection (1500 times) [2024-03-29 13:14:30,265][00476] Signal inference workers to resume experience collection... (1500 times) [2024-03-29 13:14:30,266][00497] InferenceWorker_p0-w0: resuming experience collection (1500 times) [2024-03-29 13:14:32,213][00497] Updated weights for policy 0, policy_version 9759 (0.0024) [2024-03-29 13:14:33,839][00126] Fps is (10 sec: 44236.0, 60 sec: 42325.2, 300 sec: 41987.4). Total num frames: 159956992. Throughput: 0: 42364.3. Samples: 42082920. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:14:33,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:14:36,670][00497] Updated weights for policy 0, policy_version 9769 (0.0017) [2024-03-29 13:14:38,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42871.4, 300 sec: 42043.0). Total num frames: 160169984. Throughput: 0: 42528.9. Samples: 42334600. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:14:38,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 13:14:41,020][00497] Updated weights for policy 0, policy_version 9779 (0.0020) [2024-03-29 13:14:43,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42325.2, 300 sec: 41876.4). Total num frames: 160350208. 
Throughput: 0: 41927.0. Samples: 42583120. Policy #0 lag: (min: 0.0, avg: 22.2, max: 44.0) [2024-03-29 13:14:43,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:14:44,689][00497] Updated weights for policy 0, policy_version 9789 (0.0021) [2024-03-29 13:14:47,976][00497] Updated weights for policy 0, policy_version 9799 (0.0026) [2024-03-29 13:14:48,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 160579584. Throughput: 0: 42462.4. Samples: 42719020. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 13:14:48,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 13:14:52,129][00497] Updated weights for policy 0, policy_version 9809 (0.0019) [2024-03-29 13:14:53,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 160792576. Throughput: 0: 42185.4. Samples: 42965220. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 13:14:53,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:14:56,707][00497] Updated weights for policy 0, policy_version 9819 (0.0019) [2024-03-29 13:14:58,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42598.3, 300 sec: 41820.8). Total num frames: 160972800. Throughput: 0: 42019.9. Samples: 43220140. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 13:14:58,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:15:00,288][00497] Updated weights for policy 0, policy_version 9829 (0.0022) [2024-03-29 13:15:02,365][00476] Signal inference workers to stop experience collection... (1550 times) [2024-03-29 13:15:02,408][00497] InferenceWorker_p0-w0: stopping experience collection (1550 times) [2024-03-29 13:15:02,579][00476] Signal inference workers to resume experience collection... (1550 times) [2024-03-29 13:15:02,580][00497] InferenceWorker_p0-w0: resuming experience collection (1550 times) [2024-03-29 13:15:03,717][00497] Updated weights for policy 0, policy_version 9839 (0.0022) [2024-03-29 13:15:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.4, 300 sec: 41876.4). Total num frames: 161202176. Throughput: 0: 42381.8. Samples: 43355080. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0) [2024-03-29 13:15:03,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 13:15:04,012][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000009840_161218560.pth... [2024-03-29 13:15:04,340][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000009224_151126016.pth [2024-03-29 13:15:08,058][00497] Updated weights for policy 0, policy_version 9849 (0.0023) [2024-03-29 13:15:08,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 161398784. Throughput: 0: 41712.4. Samples: 43578100. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0) [2024-03-29 13:15:08,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 13:15:12,370][00497] Updated weights for policy 0, policy_version 9859 (0.0023) [2024-03-29 13:15:13,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 161579008. Throughput: 0: 41828.1. Samples: 43840760. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 13:15:13,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:15:16,197][00497] Updated weights for policy 0, policy_version 9869 (0.0022) [2024-03-29 13:15:18,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 161808384. Throughput: 0: 41803.2. Samples: 43964060. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 13:15:18,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:15:19,533][00497] Updated weights for policy 0, policy_version 9879 (0.0034) [2024-03-29 13:15:23,588][00497] Updated weights for policy 0, policy_version 9889 (0.0022) [2024-03-29 13:15:23,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 162021376. Throughput: 0: 41667.7. Samples: 44209640. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 13:15:23,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:15:27,837][00497] Updated weights for policy 0, policy_version 9899 (0.0023) [2024-03-29 13:15:28,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 162217984. Throughput: 0: 41834.0. Samples: 44465640. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 13:15:28,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 13:15:31,800][00497] Updated weights for policy 0, policy_version 9909 (0.0020) [2024-03-29 13:15:33,652][00476] Signal inference workers to stop experience collection... (1600 times) [2024-03-29 13:15:33,725][00497] InferenceWorker_p0-w0: stopping experience collection (1600 times) [2024-03-29 13:15:33,727][00476] Signal inference workers to resume experience collection... (1600 times) [2024-03-29 13:15:33,754][00497] InferenceWorker_p0-w0: resuming experience collection (1600 times) [2024-03-29 13:15:33,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.2, 300 sec: 41820.8). Total num frames: 162447360. Throughput: 0: 41625.7. Samples: 44592180. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 13:15:33,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 13:15:34,966][00497] Updated weights for policy 0, policy_version 9919 (0.0024) [2024-03-29 13:15:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 162660352. Throughput: 0: 41793.8. Samples: 44845940. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 13:15:38,841][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 13:15:38,972][00497] Updated weights for policy 0, policy_version 9929 (0.0024) [2024-03-29 13:15:43,281][00497] Updated weights for policy 0, policy_version 9939 (0.0019) [2024-03-29 13:15:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 162856960. Throughput: 0: 41920.0. Samples: 45106540. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 13:15:43,840][00126] Avg episode reward: [(0, '0.305')] [2024-03-29 13:15:47,196][00497] Updated weights for policy 0, policy_version 9949 (0.0029) [2024-03-29 13:15:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 163069952. Throughput: 0: 41835.6. Samples: 45237680. Policy #0 lag: (min: 1.0, avg: 19.1, max: 41.0) [2024-03-29 13:15:48,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 13:15:50,431][00497] Updated weights for policy 0, policy_version 9959 (0.0019) [2024-03-29 13:15:53,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 163299328. Throughput: 0: 42204.8. Samples: 45477320. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 13:15:53,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 13:15:54,608][00497] Updated weights for policy 0, policy_version 9969 (0.0018) [2024-03-29 13:15:58,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 163479552. 
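Because every report uses fixed phrasing, the learning curve can be recovered from the log itself by pairing each "Avg episode reward" entry with the "Total num frames" value reported just before it; over this stretch the reward mostly hovers between 0.3 and 0.4, with occasional dips and the 0.486 peak noted above. A small parser written against the exact strings in this file, assuming the original one-entry-per-line log layout (function and pattern names are made up for the example):

    import re

    FRAMES_RE = re.compile(r"Total num frames: (\d+)")
    REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '([\d.]+)'\)\]")

    def reward_curve(log_path):
        """Return (total_frames, avg_episode_reward) pairs from a training log like this one."""
        frames, points = None, []
        with open(log_path) as fh:
            for line in fh:
                m = FRAMES_RE.search(line)
                if m:
                    frames = int(m.group(1))
                m = REWARD_RE.search(line)
                if m and frames is not None:
                    points.append((frames, float(m.group(1))))
        return points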
Throughput: 0: 41950.7. Samples: 45728540. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 13:15:58,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:15:59,135][00497] Updated weights for policy 0, policy_version 9979 (0.0019) [2024-03-29 13:16:02,782][00497] Updated weights for policy 0, policy_version 9989 (0.0021) [2024-03-29 13:16:03,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 163692544. Throughput: 0: 42056.0. Samples: 45856580. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 13:16:03,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:16:04,262][00476] Signal inference workers to stop experience collection... (1650 times) [2024-03-29 13:16:04,263][00476] Signal inference workers to resume experience collection... (1650 times) [2024-03-29 13:16:04,303][00497] InferenceWorker_p0-w0: stopping experience collection (1650 times) [2024-03-29 13:16:04,303][00497] InferenceWorker_p0-w0: resuming experience collection (1650 times) [2024-03-29 13:16:06,094][00497] Updated weights for policy 0, policy_version 9999 (0.0036) [2024-03-29 13:16:08,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 163938304. Throughput: 0: 42159.2. Samples: 46106800. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 13:16:08,842][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:16:10,161][00497] Updated weights for policy 0, policy_version 10009 (0.0027) [2024-03-29 13:16:13,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 164134912. Throughput: 0: 42137.1. Samples: 46361820. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:16:13,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:16:14,352][00497] Updated weights for policy 0, policy_version 10019 (0.0021) [2024-03-29 13:16:18,179][00497] Updated weights for policy 0, policy_version 10029 (0.0019) [2024-03-29 13:16:18,839][00126] Fps is (10 sec: 39320.9, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 164331520. Throughput: 0: 42456.8. Samples: 46502740. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 13:16:18,840][00126] Avg episode reward: [(0, '0.266')] [2024-03-29 13:16:21,615][00497] Updated weights for policy 0, policy_version 10039 (0.0028) [2024-03-29 13:16:23,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 164577280. Throughput: 0: 42247.0. Samples: 46747060. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 13:16:23,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 13:16:25,700][00497] Updated weights for policy 0, policy_version 10049 (0.0026) [2024-03-29 13:16:28,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 164773888. Throughput: 0: 41995.1. Samples: 46996320. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 13:16:28,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 13:16:29,779][00497] Updated weights for policy 0, policy_version 10059 (0.0024) [2024-03-29 13:16:33,442][00497] Updated weights for policy 0, policy_version 10069 (0.0028) [2024-03-29 13:16:33,839][00126] Fps is (10 sec: 40960.7, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 164986880. Throughput: 0: 42162.7. Samples: 47135000. 
Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 13:16:33,840][00126] Avg episode reward: [(0, '0.318')] [2024-03-29 13:16:35,270][00476] Signal inference workers to stop experience collection... (1700 times) [2024-03-29 13:16:35,271][00476] Signal inference workers to resume experience collection... (1700 times) [2024-03-29 13:16:35,318][00497] InferenceWorker_p0-w0: stopping experience collection (1700 times) [2024-03-29 13:16:35,318][00497] InferenceWorker_p0-w0: resuming experience collection (1700 times) [2024-03-29 13:16:36,823][00497] Updated weights for policy 0, policy_version 10079 (0.0024) [2024-03-29 13:16:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 165216256. Throughput: 0: 42372.5. Samples: 47384080. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 13:16:38,840][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 13:16:41,051][00497] Updated weights for policy 0, policy_version 10089 (0.0024) [2024-03-29 13:16:43,842][00126] Fps is (10 sec: 44225.3, 60 sec: 42869.7, 300 sec: 42098.2). Total num frames: 165429248. Throughput: 0: 42304.7. Samples: 47632360. Policy #0 lag: (min: 2.0, avg: 22.9, max: 43.0) [2024-03-29 13:16:43,844][00126] Avg episode reward: [(0, '0.293')] [2024-03-29 13:16:45,284][00497] Updated weights for policy 0, policy_version 10099 (0.0018) [2024-03-29 13:16:48,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 165609472. Throughput: 0: 42448.9. Samples: 47766780. Policy #0 lag: (min: 2.0, avg: 22.9, max: 43.0) [2024-03-29 13:16:48,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 13:16:48,954][00497] Updated weights for policy 0, policy_version 10109 (0.0020) [2024-03-29 13:16:52,152][00497] Updated weights for policy 0, policy_version 10119 (0.0022) [2024-03-29 13:16:53,839][00126] Fps is (10 sec: 42609.5, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 165855232. Throughput: 0: 42551.6. Samples: 48021620. Policy #0 lag: (min: 1.0, avg: 21.8, max: 43.0) [2024-03-29 13:16:53,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:16:56,430][00497] Updated weights for policy 0, policy_version 10129 (0.0022) [2024-03-29 13:16:58,839][00126] Fps is (10 sec: 47514.1, 60 sec: 43417.6, 300 sec: 42154.1). Total num frames: 166084608. Throughput: 0: 42675.3. Samples: 48282200. Policy #0 lag: (min: 1.0, avg: 21.8, max: 43.0) [2024-03-29 13:16:58,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:17:00,584][00497] Updated weights for policy 0, policy_version 10139 (0.0027) [2024-03-29 13:17:03,839][00126] Fps is (10 sec: 39320.8, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 166248448. Throughput: 0: 42429.8. Samples: 48412080. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 13:17:03,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 13:17:04,137][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000010148_166264832.pth... [2024-03-29 13:17:04,436][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000009534_156205056.pth [2024-03-29 13:17:04,708][00497] Updated weights for policy 0, policy_version 10149 (0.0018) [2024-03-29 13:17:06,653][00476] Signal inference workers to stop experience collection... (1750 times) [2024-03-29 13:17:06,689][00497] InferenceWorker_p0-w0: stopping experience collection (1750 times) [2024-03-29 13:17:06,833][00476] Signal inference workers to resume experience collection... 
(1750 times) [2024-03-29 13:17:06,834][00497] InferenceWorker_p0-w0: resuming experience collection (1750 times) [2024-03-29 13:17:07,769][00497] Updated weights for policy 0, policy_version 10159 (0.0025) [2024-03-29 13:17:08,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 166477824. Throughput: 0: 42480.1. Samples: 48658660. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 13:17:08,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:17:12,214][00497] Updated weights for policy 0, policy_version 10169 (0.0023) [2024-03-29 13:17:13,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 166690816. Throughput: 0: 42399.5. Samples: 48904300. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 13:17:13,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 13:17:16,200][00497] Updated weights for policy 0, policy_version 10179 (0.0020) [2024-03-29 13:17:18,839][00126] Fps is (10 sec: 39322.1, 60 sec: 42325.5, 300 sec: 42043.0). Total num frames: 166871040. Throughput: 0: 42148.5. Samples: 49031680. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:17:18,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 13:17:20,347][00497] Updated weights for policy 0, policy_version 10189 (0.0022) [2024-03-29 13:17:23,762][00497] Updated weights for policy 0, policy_version 10199 (0.0022) [2024-03-29 13:17:23,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 167100416. Throughput: 0: 42303.6. Samples: 49287740. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:17:23,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:17:27,745][00497] Updated weights for policy 0, policy_version 10209 (0.0021) [2024-03-29 13:17:28,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 167313408. Throughput: 0: 42330.8. Samples: 49537140. Policy #0 lag: (min: 1.0, avg: 22.0, max: 43.0) [2024-03-29 13:17:28,840][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 13:17:31,609][00497] Updated weights for policy 0, policy_version 10219 (0.0024) [2024-03-29 13:17:33,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 167493632. Throughput: 0: 42168.4. Samples: 49664360. Policy #0 lag: (min: 1.0, avg: 22.0, max: 43.0) [2024-03-29 13:17:33,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 13:17:35,762][00497] Updated weights for policy 0, policy_version 10229 (0.0024) [2024-03-29 13:17:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 167739392. Throughput: 0: 42150.2. Samples: 49918380. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 13:17:38,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:17:39,072][00497] Updated weights for policy 0, policy_version 10239 (0.0021) [2024-03-29 13:17:43,316][00476] Signal inference workers to stop experience collection... (1800 times) [2024-03-29 13:17:43,376][00497] InferenceWorker_p0-w0: stopping experience collection (1800 times) [2024-03-29 13:17:43,408][00476] Signal inference workers to resume experience collection... (1800 times) [2024-03-29 13:17:43,412][00497] InferenceWorker_p0-w0: resuming experience collection (1800 times) [2024-03-29 13:17:43,415][00497] Updated weights for policy 0, policy_version 10249 (0.0022) [2024-03-29 13:17:43,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41780.9, 300 sec: 41987.5). Total num frames: 167936000. 
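The repeated "Signal inference workers to stop/resume experience collection... (N times)" pairs show the learner briefly pausing collection and then letting it continue. The exact mechanism is not visible in the log; the sketch below only illustrates one plausible way to coordinate such a pause between processes, using a shared multiprocessing.Event (all names are hypothetical):

```python
import multiprocessing as mp
import time

def inference_worker(collect, shutdown):
    """Collect experience only while the learner allows it."""
    while not shutdown.is_set():
        if collect.wait(timeout=0.1):
            pass  # run the policy on observations and enqueue experience here
    print("InferenceWorker: exiting")

if __name__ == "__main__":
    collect, shutdown = mp.Event(), mp.Event()
    collect.set()
    worker = mp.Process(target=inference_worker, args=(collect, shutdown))
    worker.start()

    collect.clear()  # "Signal inference workers to stop experience collection..."
    time.sleep(0.2)  # learner drains its queues / finishes a training step
    collect.set()    # "Signal inference workers to resume experience collection..."

    shutdown.set()
    worker.join()
```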
Throughput: 0: 41889.2. Samples: 50167220. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 13:17:43,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:17:47,351][00497] Updated weights for policy 0, policy_version 10259 (0.0029) [2024-03-29 13:17:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 168132608. Throughput: 0: 41534.4. Samples: 50281120. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 13:17:48,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:17:51,355][00497] Updated weights for policy 0, policy_version 10269 (0.0023) [2024-03-29 13:17:53,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 168361984. Throughput: 0: 41937.4. Samples: 50545840. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 13:17:53,840][00126] Avg episode reward: [(0, '0.285')] [2024-03-29 13:17:54,650][00497] Updated weights for policy 0, policy_version 10279 (0.0023) [2024-03-29 13:17:58,839][00126] Fps is (10 sec: 40960.2, 60 sec: 40960.0, 300 sec: 41931.9). Total num frames: 168542208. Throughput: 0: 41932.6. Samples: 50791260. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 13:17:58,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 13:17:59,108][00497] Updated weights for policy 0, policy_version 10289 (0.0021) [2024-03-29 13:18:02,710][00497] Updated weights for policy 0, policy_version 10299 (0.0028) [2024-03-29 13:18:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 168771584. Throughput: 0: 41824.8. Samples: 50913800. Policy #0 lag: (min: 2.0, avg: 20.5, max: 42.0) [2024-03-29 13:18:03,840][00126] Avg episode reward: [(0, '0.304')] [2024-03-29 13:18:07,226][00497] Updated weights for policy 0, policy_version 10309 (0.0025) [2024-03-29 13:18:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 168968192. Throughput: 0: 41964.5. Samples: 51176140. Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 13:18:08,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:18:10,425][00497] Updated weights for policy 0, policy_version 10319 (0.0024) [2024-03-29 13:18:11,863][00476] Signal inference workers to stop experience collection... (1850 times) [2024-03-29 13:18:11,883][00497] InferenceWorker_p0-w0: stopping experience collection (1850 times) [2024-03-29 13:18:12,073][00476] Signal inference workers to resume experience collection... (1850 times) [2024-03-29 13:18:12,074][00497] InferenceWorker_p0-w0: resuming experience collection (1850 times) [2024-03-29 13:18:13,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 169181184. Throughput: 0: 41539.9. Samples: 51406440. Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 13:18:13,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 13:18:14,961][00497] Updated weights for policy 0, policy_version 10329 (0.0021) [2024-03-29 13:18:18,374][00497] Updated weights for policy 0, policy_version 10339 (0.0021) [2024-03-29 13:18:18,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 169410560. Throughput: 0: 41543.7. Samples: 51533820. 
Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:18:18,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 13:18:23,002][00497] Updated weights for policy 0, policy_version 10349 (0.0023) [2024-03-29 13:18:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 169590784. Throughput: 0: 41626.2. Samples: 51791560. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:18:23,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:18:26,424][00497] Updated weights for policy 0, policy_version 10359 (0.0029) [2024-03-29 13:18:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 169820160. Throughput: 0: 41526.4. Samples: 52035900. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 13:18:28,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:18:30,591][00497] Updated weights for policy 0, policy_version 10369 (0.0019) [2024-03-29 13:18:33,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 170033152. Throughput: 0: 41997.8. Samples: 52171020. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:18:33,840][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 13:18:33,999][00497] Updated weights for policy 0, policy_version 10379 (0.0024) [2024-03-29 13:18:38,295][00497] Updated weights for policy 0, policy_version 10389 (0.0019) [2024-03-29 13:18:38,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 42098.5). Total num frames: 170229760. Throughput: 0: 42046.7. Samples: 52437940. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:18:38,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 13:18:41,820][00497] Updated weights for policy 0, policy_version 10399 (0.0018) [2024-03-29 13:18:43,840][00126] Fps is (10 sec: 44232.3, 60 sec: 42324.7, 300 sec: 42098.4). Total num frames: 170475520. Throughput: 0: 41941.7. Samples: 52678680. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:18:43,841][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 13:18:45,721][00476] Signal inference workers to stop experience collection... (1900 times) [2024-03-29 13:18:45,743][00497] InferenceWorker_p0-w0: stopping experience collection (1900 times) [2024-03-29 13:18:45,943][00476] Signal inference workers to resume experience collection... (1900 times) [2024-03-29 13:18:45,943][00497] InferenceWorker_p0-w0: resuming experience collection (1900 times) [2024-03-29 13:18:45,947][00497] Updated weights for policy 0, policy_version 10409 (0.0023) [2024-03-29 13:18:48,839][00126] Fps is (10 sec: 44236.1, 60 sec: 42325.2, 300 sec: 42154.1). Total num frames: 170672128. Throughput: 0: 42123.4. Samples: 52809360. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:18:48,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:18:49,275][00497] Updated weights for policy 0, policy_version 10419 (0.0021) [2024-03-29 13:18:53,839][00126] Fps is (10 sec: 37687.2, 60 sec: 41506.2, 300 sec: 42154.1). Total num frames: 170852352. Throughput: 0: 42275.5. Samples: 53078540. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 13:18:53,841][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:18:53,907][00497] Updated weights for policy 0, policy_version 10429 (0.0037) [2024-03-29 13:18:57,150][00497] Updated weights for policy 0, policy_version 10439 (0.0029) [2024-03-29 13:18:58,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 171098112. 
Throughput: 0: 42319.6. Samples: 53310820. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 13:18:58,840][00126] Avg episode reward: [(0, '0.289')] [2024-03-29 13:19:01,459][00497] Updated weights for policy 0, policy_version 10449 (0.0026) [2024-03-29 13:19:03,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 171311104. Throughput: 0: 42607.9. Samples: 53451180. Policy #0 lag: (min: 2.0, avg: 22.3, max: 42.0) [2024-03-29 13:19:03,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 13:19:04,005][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000010457_171327488.pth... [2024-03-29 13:19:04,335][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000009840_161218560.pth [2024-03-29 13:19:04,839][00497] Updated weights for policy 0, policy_version 10459 (0.0020) [2024-03-29 13:19:08,840][00126] Fps is (10 sec: 39320.7, 60 sec: 42052.1, 300 sec: 42098.5). Total num frames: 171491328. Throughput: 0: 42546.1. Samples: 53706140. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 13:19:08,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 13:19:09,483][00497] Updated weights for policy 0, policy_version 10469 (0.0024) [2024-03-29 13:19:12,990][00497] Updated weights for policy 0, policy_version 10479 (0.0024) [2024-03-29 13:19:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.4, 300 sec: 42098.5). Total num frames: 171720704. Throughput: 0: 42271.1. Samples: 53938100. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 13:19:13,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 13:19:17,053][00476] Signal inference workers to stop experience collection... (1950 times) [2024-03-29 13:19:17,126][00497] InferenceWorker_p0-w0: stopping experience collection (1950 times) [2024-03-29 13:19:17,144][00476] Signal inference workers to resume experience collection... (1950 times) [2024-03-29 13:19:17,157][00497] InferenceWorker_p0-w0: resuming experience collection (1950 times) [2024-03-29 13:19:17,160][00497] Updated weights for policy 0, policy_version 10489 (0.0027) [2024-03-29 13:19:18,839][00126] Fps is (10 sec: 44237.6, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 171933696. Throughput: 0: 42298.6. Samples: 54074460. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 13:19:18,840][00126] Avg episode reward: [(0, '0.269')] [2024-03-29 13:19:20,313][00497] Updated weights for policy 0, policy_version 10499 (0.0027) [2024-03-29 13:19:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 172130304. Throughput: 0: 42024.4. Samples: 54329040. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 13:19:23,841][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 13:19:25,069][00497] Updated weights for policy 0, policy_version 10509 (0.0021) [2024-03-29 13:19:28,403][00497] Updated weights for policy 0, policy_version 10519 (0.0029) [2024-03-29 13:19:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 172359680. Throughput: 0: 42128.9. Samples: 54574440. Policy #0 lag: (min: 0.0, avg: 20.2, max: 44.0) [2024-03-29 13:19:28,840][00126] Avg episode reward: [(0, '0.336')] [2024-03-29 13:19:32,635][00497] Updated weights for policy 0, policy_version 10529 (0.0021) [2024-03-29 13:19:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 172556288. Throughput: 0: 42081.9. Samples: 54703040. 
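The "Saving .../checkpoint_p0/checkpoint_<version>_<frames>.pth" messages, each followed by "Removing ..." of an older file, show the trainer rotating checkpoints: write the newest, prune the oldest so only a few remain on disk. A simplified sketch of that save-and-prune pattern with PyTorch; the filename layout follows the log, while the keep count and metadata keys are assumptions:

```python
import os
from pathlib import Path
import torch

def save_and_prune(model: torch.nn.Module, checkpoint_dir: str,
                   policy_version: int, env_frames: int, keep_last: int = 2) -> Path:
    """Write checkpoint_<version>_<frames>.pth and delete older checkpoints."""
    ckpt_dir = Path(checkpoint_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    torch.save({"model": model.state_dict(),
                "policy_version": policy_version,
                "env_frames": env_frames}, path)
    # Keep only the newest checkpoints, mirroring the Saving/Removing pairs in the log.
    checkpoints = sorted(ckpt_dir.glob("checkpoint_*.pth"), key=os.path.getmtime)
    for old in checkpoints[:-keep_last]:
        old.unlink()
    return path
```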
Policy #0 lag: (min: 0.0, avg: 19.6, max: 43.0) [2024-03-29 13:19:33,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:19:36,060][00497] Updated weights for policy 0, policy_version 10539 (0.0019) [2024-03-29 13:19:38,839][00126] Fps is (10 sec: 37683.8, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 172736512. Throughput: 0: 41692.9. Samples: 54954720. Policy #0 lag: (min: 0.0, avg: 19.6, max: 43.0) [2024-03-29 13:19:38,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 13:19:40,816][00497] Updated weights for policy 0, policy_version 10549 (0.0018) [2024-03-29 13:19:43,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.8, 300 sec: 42043.0). Total num frames: 172982272. Throughput: 0: 41987.0. Samples: 55200240. Policy #0 lag: (min: 1.0, avg: 21.0, max: 43.0) [2024-03-29 13:19:43,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:19:44,021][00497] Updated weights for policy 0, policy_version 10559 (0.0021) [2024-03-29 13:19:48,035][00476] Signal inference workers to stop experience collection... (2000 times) [2024-03-29 13:19:48,108][00497] InferenceWorker_p0-w0: stopping experience collection (2000 times) [2024-03-29 13:19:48,195][00476] Signal inference workers to resume experience collection... (2000 times) [2024-03-29 13:19:48,195][00497] InferenceWorker_p0-w0: resuming experience collection (2000 times) [2024-03-29 13:19:48,199][00497] Updated weights for policy 0, policy_version 10569 (0.0021) [2024-03-29 13:19:48,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 173178880. Throughput: 0: 41723.1. Samples: 55328720. Policy #0 lag: (min: 1.0, avg: 21.0, max: 43.0) [2024-03-29 13:19:48,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 13:19:51,521][00497] Updated weights for policy 0, policy_version 10579 (0.0026) [2024-03-29 13:19:53,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42052.1, 300 sec: 42043.0). Total num frames: 173375488. Throughput: 0: 41569.0. Samples: 55576740. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:19:53,842][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 13:19:56,584][00497] Updated weights for policy 0, policy_version 10589 (0.0023) [2024-03-29 13:19:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 42098.6). Total num frames: 173621248. Throughput: 0: 42147.5. Samples: 55834740. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:19:58,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:19:59,619][00497] Updated weights for policy 0, policy_version 10599 (0.0019) [2024-03-29 13:20:03,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.0, 300 sec: 42043.0). Total num frames: 173801472. Throughput: 0: 41861.6. Samples: 55958240. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:20:03,841][00126] Avg episode reward: [(0, '0.289')] [2024-03-29 13:20:04,029][00497] Updated weights for policy 0, policy_version 10609 (0.0022) [2024-03-29 13:20:07,144][00497] Updated weights for policy 0, policy_version 10619 (0.0027) [2024-03-29 13:20:08,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41779.4, 300 sec: 42098.6). Total num frames: 173998080. Throughput: 0: 41501.8. Samples: 56196620. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:20:08,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 13:20:12,431][00497] Updated weights for policy 0, policy_version 10629 (0.0033) [2024-03-29 13:20:13,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 174211072. 
Throughput: 0: 41786.7. Samples: 56454840. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 13:20:13,840][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 13:20:15,719][00497] Updated weights for policy 0, policy_version 10639 (0.0025) [2024-03-29 13:20:16,644][00476] Signal inference workers to stop experience collection... (2050 times) [2024-03-29 13:20:16,678][00497] InferenceWorker_p0-w0: stopping experience collection (2050 times) [2024-03-29 13:20:16,826][00476] Signal inference workers to resume experience collection... (2050 times) [2024-03-29 13:20:16,826][00497] InferenceWorker_p0-w0: resuming experience collection (2050 times) [2024-03-29 13:20:18,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 174424064. Throughput: 0: 41547.9. Samples: 56572700. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 13:20:18,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 13:20:20,026][00497] Updated weights for policy 0, policy_version 10649 (0.0018) [2024-03-29 13:20:23,194][00497] Updated weights for policy 0, policy_version 10659 (0.0017) [2024-03-29 13:20:23,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 174653440. Throughput: 0: 41555.4. Samples: 56824720. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 13:20:23,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:20:28,144][00497] Updated weights for policy 0, policy_version 10669 (0.0024) [2024-03-29 13:20:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 174833664. Throughput: 0: 41805.0. Samples: 57081460. Policy #0 lag: (min: 0.0, avg: 17.8, max: 41.0) [2024-03-29 13:20:28,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 13:20:31,430][00497] Updated weights for policy 0, policy_version 10679 (0.0023) [2024-03-29 13:20:33,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 175063040. Throughput: 0: 41351.1. Samples: 57189520. Policy #0 lag: (min: 0.0, avg: 17.8, max: 41.0) [2024-03-29 13:20:33,840][00126] Avg episode reward: [(0, '0.329')] [2024-03-29 13:20:35,652][00497] Updated weights for policy 0, policy_version 10689 (0.0019) [2024-03-29 13:20:38,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 175276032. Throughput: 0: 41740.5. Samples: 57455060. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 13:20:38,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:20:39,080][00497] Updated weights for policy 0, policy_version 10699 (0.0025) [2024-03-29 13:20:43,839][00126] Fps is (10 sec: 37682.9, 60 sec: 40960.0, 300 sec: 41931.9). Total num frames: 175439872. Throughput: 0: 41524.4. Samples: 57703340. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 13:20:43,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:20:43,871][00497] Updated weights for policy 0, policy_version 10709 (0.0024) [2024-03-29 13:20:47,195][00497] Updated weights for policy 0, policy_version 10719 (0.0020) [2024-03-29 13:20:48,391][00476] Signal inference workers to stop experience collection... (2100 times) [2024-03-29 13:20:48,416][00497] InferenceWorker_p0-w0: stopping experience collection (2100 times) [2024-03-29 13:20:48,611][00476] Signal inference workers to resume experience collection... 
(2100 times) [2024-03-29 13:20:48,612][00497] InferenceWorker_p0-w0: resuming experience collection (2100 times) [2024-03-29 13:20:48,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 175685632. Throughput: 0: 41250.3. Samples: 57814500. Policy #0 lag: (min: 0.0, avg: 20.1, max: 43.0) [2024-03-29 13:20:48,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 13:20:51,459][00497] Updated weights for policy 0, policy_version 10729 (0.0031) [2024-03-29 13:20:53,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42052.4, 300 sec: 42098.5). Total num frames: 175898624. Throughput: 0: 41924.0. Samples: 58083200. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 13:20:53,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:20:54,611][00497] Updated weights for policy 0, policy_version 10739 (0.0022) [2024-03-29 13:20:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.0, 300 sec: 42043.0). Total num frames: 176095232. Throughput: 0: 41726.2. Samples: 58332520. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 13:20:58,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 13:20:59,546][00497] Updated weights for policy 0, policy_version 10749 (0.0026) [2024-03-29 13:21:02,759][00497] Updated weights for policy 0, policy_version 10759 (0.0025) [2024-03-29 13:21:03,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 176308224. Throughput: 0: 41852.5. Samples: 58456060. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:21:03,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:21:03,925][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000010762_176324608.pth... [2024-03-29 13:21:04,250][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000010148_166264832.pth [2024-03-29 13:21:07,085][00497] Updated weights for policy 0, policy_version 10769 (0.0025) [2024-03-29 13:21:08,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 176521216. Throughput: 0: 42033.4. Samples: 58716220. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:21:08,840][00126] Avg episode reward: [(0, '0.478')] [2024-03-29 13:21:10,315][00497] Updated weights for policy 0, policy_version 10779 (0.0026) [2024-03-29 13:21:13,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 176717824. Throughput: 0: 41772.1. Samples: 58961200. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:21:13,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 13:21:15,318][00497] Updated weights for policy 0, policy_version 10789 (0.0020) [2024-03-29 13:21:18,750][00497] Updated weights for policy 0, policy_version 10799 (0.0033) [2024-03-29 13:21:18,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 176930816. Throughput: 0: 42064.3. Samples: 59082420. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 13:21:18,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 13:21:19,786][00476] Signal inference workers to stop experience collection... (2150 times) [2024-03-29 13:21:19,807][00497] InferenceWorker_p0-w0: stopping experience collection (2150 times) [2024-03-29 13:21:19,959][00476] Signal inference workers to resume experience collection... 
(2150 times) [2024-03-29 13:21:19,960][00497] InferenceWorker_p0-w0: resuming experience collection (2150 times) [2024-03-29 13:21:23,375][00497] Updated weights for policy 0, policy_version 10809 (0.0023) [2024-03-29 13:21:23,839][00126] Fps is (10 sec: 39321.0, 60 sec: 40960.0, 300 sec: 41820.8). Total num frames: 177111040. Throughput: 0: 41432.4. Samples: 59319520. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 13:21:23,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 13:21:26,497][00497] Updated weights for policy 0, policy_version 10819 (0.0027) [2024-03-29 13:21:28,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 177324032. Throughput: 0: 41455.6. Samples: 59568840. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:21:28,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:21:30,931][00497] Updated weights for policy 0, policy_version 10829 (0.0027) [2024-03-29 13:21:33,839][00126] Fps is (10 sec: 45875.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 177569792. Throughput: 0: 42041.9. Samples: 59706380. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:21:33,841][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:21:34,147][00497] Updated weights for policy 0, policy_version 10839 (0.0024) [2024-03-29 13:21:38,784][00497] Updated weights for policy 0, policy_version 10849 (0.0026) [2024-03-29 13:21:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41765.7). Total num frames: 177750016. Throughput: 0: 41512.8. Samples: 59951280. Policy #0 lag: (min: 2.0, avg: 22.6, max: 43.0) [2024-03-29 13:21:38,840][00126] Avg episode reward: [(0, '0.416')] [2024-03-29 13:21:42,163][00497] Updated weights for policy 0, policy_version 10859 (0.0029) [2024-03-29 13:21:43,839][00126] Fps is (10 sec: 39321.0, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 177963008. Throughput: 0: 41502.6. Samples: 60200140. Policy #0 lag: (min: 2.0, avg: 22.6, max: 43.0) [2024-03-29 13:21:43,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 13:21:46,622][00497] Updated weights for policy 0, policy_version 10869 (0.0019) [2024-03-29 13:21:48,839][00126] Fps is (10 sec: 44237.2, 60 sec: 41779.3, 300 sec: 41820.8). Total num frames: 178192384. Throughput: 0: 41791.6. Samples: 60336680. Policy #0 lag: (min: 2.0, avg: 19.9, max: 42.0) [2024-03-29 13:21:48,840][00126] Avg episode reward: [(0, '0.282')] [2024-03-29 13:21:49,962][00497] Updated weights for policy 0, policy_version 10879 (0.0025) [2024-03-29 13:21:50,587][00476] Signal inference workers to stop experience collection... (2200 times) [2024-03-29 13:21:50,625][00497] InferenceWorker_p0-w0: stopping experience collection (2200 times) [2024-03-29 13:21:50,807][00476] Signal inference workers to resume experience collection... (2200 times) [2024-03-29 13:21:50,807][00497] InferenceWorker_p0-w0: resuming experience collection (2200 times) [2024-03-29 13:21:53,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 178372608. Throughput: 0: 41249.2. Samples: 60572440. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 13:21:53,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:21:54,547][00497] Updated weights for policy 0, policy_version 10889 (0.0017) [2024-03-29 13:21:57,552][00497] Updated weights for policy 0, policy_version 10899 (0.0025) [2024-03-29 13:21:58,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 178601984. 
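Each "Updated weights for policy 0, policy_version N (0.00xx)" line records the inference worker picking up a new set of learner weights, with what appears to be the time spent on the refresh in parentheses. A rough, illustrative sketch of timing such a weight refresh (the function and its arguments are assumptions, not the worker's real API):

```python
import time
import torch

def refresh_inference_weights(inference_model: torch.nn.Module,
                              learner_state_dict: dict, policy_version: int) -> float:
    """Copy the learner's latest weights into the inference-side model and log the cost."""
    start = time.time()
    inference_model.load_state_dict(learner_state_dict)
    elapsed = time.time() - start
    print(f"Updated weights for policy 0, policy_version {policy_version} ({elapsed:.4f})")
    return elapsed
```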
Throughput: 0: 41369.2. Samples: 60822820. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 13:21:58,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 13:22:02,116][00497] Updated weights for policy 0, policy_version 10909 (0.0025) [2024-03-29 13:22:03,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 178814976. Throughput: 0: 41949.9. Samples: 60970160. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 13:22:03,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 13:22:05,201][00497] Updated weights for policy 0, policy_version 10919 (0.0021) [2024-03-29 13:22:08,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 179011584. Throughput: 0: 41921.0. Samples: 61205960. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 13:22:08,841][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 13:22:09,859][00497] Updated weights for policy 0, policy_version 10929 (0.0028) [2024-03-29 13:22:12,928][00497] Updated weights for policy 0, policy_version 10939 (0.0021) [2024-03-29 13:22:13,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.1, 300 sec: 41931.9). Total num frames: 179240960. Throughput: 0: 41991.5. Samples: 61458460. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:22:13,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 13:22:17,484][00497] Updated weights for policy 0, policy_version 10949 (0.0017) [2024-03-29 13:22:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.4, 300 sec: 41876.4). Total num frames: 179453952. Throughput: 0: 42253.8. Samples: 61607800. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:22:18,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:22:20,708][00497] Updated weights for policy 0, policy_version 10959 (0.0017) [2024-03-29 13:22:23,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 179650560. Throughput: 0: 42301.8. Samples: 61854860. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 13:22:23,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 13:22:25,277][00476] Signal inference workers to stop experience collection... (2250 times) [2024-03-29 13:22:25,322][00497] InferenceWorker_p0-w0: stopping experience collection (2250 times) [2024-03-29 13:22:25,352][00476] Signal inference workers to resume experience collection... (2250 times) [2024-03-29 13:22:25,356][00497] InferenceWorker_p0-w0: resuming experience collection (2250 times) [2024-03-29 13:22:25,363][00497] Updated weights for policy 0, policy_version 10969 (0.0024) [2024-03-29 13:22:28,676][00497] Updated weights for policy 0, policy_version 10979 (0.0022) [2024-03-29 13:22:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 179879936. Throughput: 0: 42129.5. Samples: 62095960. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 13:22:28,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:22:33,031][00497] Updated weights for policy 0, policy_version 10989 (0.0018) [2024-03-29 13:22:33,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 180076544. Throughput: 0: 42311.2. Samples: 62240680. 
Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 13:22:33,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 13:22:36,323][00497] Updated weights for policy 0, policy_version 10999 (0.0022) [2024-03-29 13:22:38,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 41931.9). Total num frames: 180305920. Throughput: 0: 42471.6. Samples: 62483660. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:22:38,841][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 13:22:41,001][00497] Updated weights for policy 0, policy_version 11009 (0.0024) [2024-03-29 13:22:43,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 180502528. Throughput: 0: 42270.7. Samples: 62725000. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:22:43,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:22:44,494][00497] Updated weights for policy 0, policy_version 11019 (0.0024) [2024-03-29 13:22:48,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 180682752. Throughput: 0: 41932.0. Samples: 62857100. Policy #0 lag: (min: 0.0, avg: 18.7, max: 40.0) [2024-03-29 13:22:48,840][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 13:22:48,918][00497] Updated weights for policy 0, policy_version 11029 (0.0023) [2024-03-29 13:22:51,976][00497] Updated weights for policy 0, policy_version 11039 (0.0020) [2024-03-29 13:22:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42871.5, 300 sec: 42043.0). Total num frames: 180944896. Throughput: 0: 42232.0. Samples: 63106400. Policy #0 lag: (min: 0.0, avg: 18.7, max: 40.0) [2024-03-29 13:22:53,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:22:56,348][00497] Updated weights for policy 0, policy_version 11049 (0.0019) [2024-03-29 13:22:56,881][00476] Signal inference workers to stop experience collection... (2300 times) [2024-03-29 13:22:56,939][00497] InferenceWorker_p0-w0: stopping experience collection (2300 times) [2024-03-29 13:22:56,974][00476] Signal inference workers to resume experience collection... (2300 times) [2024-03-29 13:22:56,976][00497] InferenceWorker_p0-w0: resuming experience collection (2300 times) [2024-03-29 13:22:58,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 181141504. Throughput: 0: 42344.6. Samples: 63363960. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 13:22:58,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:22:59,751][00497] Updated weights for policy 0, policy_version 11059 (0.0035) [2024-03-29 13:23:03,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 181338112. Throughput: 0: 42138.1. Samples: 63504020. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 13:23:03,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 13:23:03,863][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011068_181338112.pth... [2024-03-29 13:23:04,169][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000010457_171327488.pth [2024-03-29 13:23:04,440][00497] Updated weights for policy 0, policy_version 11069 (0.0019) [2024-03-29 13:23:07,451][00497] Updated weights for policy 0, policy_version 11079 (0.0030) [2024-03-29 13:23:08,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42871.4, 300 sec: 42043.0). Total num frames: 181583872. Throughput: 0: 42052.8. Samples: 63747240. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 43.0) [2024-03-29 13:23:08,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 13:23:11,688][00497] Updated weights for policy 0, policy_version 11089 (0.0024) [2024-03-29 13:23:13,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 181780480. Throughput: 0: 42432.8. Samples: 64005440. Policy #0 lag: (min: 2.0, avg: 21.6, max: 43.0) [2024-03-29 13:23:13,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 13:23:15,131][00497] Updated weights for policy 0, policy_version 11099 (0.0019) [2024-03-29 13:23:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 181977088. Throughput: 0: 41979.4. Samples: 64129760. Policy #0 lag: (min: 2.0, avg: 21.6, max: 43.0) [2024-03-29 13:23:18,841][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 13:23:19,806][00497] Updated weights for policy 0, policy_version 11109 (0.0028) [2024-03-29 13:23:23,023][00497] Updated weights for policy 0, policy_version 11119 (0.0026) [2024-03-29 13:23:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 182206464. Throughput: 0: 42123.0. Samples: 64379200. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:23:23,840][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 13:23:27,367][00497] Updated weights for policy 0, policy_version 11129 (0.0032) [2024-03-29 13:23:28,240][00476] Signal inference workers to stop experience collection... (2350 times) [2024-03-29 13:23:28,322][00497] InferenceWorker_p0-w0: stopping experience collection (2350 times) [2024-03-29 13:23:28,326][00476] Signal inference workers to resume experience collection... (2350 times) [2024-03-29 13:23:28,348][00497] InferenceWorker_p0-w0: resuming experience collection (2350 times) [2024-03-29 13:23:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 182403072. Throughput: 0: 42555.9. Samples: 64640020. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:23:28,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 13:23:30,886][00497] Updated weights for policy 0, policy_version 11139 (0.0022) [2024-03-29 13:23:33,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 182583296. Throughput: 0: 42199.6. Samples: 64756080. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:23:33,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:23:35,256][00497] Updated weights for policy 0, policy_version 11149 (0.0022) [2024-03-29 13:23:38,629][00497] Updated weights for policy 0, policy_version 11159 (0.0044) [2024-03-29 13:23:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.2, 300 sec: 41876.5). Total num frames: 182829056. Throughput: 0: 42362.6. Samples: 65012720. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:23:38,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:23:43,107][00497] Updated weights for policy 0, policy_version 11169 (0.0024) [2024-03-29 13:23:43,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 183025664. Throughput: 0: 42227.5. Samples: 65264200. Policy #0 lag: (min: 1.0, avg: 23.2, max: 42.0) [2024-03-29 13:23:43,840][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 13:23:46,762][00497] Updated weights for policy 0, policy_version 11179 (0.0019) [2024-03-29 13:23:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 183222272. 
Throughput: 0: 41680.9. Samples: 65379660. Policy #0 lag: (min: 1.0, avg: 23.2, max: 42.0) [2024-03-29 13:23:48,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:23:51,119][00497] Updated weights for policy 0, policy_version 11189 (0.0027) [2024-03-29 13:23:53,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 183468032. Throughput: 0: 42180.5. Samples: 65645360. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 13:23:53,841][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 13:23:54,222][00497] Updated weights for policy 0, policy_version 11199 (0.0026) [2024-03-29 13:23:58,552][00497] Updated weights for policy 0, policy_version 11209 (0.0018) [2024-03-29 13:23:58,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 183664640. Throughput: 0: 42172.5. Samples: 65903200. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:23:58,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:23:59,493][00476] Signal inference workers to stop experience collection... (2400 times) [2024-03-29 13:23:59,494][00476] Signal inference workers to resume experience collection... (2400 times) [2024-03-29 13:23:59,530][00497] InferenceWorker_p0-w0: stopping experience collection (2400 times) [2024-03-29 13:23:59,535][00497] InferenceWorker_p0-w0: resuming experience collection (2400 times) [2024-03-29 13:24:02,083][00497] Updated weights for policy 0, policy_version 11219 (0.0034) [2024-03-29 13:24:03,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 183861248. Throughput: 0: 41998.2. Samples: 66019680. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:24:03,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 13:24:06,711][00497] Updated weights for policy 0, policy_version 11229 (0.0028) [2024-03-29 13:24:08,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 184090624. Throughput: 0: 42462.3. Samples: 66290000. Policy #0 lag: (min: 1.0, avg: 18.8, max: 41.0) [2024-03-29 13:24:08,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:24:09,608][00497] Updated weights for policy 0, policy_version 11239 (0.0028) [2024-03-29 13:24:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 184287232. Throughput: 0: 42143.1. Samples: 66536460. Policy #0 lag: (min: 1.0, avg: 18.8, max: 41.0) [2024-03-29 13:24:13,840][00126] Avg episode reward: [(0, '0.305')] [2024-03-29 13:24:13,922][00497] Updated weights for policy 0, policy_version 11249 (0.0022) [2024-03-29 13:24:17,328][00497] Updated weights for policy 0, policy_version 11259 (0.0022) [2024-03-29 13:24:18,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 184516608. Throughput: 0: 42389.3. Samples: 66663600. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 13:24:18,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 13:24:21,939][00497] Updated weights for policy 0, policy_version 11269 (0.0026) [2024-03-29 13:24:23,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.4, 300 sec: 41932.0). Total num frames: 184729600. Throughput: 0: 42667.7. Samples: 66932760. 
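The "Avg episode reward: [(0, '...')]" entries track, per policy, a running average of returns from recently completed episodes. A small sketch of maintaining such a statistic; the 100-episode window is an assumption:

```python
from collections import deque

class EpisodeRewardTracker:
    """Running average of the most recent completed-episode returns for one policy."""

    def __init__(self, window: int = 100):
        self.returns = deque(maxlen=window)

    def episode_finished(self, episode_return: float) -> None:
        self.returns.append(episode_return)

    def avg(self) -> float:
        return sum(self.returns) / len(self.returns) if self.returns else 0.0

tracker = EpisodeRewardTracker()
for r in (0.42, 0.31, 0.41):
    tracker.episode_finished(r)
print(f"Avg episode reward: [(0, '{tracker.avg():.3f}')]")
```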
Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 13:24:23,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:24:25,099][00497] Updated weights for policy 0, policy_version 11279 (0.0025) [2024-03-29 13:24:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 184926208. Throughput: 0: 42418.7. Samples: 67173040. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:24:28,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 13:24:29,085][00476] Signal inference workers to stop experience collection... (2450 times) [2024-03-29 13:24:29,110][00497] InferenceWorker_p0-w0: stopping experience collection (2450 times) [2024-03-29 13:24:29,309][00476] Signal inference workers to resume experience collection... (2450 times) [2024-03-29 13:24:29,310][00497] InferenceWorker_p0-w0: resuming experience collection (2450 times) [2024-03-29 13:24:29,313][00497] Updated weights for policy 0, policy_version 11289 (0.0019) [2024-03-29 13:24:32,808][00497] Updated weights for policy 0, policy_version 11299 (0.0020) [2024-03-29 13:24:33,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42871.4, 300 sec: 42098.5). Total num frames: 185155584. Throughput: 0: 42549.8. Samples: 67294400. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:24:33,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:24:37,442][00497] Updated weights for policy 0, policy_version 11309 (0.0026) [2024-03-29 13:24:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 185352192. Throughput: 0: 42551.2. Samples: 67560160. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 13:24:38,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 13:24:40,739][00497] Updated weights for policy 0, policy_version 11319 (0.0022) [2024-03-29 13:24:43,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 185548800. Throughput: 0: 41867.0. Samples: 67787220. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 13:24:43,841][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:24:44,975][00497] Updated weights for policy 0, policy_version 11329 (0.0028) [2024-03-29 13:24:48,605][00497] Updated weights for policy 0, policy_version 11339 (0.0019) [2024-03-29 13:24:48,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 185778176. Throughput: 0: 42202.7. Samples: 67918800. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:24:48,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:24:53,072][00497] Updated weights for policy 0, policy_version 11349 (0.0017) [2024-03-29 13:24:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 185974784. Throughput: 0: 42205.3. Samples: 68189240. Policy #0 lag: (min: 0.0, avg: 19.5, max: 43.0) [2024-03-29 13:24:53,840][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 13:24:56,157][00497] Updated weights for policy 0, policy_version 11359 (0.0023) [2024-03-29 13:24:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.2, 300 sec: 42043.0). Total num frames: 186204160. Throughput: 0: 41819.9. Samples: 68418360. Policy #0 lag: (min: 0.0, avg: 19.5, max: 43.0) [2024-03-29 13:24:58,842][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 13:25:00,404][00497] Updated weights for policy 0, policy_version 11369 (0.0027) [2024-03-29 13:25:03,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 186417152. 
Throughput: 0: 42059.4. Samples: 68556280. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 13:25:03,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 13:25:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011378_186417152.pth... [2024-03-29 13:25:04,195][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000010762_176324608.pth [2024-03-29 13:25:04,460][00497] Updated weights for policy 0, policy_version 11379 (0.0023) [2024-03-29 13:25:06,393][00476] Signal inference workers to stop experience collection... (2500 times) [2024-03-29 13:25:06,473][00476] Signal inference workers to resume experience collection... (2500 times) [2024-03-29 13:25:06,474][00497] InferenceWorker_p0-w0: stopping experience collection (2500 times) [2024-03-29 13:25:06,501][00497] InferenceWorker_p0-w0: resuming experience collection (2500 times) [2024-03-29 13:25:08,839][00126] Fps is (10 sec: 37683.9, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 186580992. Throughput: 0: 41691.5. Samples: 68808880. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 13:25:08,840][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 13:25:08,991][00497] Updated weights for policy 0, policy_version 11389 (0.0029) [2024-03-29 13:25:11,963][00497] Updated weights for policy 0, policy_version 11399 (0.0033) [2024-03-29 13:25:13,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 186826752. Throughput: 0: 41206.2. Samples: 69027320. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 13:25:13,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 13:25:16,670][00497] Updated weights for policy 0, policy_version 11409 (0.0022) [2024-03-29 13:25:18,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 187006976. Throughput: 0: 41565.4. Samples: 69164840. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 13:25:18,840][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 13:25:20,224][00497] Updated weights for policy 0, policy_version 11419 (0.0029) [2024-03-29 13:25:23,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 187203584. Throughput: 0: 41415.2. Samples: 69423840. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 13:25:23,840][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 13:25:25,034][00497] Updated weights for policy 0, policy_version 11429 (0.0019) [2024-03-29 13:25:28,096][00497] Updated weights for policy 0, policy_version 11439 (0.0041) [2024-03-29 13:25:28,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 187449344. Throughput: 0: 41525.0. Samples: 69655840. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 13:25:28,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:25:32,598][00497] Updated weights for policy 0, policy_version 11449 (0.0020) [2024-03-29 13:25:33,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 187629568. Throughput: 0: 41355.2. Samples: 69779780. Policy #0 lag: (min: 0.0, avg: 21.6, max: 40.0) [2024-03-29 13:25:33,840][00126] Avg episode reward: [(0, '0.249')] [2024-03-29 13:25:36,126][00497] Updated weights for policy 0, policy_version 11459 (0.0027) [2024-03-29 13:25:38,839][00126] Fps is (10 sec: 37682.7, 60 sec: 41233.0, 300 sec: 41987.5). Total num frames: 187826176. Throughput: 0: 41033.7. Samples: 70035760. 
Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:25:38,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 13:25:40,658][00476] Signal inference workers to stop experience collection... (2550 times) [2024-03-29 13:25:40,701][00497] InferenceWorker_p0-w0: stopping experience collection (2550 times) [2024-03-29 13:25:40,739][00476] Signal inference workers to resume experience collection... (2550 times) [2024-03-29 13:25:40,741][00497] InferenceWorker_p0-w0: resuming experience collection (2550 times) [2024-03-29 13:25:40,744][00497] Updated weights for policy 0, policy_version 11469 (0.0024) [2024-03-29 13:25:43,834][00497] Updated weights for policy 0, policy_version 11479 (0.0018) [2024-03-29 13:25:43,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 188071936. Throughput: 0: 41313.9. Samples: 70277480. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:25:43,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:25:48,243][00497] Updated weights for policy 0, policy_version 11489 (0.0024) [2024-03-29 13:25:48,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 188252160. Throughput: 0: 40940.6. Samples: 70398600. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 13:25:48,840][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 13:25:51,997][00497] Updated weights for policy 0, policy_version 11499 (0.0029) [2024-03-29 13:25:53,839][00126] Fps is (10 sec: 37683.9, 60 sec: 41233.2, 300 sec: 41876.4). Total num frames: 188448768. Throughput: 0: 41037.9. Samples: 70655580. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 13:25:53,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:25:56,593][00497] Updated weights for policy 0, policy_version 11509 (0.0029) [2024-03-29 13:25:58,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 188694528. Throughput: 0: 41596.4. Samples: 70899160. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 13:25:58,842][00126] Avg episode reward: [(0, '0.239')] [2024-03-29 13:25:59,768][00497] Updated weights for policy 0, policy_version 11519 (0.0027) [2024-03-29 13:26:03,839][00126] Fps is (10 sec: 42598.0, 60 sec: 40960.1, 300 sec: 41876.4). Total num frames: 188874752. Throughput: 0: 41232.9. Samples: 71020320. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 13:26:03,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 13:26:03,964][00497] Updated weights for policy 0, policy_version 11529 (0.0026) [2024-03-29 13:26:07,782][00497] Updated weights for policy 0, policy_version 11539 (0.0022) [2024-03-29 13:26:08,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 189087744. Throughput: 0: 41198.6. Samples: 71277780. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 13:26:08,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 13:26:12,463][00497] Updated weights for policy 0, policy_version 11549 (0.0027) [2024-03-29 13:26:12,667][00476] Signal inference workers to stop experience collection... (2600 times) [2024-03-29 13:26:12,716][00497] InferenceWorker_p0-w0: stopping experience collection (2600 times) [2024-03-29 13:26:12,757][00476] Signal inference workers to resume experience collection... 
(2600 times) [2024-03-29 13:26:12,757][00497] InferenceWorker_p0-w0: resuming experience collection (2600 times) [2024-03-29 13:26:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41932.0). Total num frames: 189300736. Throughput: 0: 41859.6. Samples: 71539520. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 13:26:13,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:26:15,634][00497] Updated weights for policy 0, policy_version 11559 (0.0025) [2024-03-29 13:26:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 189480960. Throughput: 0: 41089.0. Samples: 71628780. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 13:26:18,840][00126] Avg episode reward: [(0, '0.290')] [2024-03-29 13:26:19,988][00497] Updated weights for policy 0, policy_version 11569 (0.0024) [2024-03-29 13:26:23,724][00497] Updated weights for policy 0, policy_version 11579 (0.0023) [2024-03-29 13:26:23,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 189710336. Throughput: 0: 41295.1. Samples: 71894040. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 13:26:23,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:26:28,201][00497] Updated weights for policy 0, policy_version 11589 (0.0022) [2024-03-29 13:26:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 40960.1, 300 sec: 41820.9). Total num frames: 189906944. Throughput: 0: 41980.2. Samples: 72166580. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 13:26:28,840][00126] Avg episode reward: [(0, '0.254')] [2024-03-29 13:26:31,245][00497] Updated weights for policy 0, policy_version 11599 (0.0022) [2024-03-29 13:26:33,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 190136320. Throughput: 0: 41327.0. Samples: 72258320. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 13:26:33,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 13:26:35,779][00497] Updated weights for policy 0, policy_version 11609 (0.0028) [2024-03-29 13:26:38,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 190332928. Throughput: 0: 41486.5. Samples: 72522480. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:26:38,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 13:26:39,312][00497] Updated weights for policy 0, policy_version 11619 (0.0024) [2024-03-29 13:26:43,839][00126] Fps is (10 sec: 36045.7, 60 sec: 40414.0, 300 sec: 41709.8). Total num frames: 190496768. Throughput: 0: 41476.1. Samples: 72765580. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:26:43,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:26:44,231][00497] Updated weights for policy 0, policy_version 11629 (0.0030) [2024-03-29 13:26:45,180][00476] Signal inference workers to stop experience collection... (2650 times) [2024-03-29 13:26:45,232][00497] InferenceWorker_p0-w0: stopping experience collection (2650 times) [2024-03-29 13:26:45,267][00476] Signal inference workers to resume experience collection... (2650 times) [2024-03-29 13:26:45,273][00497] InferenceWorker_p0-w0: resuming experience collection (2650 times) [2024-03-29 13:26:47,195][00497] Updated weights for policy 0, policy_version 11639 (0.0024) [2024-03-29 13:26:48,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 190742528. Throughput: 0: 41304.8. Samples: 72879040. 
Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 13:26:48,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 13:26:51,662][00497] Updated weights for policy 0, policy_version 11649 (0.0020) [2024-03-29 13:26:53,839][00126] Fps is (10 sec: 45874.8, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 190955520. Throughput: 0: 41198.2. Samples: 73131700. Policy #0 lag: (min: 1.0, avg: 20.4, max: 40.0) [2024-03-29 13:26:53,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 13:26:55,595][00497] Updated weights for policy 0, policy_version 11659 (0.0017) [2024-03-29 13:26:58,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40686.9, 300 sec: 41765.3). Total num frames: 191135744. Throughput: 0: 41060.4. Samples: 73387240. Policy #0 lag: (min: 1.0, avg: 20.4, max: 40.0) [2024-03-29 13:26:58,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 13:27:00,283][00497] Updated weights for policy 0, policy_version 11669 (0.0024) [2024-03-29 13:27:03,172][00497] Updated weights for policy 0, policy_version 11679 (0.0019) [2024-03-29 13:27:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 191365120. Throughput: 0: 41531.5. Samples: 73497700. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:27:03,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 13:27:04,151][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011681_191381504.pth... [2024-03-29 13:27:04,462][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011068_181338112.pth [2024-03-29 13:27:07,674][00497] Updated weights for policy 0, policy_version 11689 (0.0035) [2024-03-29 13:27:08,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 191561728. Throughput: 0: 41108.6. Samples: 73743920. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:27:08,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:27:11,323][00497] Updated weights for policy 0, policy_version 11699 (0.0023) [2024-03-29 13:27:13,839][00126] Fps is (10 sec: 37683.5, 60 sec: 40687.0, 300 sec: 41654.2). Total num frames: 191741952. Throughput: 0: 40875.1. Samples: 74005960. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 13:27:13,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:27:15,911][00497] Updated weights for policy 0, policy_version 11709 (0.0022) [2024-03-29 13:27:17,313][00476] Signal inference workers to stop experience collection... (2700 times) [2024-03-29 13:27:17,313][00476] Signal inference workers to resume experience collection... (2700 times) [2024-03-29 13:27:17,353][00497] InferenceWorker_p0-w0: stopping experience collection (2700 times) [2024-03-29 13:27:17,353][00497] InferenceWorker_p0-w0: resuming experience collection (2700 times) [2024-03-29 13:27:18,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 191987712. Throughput: 0: 41723.6. Samples: 74135880. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 13:27:18,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 13:27:18,863][00497] Updated weights for policy 0, policy_version 11719 (0.0023) [2024-03-29 13:27:23,055][00497] Updated weights for policy 0, policy_version 11729 (0.0017) [2024-03-29 13:27:23,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 192184320. Throughput: 0: 41125.3. Samples: 74373120. 
Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 13:27:23,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:27:26,913][00497] Updated weights for policy 0, policy_version 11739 (0.0023) [2024-03-29 13:27:28,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41232.9, 300 sec: 41709.7). Total num frames: 192380928. Throughput: 0: 41469.6. Samples: 74631720. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 13:27:28,840][00126] Avg episode reward: [(0, '0.283')] [2024-03-29 13:27:31,499][00497] Updated weights for policy 0, policy_version 11749 (0.0021) [2024-03-29 13:27:33,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 192610304. Throughput: 0: 41967.9. Samples: 74767600. Policy #0 lag: (min: 2.0, avg: 19.6, max: 42.0) [2024-03-29 13:27:33,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:27:34,452][00497] Updated weights for policy 0, policy_version 11759 (0.0020) [2024-03-29 13:27:38,632][00497] Updated weights for policy 0, policy_version 11769 (0.0026) [2024-03-29 13:27:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 192823296. Throughput: 0: 41760.8. Samples: 75010940. Policy #0 lag: (min: 2.0, avg: 19.6, max: 42.0) [2024-03-29 13:27:38,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 13:27:42,416][00497] Updated weights for policy 0, policy_version 11779 (0.0022) [2024-03-29 13:27:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 193019904. Throughput: 0: 41521.3. Samples: 75255700. Policy #0 lag: (min: 0.0, avg: 20.8, max: 43.0) [2024-03-29 13:27:43,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 13:27:47,081][00497] Updated weights for policy 0, policy_version 11789 (0.0024) [2024-03-29 13:27:48,277][00476] Signal inference workers to stop experience collection... (2750 times) [2024-03-29 13:27:48,322][00497] InferenceWorker_p0-w0: stopping experience collection (2750 times) [2024-03-29 13:27:48,488][00476] Signal inference workers to resume experience collection... (2750 times) [2024-03-29 13:27:48,489][00497] InferenceWorker_p0-w0: resuming experience collection (2750 times) [2024-03-29 13:27:48,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 193249280. Throughput: 0: 42128.9. Samples: 75393500. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 13:27:48,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:27:49,959][00497] Updated weights for policy 0, policy_version 11799 (0.0024) [2024-03-29 13:27:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 193462272. Throughput: 0: 42120.3. Samples: 75639340. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 13:27:53,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 13:27:54,029][00497] Updated weights for policy 0, policy_version 11809 (0.0023) [2024-03-29 13:27:58,025][00497] Updated weights for policy 0, policy_version 11819 (0.0024) [2024-03-29 13:27:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 193675264. Throughput: 0: 42046.1. Samples: 75898040. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 13:27:58,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:28:02,506][00497] Updated weights for policy 0, policy_version 11829 (0.0018) [2024-03-29 13:28:03,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41654.3). Total num frames: 193871872. 
Throughput: 0: 42170.3. Samples: 76033540. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 13:28:03,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 13:28:05,352][00497] Updated weights for policy 0, policy_version 11839 (0.0036) [2024-03-29 13:28:08,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 194101248. Throughput: 0: 42291.6. Samples: 76276240. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 13:28:08,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 13:28:09,516][00497] Updated weights for policy 0, policy_version 11849 (0.0020) [2024-03-29 13:28:13,571][00497] Updated weights for policy 0, policy_version 11859 (0.0028) [2024-03-29 13:28:13,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42871.4, 300 sec: 41820.9). Total num frames: 194314240. Throughput: 0: 42234.3. Samples: 76532260. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 13:28:13,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 13:28:18,163][00497] Updated weights for policy 0, policy_version 11869 (0.0028) [2024-03-29 13:28:18,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 194494464. Throughput: 0: 42115.2. Samples: 76662780. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 13:28:18,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 13:28:21,151][00497] Updated weights for policy 0, policy_version 11879 (0.0019) [2024-03-29 13:28:23,141][00476] Signal inference workers to stop experience collection... (2800 times) [2024-03-29 13:28:23,142][00476] Signal inference workers to resume experience collection... (2800 times) [2024-03-29 13:28:23,181][00497] InferenceWorker_p0-w0: stopping experience collection (2800 times) [2024-03-29 13:28:23,182][00497] InferenceWorker_p0-w0: resuming experience collection (2800 times) [2024-03-29 13:28:23,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 194723840. Throughput: 0: 41806.7. Samples: 76892240. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 13:28:23,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 13:28:25,420][00497] Updated weights for policy 0, policy_version 11889 (0.0019) [2024-03-29 13:28:28,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.4, 300 sec: 41820.8). Total num frames: 194920448. Throughput: 0: 42244.5. Samples: 77156700. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 13:28:28,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 13:28:29,107][00497] Updated weights for policy 0, policy_version 11899 (0.0018) [2024-03-29 13:28:33,552][00497] Updated weights for policy 0, policy_version 11909 (0.0022) [2024-03-29 13:28:33,839][00126] Fps is (10 sec: 39320.9, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 195117056. Throughput: 0: 42148.8. Samples: 77290200. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 13:28:33,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:28:36,559][00497] Updated weights for policy 0, policy_version 11919 (0.0027) [2024-03-29 13:28:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 195362816. Throughput: 0: 42038.7. Samples: 77531080. 
Policy #0 lag: (min: 1.0, avg: 20.4, max: 43.0) [2024-03-29 13:28:38,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:28:40,843][00497] Updated weights for policy 0, policy_version 11929 (0.0023) [2024-03-29 13:28:43,839][00126] Fps is (10 sec: 45876.3, 60 sec: 42598.5, 300 sec: 41876.4). Total num frames: 195575808. Throughput: 0: 42132.6. Samples: 77794000. Policy #0 lag: (min: 1.0, avg: 20.4, max: 43.0) [2024-03-29 13:28:43,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:28:44,654][00497] Updated weights for policy 0, policy_version 11939 (0.0039) [2024-03-29 13:28:48,839][00126] Fps is (10 sec: 37682.7, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 195739648. Throughput: 0: 41901.2. Samples: 77919100. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 13:28:48,843][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 13:28:49,241][00497] Updated weights for policy 0, policy_version 11949 (0.0025) [2024-03-29 13:28:52,237][00497] Updated weights for policy 0, policy_version 11959 (0.0026) [2024-03-29 13:28:53,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 195985408. Throughput: 0: 41691.6. Samples: 78152360. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:28:53,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 13:28:56,392][00497] Updated weights for policy 0, policy_version 11969 (0.0022) [2024-03-29 13:28:58,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 196198400. Throughput: 0: 42104.0. Samples: 78426940. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:28:58,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 13:29:00,138][00476] Signal inference workers to stop experience collection... (2850 times) [2024-03-29 13:29:00,202][00497] InferenceWorker_p0-w0: stopping experience collection (2850 times) [2024-03-29 13:29:00,301][00476] Signal inference workers to resume experience collection... (2850 times) [2024-03-29 13:29:00,301][00497] InferenceWorker_p0-w0: resuming experience collection (2850 times) [2024-03-29 13:29:00,305][00497] Updated weights for policy 0, policy_version 11979 (0.0019) [2024-03-29 13:29:03,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 196395008. Throughput: 0: 41913.7. Samples: 78548900. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 13:29:03,840][00126] Avg episode reward: [(0, '0.305')] [2024-03-29 13:29:03,865][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011987_196395008.pth... [2024-03-29 13:29:04,160][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011378_186417152.pth [2024-03-29 13:29:04,706][00497] Updated weights for policy 0, policy_version 11989 (0.0023) [2024-03-29 13:29:07,697][00497] Updated weights for policy 0, policy_version 11999 (0.0027) [2024-03-29 13:29:08,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 196624384. Throughput: 0: 42240.5. Samples: 78793060. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 13:29:08,840][00126] Avg episode reward: [(0, '0.292')] [2024-03-29 13:29:12,000][00497] Updated weights for policy 0, policy_version 12009 (0.0021) [2024-03-29 13:29:13,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 196837376. Throughput: 0: 42290.6. Samples: 79059780. 
Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:29:13,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 13:29:15,676][00497] Updated weights for policy 0, policy_version 12019 (0.0024) [2024-03-29 13:29:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.4, 300 sec: 41654.2). Total num frames: 197017600. Throughput: 0: 42152.2. Samples: 79187040. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:29:18,841][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 13:29:20,248][00497] Updated weights for policy 0, policy_version 12029 (0.0021) [2024-03-29 13:29:23,351][00497] Updated weights for policy 0, policy_version 12039 (0.0031) [2024-03-29 13:29:23,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 197263360. Throughput: 0: 42286.2. Samples: 79433960. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:29:23,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:29:27,466][00497] Updated weights for policy 0, policy_version 12049 (0.0018) [2024-03-29 13:29:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 197459968. Throughput: 0: 42217.3. Samples: 79693780. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:29:28,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 13:29:31,154][00497] Updated weights for policy 0, policy_version 12059 (0.0018) [2024-03-29 13:29:31,462][00476] Signal inference workers to stop experience collection... (2900 times) [2024-03-29 13:29:31,466][00476] Signal inference workers to resume experience collection... (2900 times) [2024-03-29 13:29:31,509][00497] InferenceWorker_p0-w0: stopping experience collection (2900 times) [2024-03-29 13:29:31,513][00497] InferenceWorker_p0-w0: resuming experience collection (2900 times) [2024-03-29 13:29:33,839][00126] Fps is (10 sec: 37683.3, 60 sec: 42052.4, 300 sec: 41654.2). Total num frames: 197640192. Throughput: 0: 42332.6. Samples: 79824060. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:29:33,840][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 13:29:35,743][00497] Updated weights for policy 0, policy_version 12069 (0.0027) [2024-03-29 13:29:38,760][00497] Updated weights for policy 0, policy_version 12079 (0.0024) [2024-03-29 13:29:38,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 197902336. Throughput: 0: 42737.3. Samples: 80075540. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:29:38,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 13:29:43,047][00497] Updated weights for policy 0, policy_version 12089 (0.0019) [2024-03-29 13:29:43,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 198098944. Throughput: 0: 42202.3. Samples: 80326040. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 13:29:43,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 13:29:46,888][00497] Updated weights for policy 0, policy_version 12099 (0.0026) [2024-03-29 13:29:48,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42598.6, 300 sec: 41765.3). Total num frames: 198295552. Throughput: 0: 42096.7. Samples: 80443240. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 13:29:48,841][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 13:29:51,405][00497] Updated weights for policy 0, policy_version 12109 (0.0028) [2024-03-29 13:29:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 198524928. 
Throughput: 0: 42311.6. Samples: 80697080. Policy #0 lag: (min: 1.0, avg: 18.3, max: 42.0) [2024-03-29 13:29:53,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:29:54,428][00497] Updated weights for policy 0, policy_version 12119 (0.0025) [2024-03-29 13:29:58,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.3, 300 sec: 41654.3). Total num frames: 198705152. Throughput: 0: 41761.9. Samples: 80939060. Policy #0 lag: (min: 1.0, avg: 18.3, max: 42.0) [2024-03-29 13:29:58,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:29:58,969][00497] Updated weights for policy 0, policy_version 12129 (0.0040) [2024-03-29 13:30:02,829][00497] Updated weights for policy 0, policy_version 12139 (0.0024) [2024-03-29 13:30:03,839][00126] Fps is (10 sec: 39320.8, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 198918144. Throughput: 0: 41765.6. Samples: 81066500. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 13:30:03,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:30:05,166][00476] Signal inference workers to stop experience collection... (2950 times) [2024-03-29 13:30:05,200][00497] InferenceWorker_p0-w0: stopping experience collection (2950 times) [2024-03-29 13:30:05,348][00476] Signal inference workers to resume experience collection... (2950 times) [2024-03-29 13:30:05,348][00497] InferenceWorker_p0-w0: resuming experience collection (2950 times) [2024-03-29 13:30:07,139][00497] Updated weights for policy 0, policy_version 12149 (0.0018) [2024-03-29 13:30:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 199131136. Throughput: 0: 42181.4. Samples: 81332120. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 13:30:08,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 13:30:10,231][00497] Updated weights for policy 0, policy_version 12159 (0.0032) [2024-03-29 13:30:13,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 199360512. Throughput: 0: 41768.3. Samples: 81573360. Policy #0 lag: (min: 1.0, avg: 22.7, max: 42.0) [2024-03-29 13:30:13,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 13:30:14,561][00497] Updated weights for policy 0, policy_version 12169 (0.0024) [2024-03-29 13:30:18,434][00497] Updated weights for policy 0, policy_version 12179 (0.0027) [2024-03-29 13:30:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 199557120. Throughput: 0: 41647.1. Samples: 81698180. Policy #0 lag: (min: 1.0, avg: 22.7, max: 42.0) [2024-03-29 13:30:18,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:30:22,389][00497] Updated weights for policy 0, policy_version 12189 (0.0032) [2024-03-29 13:30:23,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 199770112. Throughput: 0: 42179.6. Samples: 81973620. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 13:30:23,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:30:25,603][00497] Updated weights for policy 0, policy_version 12199 (0.0028) [2024-03-29 13:30:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 199983104. Throughput: 0: 41805.3. Samples: 82207280. 
Policy #0 lag: (min: 0.0, avg: 21.9, max: 43.0) [2024-03-29 13:30:28,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:30:29,832][00497] Updated weights for policy 0, policy_version 12209 (0.0021) [2024-03-29 13:30:33,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 200179712. Throughput: 0: 42018.2. Samples: 82334060. Policy #0 lag: (min: 0.0, avg: 21.9, max: 43.0) [2024-03-29 13:30:33,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 13:30:33,855][00497] Updated weights for policy 0, policy_version 12219 (0.0023) [2024-03-29 13:30:38,034][00497] Updated weights for policy 0, policy_version 12229 (0.0023) [2024-03-29 13:30:38,101][00476] Signal inference workers to stop experience collection... (3000 times) [2024-03-29 13:30:38,130][00497] InferenceWorker_p0-w0: stopping experience collection (3000 times) [2024-03-29 13:30:38,283][00476] Signal inference workers to resume experience collection... (3000 times) [2024-03-29 13:30:38,283][00497] InferenceWorker_p0-w0: resuming experience collection (3000 times) [2024-03-29 13:30:38,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 200409088. Throughput: 0: 42454.2. Samples: 82607520. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 13:30:38,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 13:30:41,097][00497] Updated weights for policy 0, policy_version 12239 (0.0024) [2024-03-29 13:30:43,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 200622080. Throughput: 0: 42328.8. Samples: 82843860. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 13:30:43,840][00126] Avg episode reward: [(0, '0.329')] [2024-03-29 13:30:45,533][00497] Updated weights for policy 0, policy_version 12249 (0.0022) [2024-03-29 13:30:48,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.1, 300 sec: 41931.9). Total num frames: 200818688. Throughput: 0: 42350.3. Samples: 82972260. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:30:48,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:30:49,415][00497] Updated weights for policy 0, policy_version 12259 (0.0020) [2024-03-29 13:30:53,534][00497] Updated weights for policy 0, policy_version 12269 (0.0024) [2024-03-29 13:30:53,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 201031680. Throughput: 0: 42397.7. Samples: 83240020. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:30:53,840][00126] Avg episode reward: [(0, '0.267')] [2024-03-29 13:30:56,644][00497] Updated weights for policy 0, policy_version 12279 (0.0027) [2024-03-29 13:30:58,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 201244672. Throughput: 0: 42151.6. Samples: 83470180. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 13:30:58,840][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 13:31:01,173][00497] Updated weights for policy 0, policy_version 12289 (0.0025) [2024-03-29 13:31:03,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.5, 300 sec: 41931.9). Total num frames: 201457664. Throughput: 0: 42322.7. Samples: 83602700. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 13:31:03,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 13:31:04,058][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000012297_201474048.pth... 
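The Saving/Removing pairs in this log show the trainer keeping only the most recent checkpoints: each new checkpoint_<policy_version>_<env_frames>.pth save is followed by deletion of an older one. A minimal sketch of that keep-last-N rotation is below; the save_checkpoint helper and the keep_last value are illustrative assumptions read off the log, not Sample Factory's actual implementation.

```python
import re
from pathlib import Path

import torch

CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")


def save_checkpoint(state: dict, ckpt_dir: str, policy_version: int,
                    env_frames: int, keep_last: int = 2) -> Path:
    """Save a new checkpoint, then prune old ones so only `keep_last` remain.

    Hypothetical helper mirroring the Saving/Removing pattern in the log;
    the real trainer may order or configure these steps differently.
    """
    out_dir = Path(ckpt_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # File name encodes the policy version and total env frames, as in the log.
    path = out_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    torch.save(state, path)

    # Sort existing checkpoints by policy version and drop the oldest ones.
    existing = sorted(
        (p for p in out_dir.glob("checkpoint_*.pth") if CKPT_RE.search(p.name)),
        key=lambda p: int(CKPT_RE.search(p.name).group(1)),
    )
    for old in existing[:-keep_last]:
        old.unlink()  # corresponds to the "Removing ..." log lines

    return path
```

With keep_last=2 this reproduces the pacing visible above: roughly every two minutes a new .pth appears and the checkpoint from two saves earlier is removed. The interval and keep count are inferred from the log, not taken from the run's configuration.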
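The recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entries report frame throughput averaged over three trailing windows. A minimal sketch of how such multi-window rates can be derived from timestamped frame counters follows; names like ThroughputMeter are assumptions for illustration, not classes from the trainer itself.

```python
import time
from collections import deque


class ThroughputMeter:
    """Track (timestamp, total_frames) samples and report trailing-window FPS."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        # Bounded history; generous for the ~5 s reporting cadence seen in the log.
        self.samples = deque(maxlen=4 * max(windows))

    def record(self, total_frames: int, now: float | None = None) -> None:
        """Append one sample, e.g. each time a status line is printed."""
        self.samples.append((time.monotonic() if now is None else now, total_frames))

    def rates(self) -> dict[int, float]:
        """Return {window_seconds: frames_per_second} for each configured window."""
        out = {}
        if not self.samples:
            return out
        t_now, f_now = self.samples[-1]
        for w in self.windows:
            # Oldest sample still inside the window; fall back to the first sample.
            t_past, f_past = next(
                ((t, f) for t, f in self.samples if t >= t_now - w),
                self.samples[0],
            )
            dt = max(t_now - t_past, 1e-9)
            out[w] = (f_now - f_past) / dt
        return out
```

Calling record(total_num_frames) at every status line and then rates() yields numbers comparable to the 10/60/300-second figures above; the exact smoothing the trainer applies may differ.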
[2024-03-29 13:31:04,375][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011681_191381504.pth [2024-03-29 13:31:05,100][00497] Updated weights for policy 0, policy_version 12299 (0.0031) [2024-03-29 13:31:08,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 201637888. Throughput: 0: 41938.2. Samples: 83860840. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 13:31:08,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 13:31:09,296][00497] Updated weights for policy 0, policy_version 12309 (0.0022) [2024-03-29 13:31:10,964][00476] Signal inference workers to stop experience collection... (3050 times) [2024-03-29 13:31:11,033][00497] InferenceWorker_p0-w0: stopping experience collection (3050 times) [2024-03-29 13:31:11,039][00476] Signal inference workers to resume experience collection... (3050 times) [2024-03-29 13:31:11,054][00497] InferenceWorker_p0-w0: resuming experience collection (3050 times) [2024-03-29 13:31:12,579][00497] Updated weights for policy 0, policy_version 12319 (0.0024) [2024-03-29 13:31:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 201883648. Throughput: 0: 41881.3. Samples: 84091940. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 13:31:13,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:31:16,747][00497] Updated weights for policy 0, policy_version 12329 (0.0026) [2024-03-29 13:31:18,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 202080256. Throughput: 0: 42098.6. Samples: 84228500. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 13:31:18,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:31:20,650][00497] Updated weights for policy 0, policy_version 12339 (0.0022) [2024-03-29 13:31:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 202276864. Throughput: 0: 41964.9. Samples: 84495940. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 13:31:23,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:31:24,586][00497] Updated weights for policy 0, policy_version 12349 (0.0018) [2024-03-29 13:31:27,901][00497] Updated weights for policy 0, policy_version 12359 (0.0028) [2024-03-29 13:31:28,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 202522624. Throughput: 0: 41840.8. Samples: 84726700. Policy #0 lag: (min: 1.0, avg: 22.1, max: 43.0) [2024-03-29 13:31:28,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 13:31:32,277][00497] Updated weights for policy 0, policy_version 12369 (0.0028) [2024-03-29 13:31:33,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 202719232. Throughput: 0: 41932.9. Samples: 84859240. Policy #0 lag: (min: 1.0, avg: 22.1, max: 43.0) [2024-03-29 13:31:33,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 13:31:36,345][00497] Updated weights for policy 0, policy_version 12379 (0.0033) [2024-03-29 13:31:38,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 202915840. Throughput: 0: 41740.9. Samples: 85118360. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 13:31:38,840][00126] Avg episode reward: [(0, '0.333')] [2024-03-29 13:31:40,346][00497] Updated weights for policy 0, policy_version 12389 (0.0030) [2024-03-29 13:31:41,047][00476] Signal inference workers to stop experience collection... 
(3100 times) [2024-03-29 13:31:41,080][00497] InferenceWorker_p0-w0: stopping experience collection (3100 times) [2024-03-29 13:31:41,258][00476] Signal inference workers to resume experience collection... (3100 times) [2024-03-29 13:31:41,259][00497] InferenceWorker_p0-w0: resuming experience collection (3100 times) [2024-03-29 13:31:43,613][00497] Updated weights for policy 0, policy_version 12399 (0.0026) [2024-03-29 13:31:43,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 203145216. Throughput: 0: 41764.0. Samples: 85349560. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 13:31:43,840][00126] Avg episode reward: [(0, '0.241')] [2024-03-29 13:31:47,912][00497] Updated weights for policy 0, policy_version 12409 (0.0017) [2024-03-29 13:31:48,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 203358208. Throughput: 0: 41870.7. Samples: 85486880. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 13:31:48,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 13:31:52,124][00497] Updated weights for policy 0, policy_version 12419 (0.0020) [2024-03-29 13:31:53,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 203538432. Throughput: 0: 41912.0. Samples: 85746880. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 13:31:53,840][00126] Avg episode reward: [(0, '0.327')] [2024-03-29 13:31:56,022][00497] Updated weights for policy 0, policy_version 12429 (0.0025) [2024-03-29 13:31:58,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 203784192. Throughput: 0: 41998.7. Samples: 85981880. Policy #0 lag: (min: 1.0, avg: 17.6, max: 41.0) [2024-03-29 13:31:58,841][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 13:31:59,215][00497] Updated weights for policy 0, policy_version 12439 (0.0021) [2024-03-29 13:32:03,604][00497] Updated weights for policy 0, policy_version 12449 (0.0020) [2024-03-29 13:32:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 203964416. Throughput: 0: 41916.5. Samples: 86114740. Policy #0 lag: (min: 1.0, avg: 17.6, max: 41.0) [2024-03-29 13:32:03,840][00126] Avg episode reward: [(0, '0.272')] [2024-03-29 13:32:07,726][00497] Updated weights for policy 0, policy_version 12459 (0.0032) [2024-03-29 13:32:08,839][00126] Fps is (10 sec: 37683.3, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 204161024. Throughput: 0: 41694.3. Samples: 86372180. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 13:32:08,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 13:32:11,608][00497] Updated weights for policy 0, policy_version 12469 (0.0023) [2024-03-29 13:32:13,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 204406784. Throughput: 0: 42054.7. Samples: 86619160. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 13:32:13,840][00126] Avg episode reward: [(0, '0.287')] [2024-03-29 13:32:14,888][00497] Updated weights for policy 0, policy_version 12479 (0.0019) [2024-03-29 13:32:18,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 42098.6). Total num frames: 204603392. Throughput: 0: 41912.5. Samples: 86745300. 
Policy #0 lag: (min: 1.0, avg: 21.9, max: 41.0) [2024-03-29 13:32:18,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 13:32:19,062][00497] Updated weights for policy 0, policy_version 12489 (0.0021) [2024-03-29 13:32:19,137][00476] Signal inference workers to stop experience collection... (3150 times) [2024-03-29 13:32:19,189][00497] InferenceWorker_p0-w0: stopping experience collection (3150 times) [2024-03-29 13:32:19,322][00476] Signal inference workers to resume experience collection... (3150 times) [2024-03-29 13:32:19,322][00497] InferenceWorker_p0-w0: resuming experience collection (3150 times) [2024-03-29 13:32:23,284][00497] Updated weights for policy 0, policy_version 12499 (0.0026) [2024-03-29 13:32:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 204800000. Throughput: 0: 41914.1. Samples: 87004500. Policy #0 lag: (min: 1.0, avg: 21.9, max: 41.0) [2024-03-29 13:32:23,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:32:26,885][00497] Updated weights for policy 0, policy_version 12509 (0.0028) [2024-03-29 13:32:28,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 205029376. Throughput: 0: 42375.5. Samples: 87256460. Policy #0 lag: (min: 0.0, avg: 20.3, max: 43.0) [2024-03-29 13:32:28,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 13:32:30,312][00497] Updated weights for policy 0, policy_version 12519 (0.0020) [2024-03-29 13:32:33,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 205242368. Throughput: 0: 42141.3. Samples: 87383240. Policy #0 lag: (min: 0.0, avg: 20.3, max: 43.0) [2024-03-29 13:32:33,842][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 13:32:34,321][00497] Updated weights for policy 0, policy_version 12529 (0.0019) [2024-03-29 13:32:38,466][00497] Updated weights for policy 0, policy_version 12539 (0.0018) [2024-03-29 13:32:38,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 205455360. Throughput: 0: 42333.3. Samples: 87651880. Policy #0 lag: (min: 2.0, avg: 21.0, max: 42.0) [2024-03-29 13:32:38,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:32:42,189][00497] Updated weights for policy 0, policy_version 12549 (0.0024) [2024-03-29 13:32:43,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 205684736. Throughput: 0: 42599.5. Samples: 87898860. Policy #0 lag: (min: 2.0, avg: 21.0, max: 42.0) [2024-03-29 13:32:43,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:32:45,362][00497] Updated weights for policy 0, policy_version 12559 (0.0022) [2024-03-29 13:32:48,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 205897728. Throughput: 0: 42439.9. Samples: 88024540. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:32:48,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:32:49,685][00497] Updated weights for policy 0, policy_version 12569 (0.0023) [2024-03-29 13:32:53,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 206077952. Throughput: 0: 42676.8. Samples: 88292640. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:32:53,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 13:32:54,009][00497] Updated weights for policy 0, policy_version 12579 (0.0022) [2024-03-29 13:32:56,567][00476] Signal inference workers to stop experience collection... 
(3200 times) [2024-03-29 13:32:56,625][00497] InferenceWorker_p0-w0: stopping experience collection (3200 times) [2024-03-29 13:32:56,643][00476] Signal inference workers to resume experience collection... (3200 times) [2024-03-29 13:32:56,656][00497] InferenceWorker_p0-w0: resuming experience collection (3200 times) [2024-03-29 13:32:57,437][00497] Updated weights for policy 0, policy_version 12589 (0.0025) [2024-03-29 13:32:58,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 206323712. Throughput: 0: 42883.7. Samples: 88548920. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 13:32:58,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 13:33:00,620][00497] Updated weights for policy 0, policy_version 12599 (0.0025) [2024-03-29 13:33:03,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42871.3, 300 sec: 42154.1). Total num frames: 206536704. Throughput: 0: 42764.3. Samples: 88669700. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 13:33:03,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:33:03,863][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000012607_206553088.pth... [2024-03-29 13:33:04,185][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011987_196395008.pth [2024-03-29 13:33:04,857][00497] Updated weights for policy 0, policy_version 12609 (0.0018) [2024-03-29 13:33:08,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42871.4, 300 sec: 42098.5). Total num frames: 206733312. Throughput: 0: 42997.8. Samples: 88939400. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 13:33:08,842][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:33:09,405][00497] Updated weights for policy 0, policy_version 12619 (0.0018) [2024-03-29 13:33:12,911][00497] Updated weights for policy 0, policy_version 12629 (0.0028) [2024-03-29 13:33:13,839][00126] Fps is (10 sec: 40960.7, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 206946304. Throughput: 0: 42901.9. Samples: 89187040. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 13:33:13,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 13:33:16,072][00497] Updated weights for policy 0, policy_version 12639 (0.0026) [2024-03-29 13:33:18,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42871.5, 300 sec: 42209.6). Total num frames: 207175680. Throughput: 0: 42732.1. Samples: 89306180. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 13:33:18,840][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 13:33:20,477][00497] Updated weights for policy 0, policy_version 12649 (0.0022) [2024-03-29 13:33:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42871.6, 300 sec: 42209.6). Total num frames: 207372288. Throughput: 0: 42572.1. Samples: 89567620. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 13:33:23,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 13:33:25,097][00497] Updated weights for policy 0, policy_version 12659 (0.0025) [2024-03-29 13:33:28,483][00497] Updated weights for policy 0, policy_version 12669 (0.0026) [2024-03-29 13:33:28,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 207585280. Throughput: 0: 42911.6. Samples: 89829880. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 13:33:28,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 13:33:30,461][00476] Signal inference workers to stop experience collection... 
(3250 times) [2024-03-29 13:33:30,505][00497] InferenceWorker_p0-w0: stopping experience collection (3250 times) [2024-03-29 13:33:30,542][00476] Signal inference workers to resume experience collection... (3250 times) [2024-03-29 13:33:30,544][00497] InferenceWorker_p0-w0: resuming experience collection (3250 times) [2024-03-29 13:33:31,635][00497] Updated weights for policy 0, policy_version 12679 (0.0020) [2024-03-29 13:33:33,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42598.3, 300 sec: 42154.1). Total num frames: 207798272. Throughput: 0: 42409.2. Samples: 89932960. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 13:33:33,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:33:36,041][00497] Updated weights for policy 0, policy_version 12689 (0.0022) [2024-03-29 13:33:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 208011264. Throughput: 0: 42302.6. Samples: 90196260. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 13:33:38,842][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 13:33:40,651][00497] Updated weights for policy 0, policy_version 12699 (0.0017) [2024-03-29 13:33:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 208207872. Throughput: 0: 42550.5. Samples: 90463700. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 13:33:43,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 13:33:44,000][00497] Updated weights for policy 0, policy_version 12709 (0.0026) [2024-03-29 13:33:46,992][00497] Updated weights for policy 0, policy_version 12719 (0.0020) [2024-03-29 13:33:48,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 208453632. Throughput: 0: 42313.9. Samples: 90573820. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 13:33:48,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 13:33:51,007][00497] Updated weights for policy 0, policy_version 12729 (0.0021) [2024-03-29 13:33:53,839][00126] Fps is (10 sec: 45875.2, 60 sec: 43144.5, 300 sec: 42265.2). Total num frames: 208666624. Throughput: 0: 42388.4. Samples: 90846880. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:33:53,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 13:33:55,880][00497] Updated weights for policy 0, policy_version 12739 (0.0022) [2024-03-29 13:33:58,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 208863232. Throughput: 0: 42821.0. Samples: 91113980. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:33:58,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 13:33:59,105][00497] Updated weights for policy 0, policy_version 12749 (0.0027) [2024-03-29 13:34:02,146][00497] Updated weights for policy 0, policy_version 12759 (0.0015) [2024-03-29 13:34:03,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42871.5, 300 sec: 42320.7). Total num frames: 209108992. Throughput: 0: 42613.2. Samples: 91223780. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 13:34:03,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:34:06,328][00497] Updated weights for policy 0, policy_version 12769 (0.0021) [2024-03-29 13:34:07,050][00476] Signal inference workers to stop experience collection... (3300 times) [2024-03-29 13:34:07,051][00476] Signal inference workers to resume experience collection... 
(3300 times) [2024-03-29 13:34:07,091][00497] InferenceWorker_p0-w0: stopping experience collection (3300 times) [2024-03-29 13:34:07,091][00497] InferenceWorker_p0-w0: resuming experience collection (3300 times) [2024-03-29 13:34:08,839][00126] Fps is (10 sec: 44235.8, 60 sec: 42871.4, 300 sec: 42265.2). Total num frames: 209305600. Throughput: 0: 42872.8. Samples: 91496900. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 13:34:08,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:34:11,527][00497] Updated weights for policy 0, policy_version 12779 (0.0027) [2024-03-29 13:34:13,839][00126] Fps is (10 sec: 36045.1, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 209469440. Throughput: 0: 42328.9. Samples: 91734680. Policy #0 lag: (min: 2.0, avg: 19.8, max: 42.0) [2024-03-29 13:34:13,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:34:14,933][00497] Updated weights for policy 0, policy_version 12789 (0.0021) [2024-03-29 13:34:17,860][00497] Updated weights for policy 0, policy_version 12799 (0.0024) [2024-03-29 13:34:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42598.3, 300 sec: 42265.2). Total num frames: 209731584. Throughput: 0: 42673.9. Samples: 91853280. Policy #0 lag: (min: 2.0, avg: 19.8, max: 42.0) [2024-03-29 13:34:18,840][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 13:34:22,187][00497] Updated weights for policy 0, policy_version 12809 (0.0022) [2024-03-29 13:34:23,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 209928192. Throughput: 0: 42517.9. Samples: 92109560. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 13:34:23,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:34:27,211][00497] Updated weights for policy 0, policy_version 12819 (0.0019) [2024-03-29 13:34:28,839][00126] Fps is (10 sec: 37683.5, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 210108416. Throughput: 0: 42433.4. Samples: 92373200. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 13:34:28,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 13:34:30,453][00497] Updated weights for policy 0, policy_version 12829 (0.0023) [2024-03-29 13:34:33,433][00497] Updated weights for policy 0, policy_version 12839 (0.0029) [2024-03-29 13:34:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 210354176. Throughput: 0: 42429.8. Samples: 92483160. Policy #0 lag: (min: 1.0, avg: 23.0, max: 43.0) [2024-03-29 13:34:33,840][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 13:34:37,672][00497] Updated weights for policy 0, policy_version 12849 (0.0018) [2024-03-29 13:34:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 210550784. Throughput: 0: 41938.2. Samples: 92734100. Policy #0 lag: (min: 1.0, avg: 23.0, max: 43.0) [2024-03-29 13:34:38,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 13:34:42,638][00497] Updated weights for policy 0, policy_version 12859 (0.0036) [2024-03-29 13:34:43,655][00476] Signal inference workers to stop experience collection... (3350 times) [2024-03-29 13:34:43,655][00476] Signal inference workers to resume experience collection... (3350 times) [2024-03-29 13:34:43,694][00497] InferenceWorker_p0-w0: stopping experience collection (3350 times) [2024-03-29 13:34:43,695][00497] InferenceWorker_p0-w0: resuming experience collection (3350 times) [2024-03-29 13:34:43,839][00126] Fps is (10 sec: 37682.8, 60 sec: 42052.3, 300 sec: 42154.1). 
Total num frames: 210731008. Throughput: 0: 42169.6. Samples: 93011620. Policy #0 lag: (min: 0.0, avg: 17.3, max: 41.0) [2024-03-29 13:34:43,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:34:45,791][00497] Updated weights for policy 0, policy_version 12869 (0.0033) [2024-03-29 13:34:48,791][00497] Updated weights for policy 0, policy_version 12879 (0.0019) [2024-03-29 13:34:48,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 211009536. Throughput: 0: 42320.4. Samples: 93128200. Policy #0 lag: (min: 0.0, avg: 17.3, max: 41.0) [2024-03-29 13:34:48,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:34:53,064][00497] Updated weights for policy 0, policy_version 12889 (0.0029) [2024-03-29 13:34:53,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42052.4, 300 sec: 42320.7). Total num frames: 211189760. Throughput: 0: 41643.2. Samples: 93370840. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 13:34:53,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 13:34:58,269][00497] Updated weights for policy 0, policy_version 12899 (0.0027) [2024-03-29 13:34:58,839][00126] Fps is (10 sec: 34406.5, 60 sec: 41506.0, 300 sec: 42154.1). Total num frames: 211353600. Throughput: 0: 42478.2. Samples: 93646200. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 13:34:58,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:35:01,511][00497] Updated weights for policy 0, policy_version 12909 (0.0031) [2024-03-29 13:35:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.3, 300 sec: 42320.7). Total num frames: 211615744. Throughput: 0: 42326.7. Samples: 93757980. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 13:35:03,840][00126] Avg episode reward: [(0, '0.292')] [2024-03-29 13:35:03,917][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000012917_211632128.pth... [2024-03-29 13:35:04,223][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000012297_201474048.pth [2024-03-29 13:35:04,754][00497] Updated weights for policy 0, policy_version 12919 (0.0029) [2024-03-29 13:35:08,593][00497] Updated weights for policy 0, policy_version 12929 (0.0022) [2024-03-29 13:35:08,839][00126] Fps is (10 sec: 47514.0, 60 sec: 42052.4, 300 sec: 42265.2). Total num frames: 211828736. Throughput: 0: 41934.2. Samples: 93996600. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 13:35:08,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:35:13,839][00126] Fps is (10 sec: 36044.4, 60 sec: 41779.1, 300 sec: 42098.5). Total num frames: 211976192. Throughput: 0: 42103.5. Samples: 94267860. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 13:35:13,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:35:13,974][00497] Updated weights for policy 0, policy_version 12939 (0.0023) [2024-03-29 13:35:16,856][00476] Signal inference workers to stop experience collection... (3400 times) [2024-03-29 13:35:16,902][00497] InferenceWorker_p0-w0: stopping experience collection (3400 times) [2024-03-29 13:35:17,047][00476] Signal inference workers to resume experience collection... (3400 times) [2024-03-29 13:35:17,047][00497] InferenceWorker_p0-w0: resuming experience collection (3400 times) [2024-03-29 13:35:17,052][00497] Updated weights for policy 0, policy_version 12949 (0.0024) [2024-03-29 13:35:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 212254720. 
Throughput: 0: 42475.0. Samples: 94394540. Policy #0 lag: (min: 0.0, avg: 17.9, max: 41.0) [2024-03-29 13:35:18,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 13:35:19,938][00497] Updated weights for policy 0, policy_version 12959 (0.0023) [2024-03-29 13:35:23,839][00126] Fps is (10 sec: 49152.4, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 212467712. Throughput: 0: 42223.6. Samples: 94634160. Policy #0 lag: (min: 0.0, avg: 17.9, max: 41.0) [2024-03-29 13:35:23,841][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:35:24,060][00497] Updated weights for policy 0, policy_version 12969 (0.0021) [2024-03-29 13:35:28,839][00126] Fps is (10 sec: 37683.0, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 212631552. Throughput: 0: 42014.6. Samples: 94902280. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 13:35:28,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:35:29,388][00497] Updated weights for policy 0, policy_version 12979 (0.0019) [2024-03-29 13:35:32,518][00497] Updated weights for policy 0, policy_version 12989 (0.0025) [2024-03-29 13:35:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 212877312. Throughput: 0: 42319.7. Samples: 95032580. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 13:35:33,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 13:35:35,697][00497] Updated weights for policy 0, policy_version 12999 (0.0024) [2024-03-29 13:35:38,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 213090304. Throughput: 0: 41948.0. Samples: 95258500. Policy #0 lag: (min: 2.0, avg: 21.7, max: 42.0) [2024-03-29 13:35:38,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 13:35:39,599][00497] Updated weights for policy 0, policy_version 13009 (0.0026) [2024-03-29 13:35:43,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 213270528. Throughput: 0: 41737.3. Samples: 95524380. Policy #0 lag: (min: 2.0, avg: 21.7, max: 42.0) [2024-03-29 13:35:43,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:35:44,932][00497] Updated weights for policy 0, policy_version 13019 (0.0037) [2024-03-29 13:35:47,868][00476] Signal inference workers to stop experience collection... (3450 times) [2024-03-29 13:35:47,890][00497] InferenceWorker_p0-w0: stopping experience collection (3450 times) [2024-03-29 13:35:48,071][00476] Signal inference workers to resume experience collection... (3450 times) [2024-03-29 13:35:48,071][00497] InferenceWorker_p0-w0: resuming experience collection (3450 times) [2024-03-29 13:35:48,075][00497] Updated weights for policy 0, policy_version 13029 (0.0029) [2024-03-29 13:35:48,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41233.1, 300 sec: 42209.6). Total num frames: 213483520. Throughput: 0: 42407.5. Samples: 95666320. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 13:35:48,840][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 13:35:51,502][00497] Updated weights for policy 0, policy_version 13039 (0.0020) [2024-03-29 13:35:53,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 213729280. Throughput: 0: 41934.2. Samples: 95883640. 
Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 13:35:53,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 13:35:55,259][00497] Updated weights for policy 0, policy_version 13049 (0.0031) [2024-03-29 13:35:58,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42871.6, 300 sec: 42265.2). Total num frames: 213925888. Throughput: 0: 41718.8. Samples: 96145200. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 13:35:58,841][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 13:36:00,661][00497] Updated weights for policy 0, policy_version 13059 (0.0028) [2024-03-29 13:36:03,671][00497] Updated weights for policy 0, policy_version 13069 (0.0031) [2024-03-29 13:36:03,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 42320.7). Total num frames: 214122496. Throughput: 0: 42110.7. Samples: 96289520. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 13:36:03,840][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 13:36:07,136][00497] Updated weights for policy 0, policy_version 13079 (0.0031) [2024-03-29 13:36:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 214351872. Throughput: 0: 41673.8. Samples: 96509480. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 13:36:08,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 13:36:11,168][00497] Updated weights for policy 0, policy_version 13089 (0.0027) [2024-03-29 13:36:13,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42871.4, 300 sec: 42265.1). Total num frames: 214548480. Throughput: 0: 41532.9. Samples: 96771260. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 13:36:13,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:36:16,540][00497] Updated weights for policy 0, policy_version 13099 (0.0023) [2024-03-29 13:36:18,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41233.1, 300 sec: 42209.6). Total num frames: 214728704. Throughput: 0: 41844.8. Samples: 96915600. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 13:36:18,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 13:36:19,804][00497] Updated weights for policy 0, policy_version 13109 (0.0036) [2024-03-29 13:36:20,247][00476] Signal inference workers to stop experience collection... (3500 times) [2024-03-29 13:36:20,325][00497] InferenceWorker_p0-w0: stopping experience collection (3500 times) [2024-03-29 13:36:20,334][00476] Signal inference workers to resume experience collection... (3500 times) [2024-03-29 13:36:20,352][00497] InferenceWorker_p0-w0: resuming experience collection (3500 times) [2024-03-29 13:36:22,830][00497] Updated weights for policy 0, policy_version 13119 (0.0030) [2024-03-29 13:36:23,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 214974464. Throughput: 0: 41628.9. Samples: 97131800. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 13:36:23,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:36:26,822][00497] Updated weights for policy 0, policy_version 13129 (0.0025) [2024-03-29 13:36:28,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.4, 300 sec: 42209.7). Total num frames: 215171072. Throughput: 0: 41677.0. Samples: 97399840. Policy #0 lag: (min: 2.0, avg: 23.1, max: 42.0) [2024-03-29 13:36:28,841][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 13:36:32,267][00497] Updated weights for policy 0, policy_version 13139 (0.0022) [2024-03-29 13:36:33,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.1, 300 sec: 42154.1). Total num frames: 215351296. 
Throughput: 0: 41464.5. Samples: 97532220. Policy #0 lag: (min: 2.0, avg: 23.1, max: 42.0) [2024-03-29 13:36:33,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 13:36:35,316][00497] Updated weights for policy 0, policy_version 13149 (0.0030) [2024-03-29 13:36:38,494][00497] Updated weights for policy 0, policy_version 13159 (0.0030) [2024-03-29 13:36:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 215597056. Throughput: 0: 41709.8. Samples: 97760580. Policy #0 lag: (min: 0.0, avg: 19.2, max: 43.0) [2024-03-29 13:36:38,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 13:36:42,519][00497] Updated weights for policy 0, policy_version 13169 (0.0021) [2024-03-29 13:36:43,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 215793664. Throughput: 0: 41467.9. Samples: 98011260. Policy #0 lag: (min: 0.0, avg: 19.2, max: 43.0) [2024-03-29 13:36:43,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 13:36:47,836][00497] Updated weights for policy 0, policy_version 13179 (0.0028) [2024-03-29 13:36:48,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 215973888. Throughput: 0: 41228.8. Samples: 98144820. Policy #0 lag: (min: 1.0, avg: 21.2, max: 42.0) [2024-03-29 13:36:48,840][00126] Avg episode reward: [(0, '0.336')] [2024-03-29 13:36:51,038][00497] Updated weights for policy 0, policy_version 13189 (0.0025) [2024-03-29 13:36:52,110][00476] Signal inference workers to stop experience collection... (3550 times) [2024-03-29 13:36:52,127][00497] InferenceWorker_p0-w0: stopping experience collection (3550 times) [2024-03-29 13:36:52,320][00476] Signal inference workers to resume experience collection... (3550 times) [2024-03-29 13:36:52,320][00497] InferenceWorker_p0-w0: resuming experience collection (3550 times) [2024-03-29 13:36:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 216219648. Throughput: 0: 42107.6. Samples: 98404320. Policy #0 lag: (min: 1.0, avg: 21.2, max: 42.0) [2024-03-29 13:36:53,840][00126] Avg episode reward: [(0, '0.315')] [2024-03-29 13:36:54,554][00497] Updated weights for policy 0, policy_version 13199 (0.0027) [2024-03-29 13:36:58,306][00497] Updated weights for policy 0, policy_version 13209 (0.0019) [2024-03-29 13:36:58,839][00126] Fps is (10 sec: 45875.4, 60 sec: 41779.1, 300 sec: 42265.2). Total num frames: 216432640. Throughput: 0: 41373.4. Samples: 98633060. Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 13:36:58,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 13:37:03,643][00497] Updated weights for policy 0, policy_version 13219 (0.0022) [2024-03-29 13:37:03,839][00126] Fps is (10 sec: 36044.7, 60 sec: 40960.0, 300 sec: 42098.5). Total num frames: 216580096. Throughput: 0: 41079.6. Samples: 98764180. Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 13:37:03,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 13:37:04,244][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000013221_216612864.pth... [2024-03-29 13:37:04,559][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000012607_206553088.pth [2024-03-29 13:37:06,894][00497] Updated weights for policy 0, policy_version 13229 (0.0022) [2024-03-29 13:37:08,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 216842240. Throughput: 0: 41963.9. Samples: 99020180. 
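The "Saving .../checkpoint_p0/checkpoint_*.pth" entries above are each followed by a "Removing" entry for an older file, i.e. the trainer keeps only a small rolling window of periodic checkpoints. A minimal sketch of that kind of retention rule is below; the keep count and the helper name are illustrative assumptions, not the trainer's actual configuration.

    import re
    from pathlib import Path

    def prune_checkpoints(ckpt_dir: str, keep: int = 2) -> None:
        """Delete all but the `keep` newest checkpoint_*.pth files (illustrative sketch)."""
        pattern = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")
        ckpts = []
        for path in Path(ckpt_dir).glob("checkpoint_*.pth"):
            m = pattern.search(path.name)
            if m:
                # sort key: the policy_version embedded in the file name
                ckpts.append((int(m.group(1)), path))
        ckpts.sort()
        for _, old_path in ckpts[:-keep]:
            old_path.unlink()  # analogous to the "Removing .../checkpoint_*.pth" entries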
Policy #0 lag: (min: 1.0, avg: 18.1, max: 42.0) [2024-03-29 13:37:08,840][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 13:37:10,379][00497] Updated weights for policy 0, policy_version 13239 (0.0029) [2024-03-29 13:37:13,839][00126] Fps is (10 sec: 47513.7, 60 sec: 41779.3, 300 sec: 42209.6). Total num frames: 217055232. Throughput: 0: 41091.1. Samples: 99248940. Policy #0 lag: (min: 1.0, avg: 18.1, max: 42.0) [2024-03-29 13:37:13,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 13:37:14,434][00497] Updated weights for policy 0, policy_version 13249 (0.0025) [2024-03-29 13:37:18,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41506.2, 300 sec: 42098.6). Total num frames: 217219072. Throughput: 0: 41060.0. Samples: 99379920. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:37:18,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:37:19,517][00497] Updated weights for policy 0, policy_version 13259 (0.0025) [2024-03-29 13:37:22,792][00497] Updated weights for policy 0, policy_version 13269 (0.0028) [2024-03-29 13:37:23,629][00476] Signal inference workers to stop experience collection... (3600 times) [2024-03-29 13:37:23,705][00497] InferenceWorker_p0-w0: stopping experience collection (3600 times) [2024-03-29 13:37:23,708][00476] Signal inference workers to resume experience collection... (3600 times) [2024-03-29 13:37:23,733][00497] InferenceWorker_p0-w0: resuming experience collection (3600 times) [2024-03-29 13:37:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.1, 300 sec: 42098.6). Total num frames: 217448448. Throughput: 0: 42029.3. Samples: 99651900. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:37:23,840][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 13:37:25,863][00497] Updated weights for policy 0, policy_version 13279 (0.0020) [2024-03-29 13:37:28,839][00126] Fps is (10 sec: 45875.1, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 217677824. Throughput: 0: 41634.2. Samples: 99884800. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 13:37:28,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 13:37:30,003][00497] Updated weights for policy 0, policy_version 13289 (0.0019) [2024-03-29 13:37:33,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 217858048. Throughput: 0: 41584.4. Samples: 100016120. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 13:37:33,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 13:37:34,916][00497] Updated weights for policy 0, policy_version 13299 (0.0022) [2024-03-29 13:37:38,226][00497] Updated weights for policy 0, policy_version 13309 (0.0018) [2024-03-29 13:37:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 218087424. Throughput: 0: 41803.1. Samples: 100285460. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 13:37:38,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 13:37:41,526][00497] Updated weights for policy 0, policy_version 13319 (0.0023) [2024-03-29 13:37:43,839][00126] Fps is (10 sec: 44237.5, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 218300416. Throughput: 0: 41653.8. Samples: 100507480. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 13:37:43,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 13:37:45,658][00497] Updated weights for policy 0, policy_version 13329 (0.0020) [2024-03-29 13:37:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 42098.6). 
Total num frames: 218497024. Throughput: 0: 41636.0. Samples: 100637800. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 13:37:48,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:37:50,658][00497] Updated weights for policy 0, policy_version 13339 (0.0022) [2024-03-29 13:37:53,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41233.0, 300 sec: 41931.9). Total num frames: 218693632. Throughput: 0: 42007.1. Samples: 100910500. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 13:37:53,840][00126] Avg episode reward: [(0, '0.262')] [2024-03-29 13:37:53,866][00497] Updated weights for policy 0, policy_version 13349 (0.0018) [2024-03-29 13:37:57,307][00497] Updated weights for policy 0, policy_version 13359 (0.0026) [2024-03-29 13:37:58,152][00476] Signal inference workers to stop experience collection... (3650 times) [2024-03-29 13:37:58,173][00497] InferenceWorker_p0-w0: stopping experience collection (3650 times) [2024-03-29 13:37:58,328][00476] Signal inference workers to resume experience collection... (3650 times) [2024-03-29 13:37:58,328][00497] InferenceWorker_p0-w0: resuming experience collection (3650 times) [2024-03-29 13:37:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 218939392. Throughput: 0: 42130.7. Samples: 101144820. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 13:37:58,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:38:01,261][00497] Updated weights for policy 0, policy_version 13369 (0.0020) [2024-03-29 13:38:03,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 219136000. Throughput: 0: 41970.5. Samples: 101268600. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 13:38:03,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 13:38:06,092][00497] Updated weights for policy 0, policy_version 13379 (0.0022) [2024-03-29 13:38:08,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 219316224. Throughput: 0: 42011.1. Samples: 101542400. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 13:38:08,841][00126] Avg episode reward: [(0, '0.307')] [2024-03-29 13:38:09,605][00497] Updated weights for policy 0, policy_version 13389 (0.0025) [2024-03-29 13:38:12,821][00497] Updated weights for policy 0, policy_version 13399 (0.0018) [2024-03-29 13:38:13,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 219561984. Throughput: 0: 41942.2. Samples: 101772200. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 13:38:13,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:38:16,859][00497] Updated weights for policy 0, policy_version 13409 (0.0023) [2024-03-29 13:38:18,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 219774976. Throughput: 0: 41690.8. Samples: 101892200. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 13:38:18,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:38:21,551][00497] Updated weights for policy 0, policy_version 13419 (0.0018) [2024-03-29 13:38:23,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 219955200. Throughput: 0: 41913.7. Samples: 102171580. 
Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 13:38:23,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:38:25,272][00497] Updated weights for policy 0, policy_version 13429 (0.0018) [2024-03-29 13:38:28,600][00497] Updated weights for policy 0, policy_version 13439 (0.0022) [2024-03-29 13:38:28,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 220184576. Throughput: 0: 42073.3. Samples: 102400780. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 13:38:28,840][00126] Avg episode reward: [(0, '0.284')] [2024-03-29 13:38:32,188][00476] Signal inference workers to stop experience collection... (3700 times) [2024-03-29 13:38:32,220][00497] InferenceWorker_p0-w0: stopping experience collection (3700 times) [2024-03-29 13:38:32,402][00476] Signal inference workers to resume experience collection... (3700 times) [2024-03-29 13:38:32,402][00497] InferenceWorker_p0-w0: resuming experience collection (3700 times) [2024-03-29 13:38:32,405][00497] Updated weights for policy 0, policy_version 13449 (0.0022) [2024-03-29 13:38:33,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 220397568. Throughput: 0: 41826.2. Samples: 102519980. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 13:38:33,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 13:38:37,254][00497] Updated weights for policy 0, policy_version 13459 (0.0024) [2024-03-29 13:38:38,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41932.0). Total num frames: 220577792. Throughput: 0: 41869.0. Samples: 102794600. Policy #0 lag: (min: 0.0, avg: 19.2, max: 40.0) [2024-03-29 13:38:38,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:38:41,148][00497] Updated weights for policy 0, policy_version 13469 (0.0030) [2024-03-29 13:38:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 220807168. Throughput: 0: 41662.6. Samples: 103019640. Policy #0 lag: (min: 0.0, avg: 19.2, max: 40.0) [2024-03-29 13:38:43,842][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 13:38:44,377][00497] Updated weights for policy 0, policy_version 13479 (0.0024) [2024-03-29 13:38:48,174][00497] Updated weights for policy 0, policy_version 13489 (0.0024) [2024-03-29 13:38:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 221020160. Throughput: 0: 41669.9. Samples: 103143740. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 13:38:48,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:38:52,853][00497] Updated weights for policy 0, policy_version 13499 (0.0027) [2024-03-29 13:38:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 221200384. Throughput: 0: 41584.1. Samples: 103413680. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 13:38:53,840][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 13:38:56,548][00497] Updated weights for policy 0, policy_version 13509 (0.0018) [2024-03-29 13:38:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 221446144. Throughput: 0: 41780.5. Samples: 103652320. 
Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 13:38:58,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:38:59,959][00497] Updated weights for policy 0, policy_version 13519 (0.0025) [2024-03-29 13:39:03,387][00497] Updated weights for policy 0, policy_version 13529 (0.0020) [2024-03-29 13:39:03,839][00126] Fps is (10 sec: 47512.8, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 221675520. Throughput: 0: 42106.5. Samples: 103787000. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 13:39:03,840][00126] Avg episode reward: [(0, '0.327')] [2024-03-29 13:39:03,862][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000013530_221675520.pth... [2024-03-29 13:39:04,161][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000012917_211632128.pth [2024-03-29 13:39:06,961][00476] Signal inference workers to stop experience collection... (3750 times) [2024-03-29 13:39:06,963][00476] Signal inference workers to resume experience collection... (3750 times) [2024-03-29 13:39:07,013][00497] InferenceWorker_p0-w0: stopping experience collection (3750 times) [2024-03-29 13:39:07,013][00497] InferenceWorker_p0-w0: resuming experience collection (3750 times) [2024-03-29 13:39:08,351][00497] Updated weights for policy 0, policy_version 13539 (0.0029) [2024-03-29 13:39:08,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 221839360. Throughput: 0: 41711.2. Samples: 104048580. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 13:39:08,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 13:39:12,422][00497] Updated weights for policy 0, policy_version 13549 (0.0024) [2024-03-29 13:39:13,839][00126] Fps is (10 sec: 39322.4, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 222068736. Throughput: 0: 41944.6. Samples: 104288280. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 13:39:13,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:39:15,756][00497] Updated weights for policy 0, policy_version 13559 (0.0030) [2024-03-29 13:39:18,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 222298112. Throughput: 0: 42096.5. Samples: 104414320. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 13:39:18,841][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 13:39:19,141][00497] Updated weights for policy 0, policy_version 13569 (0.0019) [2024-03-29 13:39:23,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 222461952. Throughput: 0: 41688.4. Samples: 104670580. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 13:39:23,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 13:39:24,041][00497] Updated weights for policy 0, policy_version 13579 (0.0030) [2024-03-29 13:39:27,812][00497] Updated weights for policy 0, policy_version 13589 (0.0026) [2024-03-29 13:39:28,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 222691328. Throughput: 0: 42580.4. Samples: 104935760. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 13:39:28,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 13:39:31,155][00497] Updated weights for policy 0, policy_version 13599 (0.0020) [2024-03-29 13:39:33,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 222920704. Throughput: 0: 42364.3. Samples: 105050140. 
Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 13:39:33,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 13:39:34,630][00497] Updated weights for policy 0, policy_version 13609 (0.0024) [2024-03-29 13:39:38,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 223117312. Throughput: 0: 42019.1. Samples: 105304540. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 13:39:38,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 13:39:39,396][00497] Updated weights for policy 0, policy_version 13619 (0.0021) [2024-03-29 13:39:43,258][00497] Updated weights for policy 0, policy_version 13629 (0.0022) [2024-03-29 13:39:43,582][00476] Signal inference workers to stop experience collection... (3800 times) [2024-03-29 13:39:43,605][00497] InferenceWorker_p0-w0: stopping experience collection (3800 times) [2024-03-29 13:39:43,804][00476] Signal inference workers to resume experience collection... (3800 times) [2024-03-29 13:39:43,805][00497] InferenceWorker_p0-w0: resuming experience collection (3800 times) [2024-03-29 13:39:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 223330304. Throughput: 0: 42760.4. Samples: 105576540. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 13:39:43,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 13:39:46,619][00497] Updated weights for policy 0, policy_version 13639 (0.0028) [2024-03-29 13:39:48,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 223543296. Throughput: 0: 42143.3. Samples: 105683440. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:39:48,840][00126] Avg episode reward: [(0, '0.293')] [2024-03-29 13:39:50,208][00497] Updated weights for policy 0, policy_version 13649 (0.0043) [2024-03-29 13:39:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 223756288. Throughput: 0: 41886.6. Samples: 105933480. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:39:53,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 13:39:54,815][00497] Updated weights for policy 0, policy_version 13659 (0.0027) [2024-03-29 13:39:58,766][00497] Updated weights for policy 0, policy_version 13669 (0.0024) [2024-03-29 13:39:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 223952896. Throughput: 0: 42664.9. Samples: 106208200. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:39:58,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 13:40:02,391][00497] Updated weights for policy 0, policy_version 13679 (0.0020) [2024-03-29 13:40:03,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 224182272. Throughput: 0: 42245.2. Samples: 106315360. Policy #0 lag: (min: 2.0, avg: 19.4, max: 42.0) [2024-03-29 13:40:03,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 13:40:05,725][00497] Updated weights for policy 0, policy_version 13689 (0.0027) [2024-03-29 13:40:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 224378880. Throughput: 0: 42045.3. Samples: 106562620. Policy #0 lag: (min: 2.0, avg: 19.4, max: 42.0) [2024-03-29 13:40:08,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:40:10,311][00497] Updated weights for policy 0, policy_version 13699 (0.0019) [2024-03-29 13:40:13,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.1, 300 sec: 41765.3). 
Total num frames: 224575488. Throughput: 0: 42294.1. Samples: 106839000. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 13:40:13,841][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 13:40:14,401][00497] Updated weights for policy 0, policy_version 13709 (0.0024) [2024-03-29 13:40:17,950][00497] Updated weights for policy 0, policy_version 13719 (0.0019) [2024-03-29 13:40:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 224804864. Throughput: 0: 42009.1. Samples: 106940540. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 13:40:18,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 13:40:19,290][00476] Signal inference workers to stop experience collection... (3850 times) [2024-03-29 13:40:19,313][00497] InferenceWorker_p0-w0: stopping experience collection (3850 times) [2024-03-29 13:40:19,484][00476] Signal inference workers to resume experience collection... (3850 times) [2024-03-29 13:40:19,485][00497] InferenceWorker_p0-w0: resuming experience collection (3850 times) [2024-03-29 13:40:21,486][00497] Updated weights for policy 0, policy_version 13729 (0.0027) [2024-03-29 13:40:23,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42871.3, 300 sec: 42043.0). Total num frames: 225034240. Throughput: 0: 42040.3. Samples: 107196360. Policy #0 lag: (min: 2.0, avg: 22.5, max: 43.0) [2024-03-29 13:40:23,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:40:25,997][00497] Updated weights for policy 0, policy_version 13739 (0.0020) [2024-03-29 13:40:28,839][00126] Fps is (10 sec: 39320.9, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 225198080. Throughput: 0: 42085.3. Samples: 107470380. Policy #0 lag: (min: 2.0, avg: 22.5, max: 43.0) [2024-03-29 13:40:28,841][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 13:40:29,891][00497] Updated weights for policy 0, policy_version 13749 (0.0019) [2024-03-29 13:40:33,474][00497] Updated weights for policy 0, policy_version 13759 (0.0027) [2024-03-29 13:40:33,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 225443840. Throughput: 0: 42204.3. Samples: 107582640. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 13:40:33,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 13:40:36,743][00497] Updated weights for policy 0, policy_version 13769 (0.0023) [2024-03-29 13:40:38,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 225656832. Throughput: 0: 42214.3. Samples: 107833120. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 13:40:38,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 13:40:41,489][00497] Updated weights for policy 0, policy_version 13779 (0.0029) [2024-03-29 13:40:43,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 225820672. Throughput: 0: 42031.8. Samples: 108099640. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 13:40:43,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 13:40:45,380][00497] Updated weights for policy 0, policy_version 13789 (0.0028) [2024-03-29 13:40:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 226066432. Throughput: 0: 42239.7. Samples: 108216140. 
Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 13:40:48,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 13:40:48,983][00497] Updated weights for policy 0, policy_version 13799 (0.0022) [2024-03-29 13:40:52,265][00497] Updated weights for policy 0, policy_version 13809 (0.0027) [2024-03-29 13:40:53,145][00476] Signal inference workers to stop experience collection... (3900 times) [2024-03-29 13:40:53,177][00497] InferenceWorker_p0-w0: stopping experience collection (3900 times) [2024-03-29 13:40:53,331][00476] Signal inference workers to resume experience collection... (3900 times) [2024-03-29 13:40:53,332][00497] InferenceWorker_p0-w0: resuming experience collection (3900 times) [2024-03-29 13:40:53,839][00126] Fps is (10 sec: 47514.1, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 226295808. Throughput: 0: 42216.9. Samples: 108462380. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 13:40:53,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 13:40:57,073][00497] Updated weights for policy 0, policy_version 13819 (0.0037) [2024-03-29 13:40:58,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 226476032. Throughput: 0: 41968.5. Samples: 108727580. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 13:40:58,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:41:00,815][00497] Updated weights for policy 0, policy_version 13829 (0.0021) [2024-03-29 13:41:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 226705408. Throughput: 0: 42618.2. Samples: 108858360. Policy #0 lag: (min: 1.0, avg: 20.5, max: 44.0) [2024-03-29 13:41:03,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 13:41:03,895][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000013838_226721792.pth... [2024-03-29 13:41:04,250][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000013221_216612864.pth [2024-03-29 13:41:04,672][00497] Updated weights for policy 0, policy_version 13839 (0.0018) [2024-03-29 13:41:08,123][00497] Updated weights for policy 0, policy_version 13849 (0.0025) [2024-03-29 13:41:08,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 226902016. Throughput: 0: 41950.0. Samples: 109084100. Policy #0 lag: (min: 1.0, avg: 20.5, max: 44.0) [2024-03-29 13:41:08,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 13:41:12,937][00497] Updated weights for policy 0, policy_version 13859 (0.0027) [2024-03-29 13:41:13,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 227115008. Throughput: 0: 41877.8. Samples: 109354880. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 13:41:13,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:41:16,749][00497] Updated weights for policy 0, policy_version 13869 (0.0022) [2024-03-29 13:41:18,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 227344384. Throughput: 0: 42223.2. Samples: 109482680. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 13:41:18,840][00126] Avg episode reward: [(0, '0.461')] [2024-03-29 13:41:20,436][00497] Updated weights for policy 0, policy_version 13879 (0.0027) [2024-03-29 13:41:23,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 227540992. Throughput: 0: 41704.9. Samples: 109709840. 
Policy #0 lag: (min: 1.0, avg: 22.0, max: 41.0) [2024-03-29 13:41:23,840][00126] Avg episode reward: [(0, '0.448')] [2024-03-29 13:41:23,885][00497] Updated weights for policy 0, policy_version 13889 (0.0021) [2024-03-29 13:41:25,833][00476] Signal inference workers to stop experience collection... (3950 times) [2024-03-29 13:41:25,873][00497] InferenceWorker_p0-w0: stopping experience collection (3950 times) [2024-03-29 13:41:26,016][00476] Signal inference workers to resume experience collection... (3950 times) [2024-03-29 13:41:26,016][00497] InferenceWorker_p0-w0: resuming experience collection (3950 times) [2024-03-29 13:41:28,342][00497] Updated weights for policy 0, policy_version 13899 (0.0027) [2024-03-29 13:41:28,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 227737600. Throughput: 0: 41893.0. Samples: 109984820. Policy #0 lag: (min: 1.0, avg: 22.0, max: 41.0) [2024-03-29 13:41:28,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 13:41:32,191][00497] Updated weights for policy 0, policy_version 13909 (0.0027) [2024-03-29 13:41:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 227950592. Throughput: 0: 42271.5. Samples: 110118360. Policy #0 lag: (min: 1.0, avg: 22.0, max: 41.0) [2024-03-29 13:41:33,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 13:41:35,910][00497] Updated weights for policy 0, policy_version 13919 (0.0033) [2024-03-29 13:41:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 228179968. Throughput: 0: 41847.6. Samples: 110345520. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:41:38,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:41:39,302][00497] Updated weights for policy 0, policy_version 13929 (0.0031) [2024-03-29 13:41:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 228360192. Throughput: 0: 41881.4. Samples: 110612240. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:41:43,840][00126] Avg episode reward: [(0, '0.281')] [2024-03-29 13:41:43,930][00497] Updated weights for policy 0, policy_version 13939 (0.0018) [2024-03-29 13:41:47,643][00497] Updated weights for policy 0, policy_version 13949 (0.0022) [2024-03-29 13:41:48,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 228605952. Throughput: 0: 42060.9. Samples: 110751100. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 13:41:48,840][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 13:41:51,286][00497] Updated weights for policy 0, policy_version 13959 (0.0019) [2024-03-29 13:41:53,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 228818944. Throughput: 0: 42373.2. Samples: 110990900. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 13:41:53,840][00126] Avg episode reward: [(0, '0.227')] [2024-03-29 13:41:54,847][00497] Updated weights for policy 0, policy_version 13969 (0.0034) [2024-03-29 13:41:58,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 229015552. Throughput: 0: 41910.7. Samples: 111240860. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 13:41:58,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:41:59,650][00497] Updated weights for policy 0, policy_version 13979 (0.0026) [2024-03-29 13:42:01,356][00476] Signal inference workers to stop experience collection... 
(4000 times) [2024-03-29 13:42:01,357][00476] Signal inference workers to resume experience collection... (4000 times) [2024-03-29 13:42:01,398][00497] InferenceWorker_p0-w0: stopping experience collection (4000 times) [2024-03-29 13:42:01,399][00497] InferenceWorker_p0-w0: resuming experience collection (4000 times) [2024-03-29 13:42:03,333][00497] Updated weights for policy 0, policy_version 13989 (0.0024) [2024-03-29 13:42:03,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 229212160. Throughput: 0: 42130.7. Samples: 111378560. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 13:42:03,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 13:42:06,770][00497] Updated weights for policy 0, policy_version 13999 (0.0028) [2024-03-29 13:42:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 229457920. Throughput: 0: 42505.8. Samples: 111622600. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 13:42:08,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 13:42:10,606][00497] Updated weights for policy 0, policy_version 14009 (0.0023) [2024-03-29 13:42:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 229638144. Throughput: 0: 41931.1. Samples: 111871720. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 13:42:13,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:42:15,213][00497] Updated weights for policy 0, policy_version 14019 (0.0018) [2024-03-29 13:42:18,839][00126] Fps is (10 sec: 37682.7, 60 sec: 41506.0, 300 sec: 41987.5). Total num frames: 229834752. Throughput: 0: 41897.6. Samples: 112003760. Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0) [2024-03-29 13:42:18,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:42:18,921][00497] Updated weights for policy 0, policy_version 14029 (0.0018) [2024-03-29 13:42:22,479][00497] Updated weights for policy 0, policy_version 14039 (0.0026) [2024-03-29 13:42:23,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 230096896. Throughput: 0: 42436.0. Samples: 112255140. Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0) [2024-03-29 13:42:23,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:42:26,102][00497] Updated weights for policy 0, policy_version 14049 (0.0028) [2024-03-29 13:42:28,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 230277120. Throughput: 0: 41995.8. Samples: 112502060. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 13:42:28,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 13:42:30,754][00497] Updated weights for policy 0, policy_version 14059 (0.0028) [2024-03-29 13:42:33,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 230457344. Throughput: 0: 41931.6. Samples: 112638020. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 13:42:33,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 13:42:34,324][00476] Signal inference workers to stop experience collection... (4050 times) [2024-03-29 13:42:34,355][00497] InferenceWorker_p0-w0: stopping experience collection (4050 times) [2024-03-29 13:42:34,522][00476] Signal inference workers to resume experience collection... 
(4050 times) [2024-03-29 13:42:34,522][00497] InferenceWorker_p0-w0: resuming experience collection (4050 times) [2024-03-29 13:42:34,529][00497] Updated weights for policy 0, policy_version 14069 (0.0024) [2024-03-29 13:42:38,165][00497] Updated weights for policy 0, policy_version 14079 (0.0028) [2024-03-29 13:42:38,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 230703104. Throughput: 0: 42006.7. Samples: 112881200. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:42:38,841][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 13:42:41,977][00497] Updated weights for policy 0, policy_version 14089 (0.0021) [2024-03-29 13:42:43,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 230899712. Throughput: 0: 42032.4. Samples: 113132320. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:42:43,840][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 13:42:46,474][00497] Updated weights for policy 0, policy_version 14099 (0.0018) [2024-03-29 13:42:48,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 231096320. Throughput: 0: 41915.9. Samples: 113264780. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:42:48,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 13:42:50,072][00497] Updated weights for policy 0, policy_version 14109 (0.0029) [2024-03-29 13:42:53,692][00497] Updated weights for policy 0, policy_version 14119 (0.0020) [2024-03-29 13:42:53,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 231325696. Throughput: 0: 41934.6. Samples: 113509660. Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 13:42:53,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 13:42:57,665][00497] Updated weights for policy 0, policy_version 14129 (0.0023) [2024-03-29 13:42:58,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 231522304. Throughput: 0: 41977.8. Samples: 113760720. Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 13:42:58,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 13:43:02,123][00497] Updated weights for policy 0, policy_version 14139 (0.0026) [2024-03-29 13:43:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.2, 300 sec: 42098.6). Total num frames: 231735296. Throughput: 0: 42031.2. Samples: 113895160. Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 13:43:03,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 13:43:04,069][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000014145_231751680.pth... [2024-03-29 13:43:04,380][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000013530_221675520.pth [2024-03-29 13:43:05,611][00497] Updated weights for policy 0, policy_version 14149 (0.0022) [2024-03-29 13:43:07,360][00476] Signal inference workers to stop experience collection... (4100 times) [2024-03-29 13:43:07,430][00497] InferenceWorker_p0-w0: stopping experience collection (4100 times) [2024-03-29 13:43:07,527][00476] Signal inference workers to resume experience collection... (4100 times) [2024-03-29 13:43:07,527][00497] InferenceWorker_p0-w0: resuming experience collection (4100 times) [2024-03-29 13:43:08,839][00126] Fps is (10 sec: 42597.6, 60 sec: 41506.0, 300 sec: 41987.5). Total num frames: 231948288. Throughput: 0: 41992.8. Samples: 114144820. 
Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 13:43:08,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 13:43:09,230][00497] Updated weights for policy 0, policy_version 14159 (0.0026) [2024-03-29 13:43:13,144][00497] Updated weights for policy 0, policy_version 14169 (0.0023) [2024-03-29 13:43:13,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.1, 300 sec: 41987.4). Total num frames: 232161280. Throughput: 0: 41918.2. Samples: 114388380. Policy #0 lag: (min: 3.0, avg: 22.8, max: 43.0) [2024-03-29 13:43:13,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 13:43:17,694][00497] Updated weights for policy 0, policy_version 14179 (0.0023) [2024-03-29 13:43:18,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 232357888. Throughput: 0: 41629.7. Samples: 114511360. Policy #0 lag: (min: 3.0, avg: 22.8, max: 43.0) [2024-03-29 13:43:18,840][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 13:43:21,378][00497] Updated weights for policy 0, policy_version 14189 (0.0031) [2024-03-29 13:43:23,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 232587264. Throughput: 0: 42148.5. Samples: 114777880. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:43:23,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:43:24,953][00497] Updated weights for policy 0, policy_version 14199 (0.0027) [2024-03-29 13:43:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 232783872. Throughput: 0: 41658.3. Samples: 115006940. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:43:28,840][00126] Avg episode reward: [(0, '0.264')] [2024-03-29 13:43:28,847][00497] Updated weights for policy 0, policy_version 14209 (0.0028) [2024-03-29 13:43:33,490][00497] Updated weights for policy 0, policy_version 14219 (0.0021) [2024-03-29 13:43:33,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 232980480. Throughput: 0: 41822.7. Samples: 115146800. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 13:43:33,840][00126] Avg episode reward: [(0, '0.307')] [2024-03-29 13:43:36,950][00497] Updated weights for policy 0, policy_version 14229 (0.0028) [2024-03-29 13:43:38,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 233209856. Throughput: 0: 42048.8. Samples: 115401860. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 13:43:38,840][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 13:43:40,653][00476] Signal inference workers to stop experience collection... (4150 times) [2024-03-29 13:43:40,655][00476] Signal inference workers to resume experience collection... (4150 times) [2024-03-29 13:43:40,697][00497] InferenceWorker_p0-w0: stopping experience collection (4150 times) [2024-03-29 13:43:40,698][00497] InferenceWorker_p0-w0: resuming experience collection (4150 times) [2024-03-29 13:43:40,919][00497] Updated weights for policy 0, policy_version 14239 (0.0023) [2024-03-29 13:43:43,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 233422848. Throughput: 0: 41224.9. Samples: 115615840. Policy #0 lag: (min: 2.0, avg: 24.0, max: 42.0) [2024-03-29 13:43:43,840][00126] Avg episode reward: [(0, '0.302')] [2024-03-29 13:43:44,977][00497] Updated weights for policy 0, policy_version 14249 (0.0022) [2024-03-29 13:43:48,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 42043.0). 
Total num frames: 233603072. Throughput: 0: 41479.5. Samples: 115761740. Policy #0 lag: (min: 2.0, avg: 24.0, max: 42.0) [2024-03-29 13:43:48,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 13:43:49,648][00497] Updated weights for policy 0, policy_version 14259 (0.0023) [2024-03-29 13:43:53,124][00497] Updated weights for policy 0, policy_version 14269 (0.0024) [2024-03-29 13:43:53,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 233799680. Throughput: 0: 41583.7. Samples: 116016080. Policy #0 lag: (min: 2.0, avg: 19.6, max: 42.0) [2024-03-29 13:43:53,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 13:43:56,857][00497] Updated weights for policy 0, policy_version 14279 (0.0020) [2024-03-29 13:43:58,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 234045440. Throughput: 0: 40926.7. Samples: 116230080. Policy #0 lag: (min: 2.0, avg: 19.6, max: 42.0) [2024-03-29 13:43:58,840][00126] Avg episode reward: [(0, '0.479')] [2024-03-29 13:44:01,164][00497] Updated weights for policy 0, policy_version 14289 (0.0017) [2024-03-29 13:44:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 234225664. Throughput: 0: 41300.5. Samples: 116369880. Policy #0 lag: (min: 2.0, avg: 19.6, max: 42.0) [2024-03-29 13:44:03,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 13:44:05,414][00497] Updated weights for policy 0, policy_version 14299 (0.0031) [2024-03-29 13:44:08,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41233.2, 300 sec: 41876.4). Total num frames: 234422272. Throughput: 0: 41288.5. Samples: 116635860. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 13:44:08,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 13:44:09,044][00497] Updated weights for policy 0, policy_version 14309 (0.0018) [2024-03-29 13:44:12,770][00497] Updated weights for policy 0, policy_version 14319 (0.0032) [2024-03-29 13:44:13,107][00476] Signal inference workers to stop experience collection... (4200 times) [2024-03-29 13:44:13,108][00476] Signal inference workers to resume experience collection... (4200 times) [2024-03-29 13:44:13,150][00497] InferenceWorker_p0-w0: stopping experience collection (4200 times) [2024-03-29 13:44:13,150][00497] InferenceWorker_p0-w0: resuming experience collection (4200 times) [2024-03-29 13:44:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.3, 300 sec: 41876.4). Total num frames: 234651648. Throughput: 0: 41394.6. Samples: 116869700. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 13:44:13,840][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 13:44:16,845][00497] Updated weights for policy 0, policy_version 14329 (0.0020) [2024-03-29 13:44:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 234848256. Throughput: 0: 40854.7. Samples: 116985260. Policy #0 lag: (min: 0.0, avg: 24.7, max: 44.0) [2024-03-29 13:44:18,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 13:44:21,042][00497] Updated weights for policy 0, policy_version 14339 (0.0026) [2024-03-29 13:44:23,839][00126] Fps is (10 sec: 37683.2, 60 sec: 40687.0, 300 sec: 41820.9). Total num frames: 235028480. Throughput: 0: 41174.3. Samples: 117254700. 
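The recurring "Policy #0 lag" statistic reports how many policy versions old the experience in the current batch is. A plausible way to compute such a statistic, assuming each sample is tagged with the policy_version that collected it, is sketched below; the function and field names are illustrative, not the trainer's internals.

    from typing import Iterable

    def policy_lag_stats(learner_version: int, sample_versions: Iterable[int]) -> dict:
        """Min/avg/max of (learner policy version - version that collected each sample)."""
        lags = [learner_version - v for v in sample_versions]
        return {
            "min": float(min(lags)),
            "avg": sum(lags) / len(lags),
            "max": float(max(lags)),
        }

    # Example: policy_lag_stats(13459, [13416, 13440, 13459])
    # -> {'min': 0.0, 'avg': ~20.7, 'max': 43.0}, in line with the ranges logged above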
Policy #0 lag: (min: 0.0, avg: 24.7, max: 44.0) [2024-03-29 13:44:23,841][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:44:24,891][00497] Updated weights for policy 0, policy_version 14349 (0.0026) [2024-03-29 13:44:28,447][00497] Updated weights for policy 0, policy_version 14359 (0.0023) [2024-03-29 13:44:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 235274240. Throughput: 0: 41770.2. Samples: 117495500. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 13:44:28,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:44:32,526][00497] Updated weights for policy 0, policy_version 14369 (0.0034) [2024-03-29 13:44:33,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 235454464. Throughput: 0: 40855.2. Samples: 117600220. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 13:44:33,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 13:44:36,938][00497] Updated weights for policy 0, policy_version 14379 (0.0025) [2024-03-29 13:44:38,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40687.0, 300 sec: 41765.3). Total num frames: 235651072. Throughput: 0: 41281.8. Samples: 117873760. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 13:44:38,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:44:40,821][00497] Updated weights for policy 0, policy_version 14389 (0.0020) [2024-03-29 13:44:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 40960.0, 300 sec: 41820.9). Total num frames: 235880448. Throughput: 0: 41750.8. Samples: 118108860. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 13:44:43,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 13:44:44,199][00497] Updated weights for policy 0, policy_version 14399 (0.0025) [2024-03-29 13:44:48,529][00497] Updated weights for policy 0, policy_version 14409 (0.0019) [2024-03-29 13:44:48,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 236077056. Throughput: 0: 41160.0. Samples: 118222080. Policy #0 lag: (min: 0.0, avg: 21.1, max: 40.0) [2024-03-29 13:44:48,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:44:50,928][00476] Signal inference workers to stop experience collection... (4250 times) [2024-03-29 13:44:50,968][00497] InferenceWorker_p0-w0: stopping experience collection (4250 times) [2024-03-29 13:44:51,124][00476] Signal inference workers to resume experience collection... (4250 times) [2024-03-29 13:44:51,124][00497] InferenceWorker_p0-w0: resuming experience collection (4250 times) [2024-03-29 13:44:52,749][00497] Updated weights for policy 0, policy_version 14419 (0.0030) [2024-03-29 13:44:53,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 236273664. Throughput: 0: 41156.9. Samples: 118487920. Policy #0 lag: (min: 0.0, avg: 21.1, max: 40.0) [2024-03-29 13:44:53,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 13:44:56,586][00497] Updated weights for policy 0, policy_version 14429 (0.0027) [2024-03-29 13:44:58,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 236519424. Throughput: 0: 41552.8. Samples: 118739580. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:44:58,842][00126] Avg episode reward: [(0, '0.472')] [2024-03-29 13:44:59,748][00497] Updated weights for policy 0, policy_version 14439 (0.0020) [2024-03-29 13:45:03,839][00126] Fps is (10 sec: 44235.8, 60 sec: 41506.0, 300 sec: 41820.8). 
Total num frames: 236716032. Throughput: 0: 41741.1. Samples: 118863620. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:45:03,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 13:45:03,860][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000014448_236716032.pth... [2024-03-29 13:45:04,166][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000013838_226721792.pth [2024-03-29 13:45:04,440][00497] Updated weights for policy 0, policy_version 14449 (0.0019) [2024-03-29 13:45:08,068][00497] Updated weights for policy 0, policy_version 14459 (0.0020) [2024-03-29 13:45:08,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 236912640. Throughput: 0: 41434.2. Samples: 119119240. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:45:08,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 13:45:12,085][00497] Updated weights for policy 0, policy_version 14469 (0.0023) [2024-03-29 13:45:13,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 237142016. Throughput: 0: 41845.3. Samples: 119378540. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 13:45:13,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 13:45:15,395][00497] Updated weights for policy 0, policy_version 14479 (0.0025) [2024-03-29 13:45:18,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 237355008. Throughput: 0: 42059.0. Samples: 119492880. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 13:45:18,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 13:45:19,677][00497] Updated weights for policy 0, policy_version 14489 (0.0022) [2024-03-29 13:45:23,764][00497] Updated weights for policy 0, policy_version 14499 (0.0023) [2024-03-29 13:45:23,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 237551616. Throughput: 0: 41651.1. Samples: 119748060. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 13:45:23,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 13:45:27,881][00497] Updated weights for policy 0, policy_version 14509 (0.0028) [2024-03-29 13:45:28,429][00476] Signal inference workers to stop experience collection... (4300 times) [2024-03-29 13:45:28,434][00476] Signal inference workers to resume experience collection... (4300 times) [2024-03-29 13:45:28,481][00497] InferenceWorker_p0-w0: stopping experience collection (4300 times) [2024-03-29 13:45:28,482][00497] InferenceWorker_p0-w0: resuming experience collection (4300 times) [2024-03-29 13:45:28,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 237764608. Throughput: 0: 42400.4. Samples: 120016880. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 13:45:28,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:45:31,194][00497] Updated weights for policy 0, policy_version 14519 (0.0027) [2024-03-29 13:45:33,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42598.3, 300 sec: 41876.4). Total num frames: 238010368. Throughput: 0: 42299.4. Samples: 120125560. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 13:45:33,840][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 13:45:35,264][00497] Updated weights for policy 0, policy_version 14529 (0.0028) [2024-03-29 13:45:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 41932.0). Total num frames: 238190592. 
Throughput: 0: 42145.3. Samples: 120384460. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 13:45:38,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 13:45:39,445][00497] Updated weights for policy 0, policy_version 14539 (0.0022) [2024-03-29 13:45:43,611][00497] Updated weights for policy 0, policy_version 14549 (0.0023) [2024-03-29 13:45:43,839][00126] Fps is (10 sec: 36045.3, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 238370816. Throughput: 0: 42106.7. Samples: 120634380. Policy #0 lag: (min: 0.0, avg: 20.2, max: 43.0) [2024-03-29 13:45:43,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:45:47,061][00497] Updated weights for policy 0, policy_version 14559 (0.0025) [2024-03-29 13:45:48,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42598.3, 300 sec: 41820.8). Total num frames: 238632960. Throughput: 0: 41592.5. Samples: 120735280. Policy #0 lag: (min: 0.0, avg: 20.2, max: 43.0) [2024-03-29 13:45:48,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:45:51,264][00497] Updated weights for policy 0, policy_version 14569 (0.0023) [2024-03-29 13:45:53,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 238796800. Throughput: 0: 41782.1. Samples: 120999440. Policy #0 lag: (min: 2.0, avg: 20.6, max: 43.0) [2024-03-29 13:45:53,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 13:45:55,458][00497] Updated weights for policy 0, policy_version 14579 (0.0022) [2024-03-29 13:45:58,839][00126] Fps is (10 sec: 34406.7, 60 sec: 40960.0, 300 sec: 41598.7). Total num frames: 238977024. Throughput: 0: 41460.9. Samples: 121244280. Policy #0 lag: (min: 2.0, avg: 20.6, max: 43.0) [2024-03-29 13:45:58,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 13:45:59,579][00497] Updated weights for policy 0, policy_version 14589 (0.0026) [2024-03-29 13:45:59,615][00476] Signal inference workers to stop experience collection... (4350 times) [2024-03-29 13:45:59,633][00497] InferenceWorker_p0-w0: stopping experience collection (4350 times) [2024-03-29 13:45:59,825][00476] Signal inference workers to resume experience collection... (4350 times) [2024-03-29 13:45:59,826][00497] InferenceWorker_p0-w0: resuming experience collection (4350 times) [2024-03-29 13:46:02,780][00497] Updated weights for policy 0, policy_version 14599 (0.0024) [2024-03-29 13:46:03,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 239239168. Throughput: 0: 41591.6. Samples: 121364500. Policy #0 lag: (min: 2.0, avg: 20.6, max: 43.0) [2024-03-29 13:46:03,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 13:46:07,134][00497] Updated weights for policy 0, policy_version 14609 (0.0021) [2024-03-29 13:46:08,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 239419392. Throughput: 0: 41646.2. Samples: 121622140. Policy #0 lag: (min: 1.0, avg: 23.6, max: 43.0) [2024-03-29 13:46:08,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 13:46:10,994][00497] Updated weights for policy 0, policy_version 14619 (0.0032) [2024-03-29 13:46:13,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 239616000. Throughput: 0: 41352.9. Samples: 121877760. 
Policy #0 lag: (min: 1.0, avg: 23.6, max: 43.0) [2024-03-29 13:46:13,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 13:46:15,545][00497] Updated weights for policy 0, policy_version 14629 (0.0029) [2024-03-29 13:46:18,714][00497] Updated weights for policy 0, policy_version 14639 (0.0021) [2024-03-29 13:46:18,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 239845376. Throughput: 0: 41320.6. Samples: 121984980. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 13:46:18,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 13:46:22,898][00497] Updated weights for policy 0, policy_version 14649 (0.0025) [2024-03-29 13:46:23,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 240025600. Throughput: 0: 41013.3. Samples: 122230060. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 13:46:23,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 13:46:26,815][00497] Updated weights for policy 0, policy_version 14659 (0.0024) [2024-03-29 13:46:28,624][00476] Signal inference workers to stop experience collection... (4400 times) [2024-03-29 13:46:28,686][00497] InferenceWorker_p0-w0: stopping experience collection (4400 times) [2024-03-29 13:46:28,787][00476] Signal inference workers to resume experience collection... (4400 times) [2024-03-29 13:46:28,788][00497] InferenceWorker_p0-w0: resuming experience collection (4400 times) [2024-03-29 13:46:28,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 240238592. Throughput: 0: 41155.5. Samples: 122486380. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:46:28,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:46:31,253][00497] Updated weights for policy 0, policy_version 14669 (0.0023) [2024-03-29 13:46:33,839][00126] Fps is (10 sec: 44236.3, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 240467968. Throughput: 0: 41924.4. Samples: 122621880. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:46:33,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 13:46:34,368][00497] Updated weights for policy 0, policy_version 14679 (0.0023) [2024-03-29 13:46:38,685][00497] Updated weights for policy 0, policy_version 14689 (0.0019) [2024-03-29 13:46:38,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 240664576. Throughput: 0: 40900.2. Samples: 122839940. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 13:46:38,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:46:42,813][00497] Updated weights for policy 0, policy_version 14699 (0.0019) [2024-03-29 13:46:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 240877568. Throughput: 0: 41404.0. Samples: 123107460. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 13:46:43,840][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 13:46:46,976][00497] Updated weights for policy 0, policy_version 14709 (0.0019) [2024-03-29 13:46:48,839][00126] Fps is (10 sec: 40959.7, 60 sec: 40687.0, 300 sec: 41543.2). Total num frames: 241074176. Throughput: 0: 41686.2. Samples: 123240380. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 13:46:48,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 13:46:50,217][00497] Updated weights for policy 0, policy_version 14719 (0.0036) [2024-03-29 13:46:53,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.1, 300 sec: 41598.7). 
Total num frames: 241287168. Throughput: 0: 40847.4. Samples: 123460280. Policy #0 lag: (min: 2.0, avg: 21.1, max: 43.0) [2024-03-29 13:46:53,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 13:46:54,645][00497] Updated weights for policy 0, policy_version 14729 (0.0022) [2024-03-29 13:46:57,953][00476] Signal inference workers to stop experience collection... (4450 times) [2024-03-29 13:46:58,021][00497] InferenceWorker_p0-w0: stopping experience collection (4450 times) [2024-03-29 13:46:58,025][00476] Signal inference workers to resume experience collection... (4450 times) [2024-03-29 13:46:58,048][00497] InferenceWorker_p0-w0: resuming experience collection (4450 times) [2024-03-29 13:46:58,331][00497] Updated weights for policy 0, policy_version 14739 (0.0035) [2024-03-29 13:46:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 241483776. Throughput: 0: 41215.6. Samples: 123732460. Policy #0 lag: (min: 2.0, avg: 21.1, max: 43.0) [2024-03-29 13:46:58,840][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 13:47:02,416][00497] Updated weights for policy 0, policy_version 14749 (0.0028) [2024-03-29 13:47:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 40959.9, 300 sec: 41487.6). Total num frames: 241696768. Throughput: 0: 41735.9. Samples: 123863100. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 13:47:03,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 13:47:03,927][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000014753_241713152.pth... [2024-03-29 13:47:04,240][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000014145_231751680.pth [2024-03-29 13:47:04,258][00476] Saving new best policy, reward=0.524! [2024-03-29 13:47:06,196][00497] Updated weights for policy 0, policy_version 14759 (0.0026) [2024-03-29 13:47:08,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 241942528. Throughput: 0: 41172.7. Samples: 124082840. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 13:47:08,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:47:10,594][00497] Updated weights for policy 0, policy_version 14769 (0.0029) [2024-03-29 13:47:13,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 242106368. Throughput: 0: 41241.0. Samples: 124342220. Policy #0 lag: (min: 0.0, avg: 20.1, max: 42.0) [2024-03-29 13:47:13,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 13:47:14,367][00497] Updated weights for policy 0, policy_version 14779 (0.0029) [2024-03-29 13:47:18,457][00497] Updated weights for policy 0, policy_version 14789 (0.0024) [2024-03-29 13:47:18,839][00126] Fps is (10 sec: 37683.8, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 242319360. Throughput: 0: 41283.2. Samples: 124479620. Policy #0 lag: (min: 0.0, avg: 20.1, max: 42.0) [2024-03-29 13:47:18,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:47:21,847][00497] Updated weights for policy 0, policy_version 14799 (0.0032) [2024-03-29 13:47:23,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42325.3, 300 sec: 41654.2). Total num frames: 242565120. Throughput: 0: 41627.4. Samples: 124713180. 
Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 13:47:23,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 13:47:26,249][00497] Updated weights for policy 0, policy_version 14809 (0.0027) [2024-03-29 13:47:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 242745344. Throughput: 0: 41628.9. Samples: 124980760. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 13:47:28,840][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 13:47:30,054][00497] Updated weights for policy 0, policy_version 14819 (0.0025) [2024-03-29 13:47:30,061][00476] Signal inference workers to stop experience collection... (4500 times) [2024-03-29 13:47:30,062][00476] Signal inference workers to resume experience collection... (4500 times) [2024-03-29 13:47:30,097][00497] InferenceWorker_p0-w0: stopping experience collection (4500 times) [2024-03-29 13:47:30,098][00497] InferenceWorker_p0-w0: resuming experience collection (4500 times) [2024-03-29 13:47:33,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 242941952. Throughput: 0: 41252.0. Samples: 125096720. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 13:47:33,840][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 13:47:34,128][00497] Updated weights for policy 0, policy_version 14829 (0.0029) [2024-03-29 13:47:37,370][00497] Updated weights for policy 0, policy_version 14839 (0.0025) [2024-03-29 13:47:38,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 41654.3). Total num frames: 243187712. Throughput: 0: 42089.0. Samples: 125354280. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 13:47:38,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:47:41,842][00497] Updated weights for policy 0, policy_version 14849 (0.0020) [2024-03-29 13:47:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 243367936. Throughput: 0: 41813.8. Samples: 125614080. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 13:47:43,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 13:47:45,635][00497] Updated weights for policy 0, policy_version 14859 (0.0026) [2024-03-29 13:47:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 243580928. Throughput: 0: 41537.0. Samples: 125732260. Policy #0 lag: (min: 1.0, avg: 21.1, max: 43.0) [2024-03-29 13:47:48,841][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 13:47:49,608][00497] Updated weights for policy 0, policy_version 14869 (0.0022) [2024-03-29 13:47:52,880][00497] Updated weights for policy 0, policy_version 14879 (0.0034) [2024-03-29 13:47:53,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 243826688. Throughput: 0: 42354.4. Samples: 125988780. Policy #0 lag: (min: 1.0, avg: 21.1, max: 43.0) [2024-03-29 13:47:53,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 13:47:57,379][00497] Updated weights for policy 0, policy_version 14889 (0.0030) [2024-03-29 13:47:58,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 243990528. Throughput: 0: 42220.4. Samples: 126242140. Policy #0 lag: (min: 1.0, avg: 24.4, max: 43.0) [2024-03-29 13:47:58,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:48:01,201][00497] Updated weights for policy 0, policy_version 14899 (0.0018) [2024-03-29 13:48:03,678][00476] Signal inference workers to stop experience collection... 
(4550 times) [2024-03-29 13:48:03,717][00497] InferenceWorker_p0-w0: stopping experience collection (4550 times) [2024-03-29 13:48:03,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 244203520. Throughput: 0: 42017.7. Samples: 126370420. Policy #0 lag: (min: 1.0, avg: 24.4, max: 43.0) [2024-03-29 13:48:03,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 13:48:03,890][00476] Signal inference workers to resume experience collection... (4550 times) [2024-03-29 13:48:03,890][00497] InferenceWorker_p0-w0: resuming experience collection (4550 times) [2024-03-29 13:48:05,081][00497] Updated weights for policy 0, policy_version 14909 (0.0018) [2024-03-29 13:48:08,435][00497] Updated weights for policy 0, policy_version 14919 (0.0030) [2024-03-29 13:48:08,839][00126] Fps is (10 sec: 45875.5, 60 sec: 41779.3, 300 sec: 41654.3). Total num frames: 244449280. Throughput: 0: 42589.9. Samples: 126629720. Policy #0 lag: (min: 0.0, avg: 19.3, max: 40.0) [2024-03-29 13:48:08,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:48:12,971][00497] Updated weights for policy 0, policy_version 14929 (0.0027) [2024-03-29 13:48:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 41598.7). Total num frames: 244629504. Throughput: 0: 42009.6. Samples: 126871200. Policy #0 lag: (min: 0.0, avg: 19.3, max: 40.0) [2024-03-29 13:48:13,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 13:48:16,854][00497] Updated weights for policy 0, policy_version 14939 (0.0022) [2024-03-29 13:48:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 41543.2). Total num frames: 244842496. Throughput: 0: 42062.2. Samples: 126989520. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 13:48:18,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 13:48:20,748][00497] Updated weights for policy 0, policy_version 14949 (0.0019) [2024-03-29 13:48:23,839][00126] Fps is (10 sec: 44237.2, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 245071872. Throughput: 0: 42221.3. Samples: 127254240. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 13:48:23,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 13:48:24,094][00497] Updated weights for policy 0, policy_version 14959 (0.0039) [2024-03-29 13:48:28,403][00497] Updated weights for policy 0, policy_version 14969 (0.0025) [2024-03-29 13:48:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 245268480. Throughput: 0: 42017.8. Samples: 127504880. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 13:48:28,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:48:32,143][00497] Updated weights for policy 0, policy_version 14979 (0.0023) [2024-03-29 13:48:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 41598.7). Total num frames: 245481472. Throughput: 0: 42031.1. Samples: 127623660. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 13:48:33,840][00126] Avg episode reward: [(0, '0.370')] [2024-03-29 13:48:36,457][00497] Updated weights for policy 0, policy_version 14989 (0.0021) [2024-03-29 13:48:38,093][00476] Signal inference workers to stop experience collection... (4600 times) [2024-03-29 13:48:38,172][00497] InferenceWorker_p0-w0: stopping experience collection (4600 times) [2024-03-29 13:48:38,259][00476] Signal inference workers to resume experience collection... 
(4600 times) [2024-03-29 13:48:38,259][00497] InferenceWorker_p0-w0: resuming experience collection (4600 times) [2024-03-29 13:48:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 245694464. Throughput: 0: 42260.9. Samples: 127890520. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 13:48:38,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 13:48:39,693][00497] Updated weights for policy 0, policy_version 14999 (0.0034) [2024-03-29 13:48:43,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 245891072. Throughput: 0: 41896.9. Samples: 128127500. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 13:48:43,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 13:48:43,959][00497] Updated weights for policy 0, policy_version 15009 (0.0023) [2024-03-29 13:48:47,813][00497] Updated weights for policy 0, policy_version 15019 (0.0022) [2024-03-29 13:48:48,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.2, 300 sec: 41765.3). Total num frames: 246120448. Throughput: 0: 42117.3. Samples: 128265700. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 13:48:48,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:48:52,074][00497] Updated weights for policy 0, policy_version 15029 (0.0022) [2024-03-29 13:48:53,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 246300672. Throughput: 0: 42047.1. Samples: 128521840. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 13:48:53,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 13:48:55,262][00497] Updated weights for policy 0, policy_version 15039 (0.0022) [2024-03-29 13:48:58,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 246530048. Throughput: 0: 41717.8. Samples: 128748500. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 13:48:58,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:48:59,684][00497] Updated weights for policy 0, policy_version 15049 (0.0028) [2024-03-29 13:49:03,425][00497] Updated weights for policy 0, policy_version 15059 (0.0022) [2024-03-29 13:49:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 246743040. Throughput: 0: 42279.1. Samples: 128892080. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:49:03,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:49:04,056][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015061_246759424.pth... [2024-03-29 13:49:04,375][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000014448_236716032.pth [2024-03-29 13:49:07,568][00497] Updated weights for policy 0, policy_version 15069 (0.0031) [2024-03-29 13:49:08,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 246939648. Throughput: 0: 42218.1. Samples: 129154060. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:49:08,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:49:09,985][00476] Signal inference workers to stop experience collection... (4650 times) [2024-03-29 13:49:10,005][00497] InferenceWorker_p0-w0: stopping experience collection (4650 times) [2024-03-29 13:49:10,196][00476] Signal inference workers to resume experience collection... 
(4650 times) [2024-03-29 13:49:10,197][00497] InferenceWorker_p0-w0: resuming experience collection (4650 times) [2024-03-29 13:49:10,966][00497] Updated weights for policy 0, policy_version 15079 (0.0023) [2024-03-29 13:49:13,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.4, 300 sec: 41820.8). Total num frames: 247185408. Throughput: 0: 41733.3. Samples: 129382880. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:49:13,840][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 13:49:15,228][00497] Updated weights for policy 0, policy_version 15089 (0.0022) [2024-03-29 13:49:18,839][00126] Fps is (10 sec: 42599.2, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 247365632. Throughput: 0: 42150.7. Samples: 129520440. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 13:49:18,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 13:49:18,873][00497] Updated weights for policy 0, policy_version 15099 (0.0022) [2024-03-29 13:49:23,229][00497] Updated weights for policy 0, policy_version 15109 (0.0027) [2024-03-29 13:49:23,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 247578624. Throughput: 0: 41966.1. Samples: 129779000. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 13:49:23,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 13:49:26,360][00497] Updated weights for policy 0, policy_version 15119 (0.0020) [2024-03-29 13:49:28,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42598.4, 300 sec: 41931.9). Total num frames: 247824384. Throughput: 0: 42034.6. Samples: 130019060. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 13:49:28,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 13:49:30,489][00497] Updated weights for policy 0, policy_version 15129 (0.0021) [2024-03-29 13:49:33,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 248020992. Throughput: 0: 42052.5. Samples: 130158060. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 13:49:33,840][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 13:49:34,403][00497] Updated weights for policy 0, policy_version 15139 (0.0029) [2024-03-29 13:49:38,839][00126] Fps is (10 sec: 36045.2, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 248184832. Throughput: 0: 41781.8. Samples: 130402020. Policy #0 lag: (min: 0.0, avg: 19.9, max: 43.0) [2024-03-29 13:49:38,840][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 13:49:38,974][00497] Updated weights for policy 0, policy_version 15149 (0.0032) [2024-03-29 13:49:42,126][00497] Updated weights for policy 0, policy_version 15159 (0.0029) [2024-03-29 13:49:43,470][00476] Signal inference workers to stop experience collection... (4700 times) [2024-03-29 13:49:43,540][00497] InferenceWorker_p0-w0: stopping experience collection (4700 times) [2024-03-29 13:49:43,556][00476] Signal inference workers to resume experience collection... (4700 times) [2024-03-29 13:49:43,573][00497] InferenceWorker_p0-w0: resuming experience collection (4700 times) [2024-03-29 13:49:43,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 248430592. Throughput: 0: 41986.7. Samples: 130637900. Policy #0 lag: (min: 0.0, avg: 19.9, max: 43.0) [2024-03-29 13:49:43,840][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 13:49:46,492][00497] Updated weights for policy 0, policy_version 15169 (0.0021) [2024-03-29 13:49:48,839][00126] Fps is (10 sec: 44236.1, 60 sec: 41779.2, 300 sec: 41876.4). 
Total num frames: 248627200. Throughput: 0: 41726.5. Samples: 130769780. Policy #0 lag: (min: 2.0, avg: 20.5, max: 43.0) [2024-03-29 13:49:48,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 13:49:50,180][00497] Updated weights for policy 0, policy_version 15179 (0.0030) [2024-03-29 13:49:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 248823808. Throughput: 0: 41606.8. Samples: 131026360. Policy #0 lag: (min: 2.0, avg: 20.5, max: 43.0) [2024-03-29 13:49:53,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:49:54,482][00497] Updated weights for policy 0, policy_version 15189 (0.0023) [2024-03-29 13:49:57,561][00497] Updated weights for policy 0, policy_version 15199 (0.0027) [2024-03-29 13:49:58,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 249069568. Throughput: 0: 42013.9. Samples: 131273500. Policy #0 lag: (min: 2.0, avg: 20.5, max: 43.0) [2024-03-29 13:49:58,840][00126] Avg episode reward: [(0, '0.333')] [2024-03-29 13:50:02,088][00497] Updated weights for policy 0, policy_version 15209 (0.0023) [2024-03-29 13:50:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 249266176. Throughput: 0: 41875.9. Samples: 131404860. Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 13:50:03,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:50:05,658][00497] Updated weights for policy 0, policy_version 15219 (0.0027) [2024-03-29 13:50:08,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 249446400. Throughput: 0: 41689.3. Samples: 131655020. Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 13:50:08,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:50:09,899][00497] Updated weights for policy 0, policy_version 15229 (0.0026) [2024-03-29 13:50:13,402][00497] Updated weights for policy 0, policy_version 15239 (0.0034) [2024-03-29 13:50:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 249692160. Throughput: 0: 41903.6. Samples: 131904720. Policy #0 lag: (min: 1.0, avg: 18.8, max: 41.0) [2024-03-29 13:50:13,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 13:50:13,894][00476] Signal inference workers to stop experience collection... (4750 times) [2024-03-29 13:50:13,936][00497] InferenceWorker_p0-w0: stopping experience collection (4750 times) [2024-03-29 13:50:14,104][00476] Signal inference workers to resume experience collection... (4750 times) [2024-03-29 13:50:14,104][00497] InferenceWorker_p0-w0: resuming experience collection (4750 times) [2024-03-29 13:50:17,812][00497] Updated weights for policy 0, policy_version 15249 (0.0027) [2024-03-29 13:50:18,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 249872384. Throughput: 0: 41407.6. Samples: 132021400. Policy #0 lag: (min: 1.0, avg: 18.8, max: 41.0) [2024-03-29 13:50:18,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 13:50:21,236][00497] Updated weights for policy 0, policy_version 15259 (0.0027) [2024-03-29 13:50:23,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 250085376. Throughput: 0: 41623.9. Samples: 132275100. 
Policy #0 lag: (min: 0.0, avg: 20.6, max: 40.0) [2024-03-29 13:50:23,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:50:25,727][00497] Updated weights for policy 0, policy_version 15269 (0.0024) [2024-03-29 13:50:28,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 250314752. Throughput: 0: 42300.4. Samples: 132541420. Policy #0 lag: (min: 0.0, avg: 20.6, max: 40.0) [2024-03-29 13:50:28,841][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:50:29,034][00497] Updated weights for policy 0, policy_version 15279 (0.0032) [2024-03-29 13:50:33,095][00497] Updated weights for policy 0, policy_version 15289 (0.0031) [2024-03-29 13:50:33,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 250511360. Throughput: 0: 42017.1. Samples: 132660540. Policy #0 lag: (min: 0.0, avg: 20.6, max: 40.0) [2024-03-29 13:50:33,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 13:50:36,798][00497] Updated weights for policy 0, policy_version 15299 (0.0023) [2024-03-29 13:50:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 250724352. Throughput: 0: 41780.8. Samples: 132906500. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 13:50:38,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 13:50:41,499][00497] Updated weights for policy 0, policy_version 15309 (0.0022) [2024-03-29 13:50:43,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 250937344. Throughput: 0: 42348.0. Samples: 133179160. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 13:50:43,840][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 13:50:44,592][00497] Updated weights for policy 0, policy_version 15319 (0.0028) [2024-03-29 13:50:46,133][00476] Signal inference workers to stop experience collection... (4800 times) [2024-03-29 13:50:46,206][00476] Signal inference workers to resume experience collection... (4800 times) [2024-03-29 13:50:46,210][00497] InferenceWorker_p0-w0: stopping experience collection (4800 times) [2024-03-29 13:50:46,233][00497] InferenceWorker_p0-w0: resuming experience collection (4800 times) [2024-03-29 13:50:48,800][00497] Updated weights for policy 0, policy_version 15329 (0.0018) [2024-03-29 13:50:48,840][00126] Fps is (10 sec: 42597.3, 60 sec: 42052.1, 300 sec: 41876.4). Total num frames: 251150336. Throughput: 0: 41789.0. Samples: 133285380. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 13:50:48,841][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 13:50:52,239][00497] Updated weights for policy 0, policy_version 15339 (0.0031) [2024-03-29 13:50:53,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 251379712. Throughput: 0: 41971.2. Samples: 133543720. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 13:50:53,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:50:56,745][00497] Updated weights for policy 0, policy_version 15349 (0.0026) [2024-03-29 13:50:58,839][00126] Fps is (10 sec: 40961.4, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 251559936. Throughput: 0: 42467.1. Samples: 133815740. Policy #0 lag: (min: 0.0, avg: 18.8, max: 42.0) [2024-03-29 13:50:58,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 13:51:00,007][00497] Updated weights for policy 0, policy_version 15359 (0.0020) [2024-03-29 13:51:03,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.2, 300 sec: 41931.9). 
Total num frames: 251789312. Throughput: 0: 42395.9. Samples: 133929220. Policy #0 lag: (min: 0.0, avg: 18.8, max: 42.0) [2024-03-29 13:51:03,840][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 13:51:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015368_251789312.pth... [2024-03-29 13:51:04,177][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000014753_241713152.pth [2024-03-29 13:51:04,447][00497] Updated weights for policy 0, policy_version 15369 (0.0019) [2024-03-29 13:51:07,753][00497] Updated weights for policy 0, policy_version 15379 (0.0023) [2024-03-29 13:51:08,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42871.5, 300 sec: 42043.0). Total num frames: 252018688. Throughput: 0: 42334.3. Samples: 134180140. Policy #0 lag: (min: 0.0, avg: 18.8, max: 42.0) [2024-03-29 13:51:08,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 13:51:12,216][00497] Updated weights for policy 0, policy_version 15389 (0.0022) [2024-03-29 13:51:13,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 252182528. Throughput: 0: 42346.1. Samples: 134447000. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 13:51:13,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 13:51:15,866][00497] Updated weights for policy 0, policy_version 15399 (0.0030) [2024-03-29 13:51:17,081][00476] Signal inference workers to stop experience collection... (4850 times) [2024-03-29 13:51:17,083][00476] Signal inference workers to resume experience collection... (4850 times) [2024-03-29 13:51:17,107][00497] InferenceWorker_p0-w0: stopping experience collection (4850 times) [2024-03-29 13:51:17,126][00497] InferenceWorker_p0-w0: resuming experience collection (4850 times) [2024-03-29 13:51:18,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 252428288. Throughput: 0: 41975.1. Samples: 134549420. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 13:51:18,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:51:19,974][00497] Updated weights for policy 0, policy_version 15409 (0.0023) [2024-03-29 13:51:23,406][00497] Updated weights for policy 0, policy_version 15419 (0.0029) [2024-03-29 13:51:23,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 252641280. Throughput: 0: 42361.4. Samples: 134812760. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 13:51:23,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 13:51:28,072][00497] Updated weights for policy 0, policy_version 15429 (0.0023) [2024-03-29 13:51:28,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 252805120. Throughput: 0: 42132.4. Samples: 135075120. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 13:51:28,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:51:31,437][00497] Updated weights for policy 0, policy_version 15439 (0.0029) [2024-03-29 13:51:33,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 253067264. Throughput: 0: 42070.9. Samples: 135178560. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:51:33,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 13:51:35,800][00497] Updated weights for policy 0, policy_version 15449 (0.0024) [2024-03-29 13:51:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 253247488. 
Throughput: 0: 41954.2. Samples: 135431660. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:51:38,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 13:51:39,261][00497] Updated weights for policy 0, policy_version 15459 (0.0022) [2024-03-29 13:51:43,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 253427712. Throughput: 0: 41540.0. Samples: 135685040. Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 13:51:43,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 13:51:44,098][00497] Updated weights for policy 0, policy_version 15469 (0.0026) [2024-03-29 13:51:47,310][00497] Updated weights for policy 0, policy_version 15479 (0.0023) [2024-03-29 13:51:48,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.5, 300 sec: 41987.5). Total num frames: 253673472. Throughput: 0: 41794.3. Samples: 135809960. Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 13:51:48,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 13:51:49,271][00476] Signal inference workers to stop experience collection... (4900 times) [2024-03-29 13:51:49,323][00497] InferenceWorker_p0-w0: stopping experience collection (4900 times) [2024-03-29 13:51:49,360][00476] Signal inference workers to resume experience collection... (4900 times) [2024-03-29 13:51:49,367][00497] InferenceWorker_p0-w0: resuming experience collection (4900 times) [2024-03-29 13:51:51,948][00497] Updated weights for policy 0, policy_version 15489 (0.0030) [2024-03-29 13:51:53,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 253853696. Throughput: 0: 41683.7. Samples: 136055900. Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 13:51:53,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 13:51:55,219][00497] Updated weights for policy 0, policy_version 15499 (0.0024) [2024-03-29 13:51:58,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 254033920. Throughput: 0: 41174.4. Samples: 136299840. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:51:58,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:52:00,103][00497] Updated weights for policy 0, policy_version 15509 (0.0026) [2024-03-29 13:52:03,302][00497] Updated weights for policy 0, policy_version 15519 (0.0019) [2024-03-29 13:52:03,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 254279680. Throughput: 0: 41786.2. Samples: 136429800. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:52:03,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 13:52:07,538][00497] Updated weights for policy 0, policy_version 15529 (0.0019) [2024-03-29 13:52:08,839][00126] Fps is (10 sec: 44235.8, 60 sec: 40959.9, 300 sec: 41931.9). Total num frames: 254476288. Throughput: 0: 41268.7. Samples: 136669860. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 13:52:08,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 13:52:11,224][00497] Updated weights for policy 0, policy_version 15539 (0.0027) [2024-03-29 13:52:13,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 254672896. Throughput: 0: 40867.1. Samples: 136914140. 
Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 13:52:13,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 13:52:15,706][00497] Updated weights for policy 0, policy_version 15549 (0.0018) [2024-03-29 13:52:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41233.0, 300 sec: 41820.9). Total num frames: 254902272. Throughput: 0: 41714.7. Samples: 137055720. Policy #0 lag: (min: 1.0, avg: 21.4, max: 45.0) [2024-03-29 13:52:18,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:52:19,119][00497] Updated weights for policy 0, policy_version 15559 (0.0021) [2024-03-29 13:52:19,973][00476] Signal inference workers to stop experience collection... (4950 times) [2024-03-29 13:52:19,994][00497] InferenceWorker_p0-w0: stopping experience collection (4950 times) [2024-03-29 13:52:20,186][00476] Signal inference workers to resume experience collection... (4950 times) [2024-03-29 13:52:20,187][00497] InferenceWorker_p0-w0: resuming experience collection (4950 times) [2024-03-29 13:52:23,104][00497] Updated weights for policy 0, policy_version 15569 (0.0028) [2024-03-29 13:52:23,839][00126] Fps is (10 sec: 42598.0, 60 sec: 40959.9, 300 sec: 41876.4). Total num frames: 255098880. Throughput: 0: 41180.8. Samples: 137284800. Policy #0 lag: (min: 1.0, avg: 21.4, max: 45.0) [2024-03-29 13:52:23,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 13:52:27,030][00497] Updated weights for policy 0, policy_version 15579 (0.0025) [2024-03-29 13:52:28,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 255311872. Throughput: 0: 40959.1. Samples: 137528200. Policy #0 lag: (min: 1.0, avg: 21.4, max: 45.0) [2024-03-29 13:52:28,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:52:31,747][00497] Updated weights for policy 0, policy_version 15589 (0.0025) [2024-03-29 13:52:33,839][00126] Fps is (10 sec: 40960.3, 60 sec: 40687.0, 300 sec: 41765.3). Total num frames: 255508480. Throughput: 0: 41270.6. Samples: 137667140. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 13:52:33,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:52:34,996][00497] Updated weights for policy 0, policy_version 15599 (0.0022) [2024-03-29 13:52:38,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 255721472. Throughput: 0: 40893.7. Samples: 137896120. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 13:52:38,841][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 13:52:38,917][00497] Updated weights for policy 0, policy_version 15609 (0.0033) [2024-03-29 13:52:42,784][00497] Updated weights for policy 0, policy_version 15619 (0.0025) [2024-03-29 13:52:43,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 255950848. Throughput: 0: 41011.9. Samples: 138145380. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:52:43,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:52:47,623][00497] Updated weights for policy 0, policy_version 15629 (0.0020) [2024-03-29 13:52:48,839][00126] Fps is (10 sec: 39321.7, 60 sec: 40687.0, 300 sec: 41654.2). Total num frames: 256114688. Throughput: 0: 41336.1. Samples: 138289920. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:52:48,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 13:52:50,688][00497] Updated weights for policy 0, policy_version 15639 (0.0020) [2024-03-29 13:52:52,684][00476] Signal inference workers to stop experience collection... 
(5000 times) [2024-03-29 13:52:52,685][00476] Signal inference workers to resume experience collection... (5000 times) [2024-03-29 13:52:52,726][00497] InferenceWorker_p0-w0: stopping experience collection (5000 times) [2024-03-29 13:52:52,726][00497] InferenceWorker_p0-w0: resuming experience collection (5000 times) [2024-03-29 13:52:53,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.0, 300 sec: 41876.4). Total num frames: 256344064. Throughput: 0: 41191.7. Samples: 138523480. Policy #0 lag: (min: 1.0, avg: 23.8, max: 44.0) [2024-03-29 13:52:53,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:52:54,916][00497] Updated weights for policy 0, policy_version 15649 (0.0020) [2024-03-29 13:52:58,638][00497] Updated weights for policy 0, policy_version 15659 (0.0024) [2024-03-29 13:52:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 256557056. Throughput: 0: 41462.7. Samples: 138779960. Policy #0 lag: (min: 1.0, avg: 23.8, max: 44.0) [2024-03-29 13:52:58,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 13:53:03,281][00497] Updated weights for policy 0, policy_version 15669 (0.0025) [2024-03-29 13:53:03,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 256737280. Throughput: 0: 41009.8. Samples: 138901160. Policy #0 lag: (min: 1.0, avg: 23.8, max: 44.0) [2024-03-29 13:53:03,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:53:03,863][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015670_256737280.pth... [2024-03-29 13:53:04,201][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015061_246759424.pth [2024-03-29 13:53:06,734][00497] Updated weights for policy 0, policy_version 15679 (0.0027) [2024-03-29 13:53:08,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 256983040. Throughput: 0: 41296.4. Samples: 139143140. Policy #0 lag: (min: 0.0, avg: 18.3, max: 42.0) [2024-03-29 13:53:08,842][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 13:53:10,950][00497] Updated weights for policy 0, policy_version 15689 (0.0026) [2024-03-29 13:53:13,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 257163264. Throughput: 0: 41738.6. Samples: 139406440. Policy #0 lag: (min: 0.0, avg: 18.3, max: 42.0) [2024-03-29 13:53:13,840][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 13:53:14,524][00497] Updated weights for policy 0, policy_version 15699 (0.0024) [2024-03-29 13:53:18,839][00126] Fps is (10 sec: 37683.6, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 257359872. Throughput: 0: 41176.0. Samples: 139520060. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:53:18,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:53:18,934][00497] Updated weights for policy 0, policy_version 15709 (0.0018) [2024-03-29 13:53:22,419][00497] Updated weights for policy 0, policy_version 15719 (0.0024) [2024-03-29 13:53:23,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 257622016. Throughput: 0: 41799.0. Samples: 139777080. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:53:23,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 13:53:24,033][00476] Signal inference workers to stop experience collection... 
(5050 times) [2024-03-29 13:53:24,090][00497] InferenceWorker_p0-w0: stopping experience collection (5050 times) [2024-03-29 13:53:24,124][00476] Signal inference workers to resume experience collection... (5050 times) [2024-03-29 13:53:24,126][00497] InferenceWorker_p0-w0: resuming experience collection (5050 times) [2024-03-29 13:53:26,266][00497] Updated weights for policy 0, policy_version 15729 (0.0034) [2024-03-29 13:53:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 257785856. Throughput: 0: 41890.7. Samples: 140030460. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 13:53:28,840][00126] Avg episode reward: [(0, '0.464')] [2024-03-29 13:53:30,094][00497] Updated weights for policy 0, policy_version 15739 (0.0030) [2024-03-29 13:53:33,839][00126] Fps is (10 sec: 36045.2, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 257982464. Throughput: 0: 41064.4. Samples: 140137820. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 13:53:33,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 13:53:34,780][00497] Updated weights for policy 0, policy_version 15749 (0.0020) [2024-03-29 13:53:38,356][00497] Updated weights for policy 0, policy_version 15759 (0.0033) [2024-03-29 13:53:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 258211840. Throughput: 0: 41583.2. Samples: 140394720. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 13:53:38,840][00126] Avg episode reward: [(0, '0.327')] [2024-03-29 13:53:42,290][00497] Updated weights for policy 0, policy_version 15769 (0.0019) [2024-03-29 13:53:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40687.0, 300 sec: 41598.7). Total num frames: 258392064. Throughput: 0: 41500.0. Samples: 140647460. Policy #0 lag: (min: 1.0, avg: 21.1, max: 43.0) [2024-03-29 13:53:43,841][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 13:53:46,128][00497] Updated weights for policy 0, policy_version 15779 (0.0022) [2024-03-29 13:53:48,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 258605056. Throughput: 0: 41239.1. Samples: 140756920. Policy #0 lag: (min: 1.0, avg: 21.1, max: 43.0) [2024-03-29 13:53:48,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 13:53:50,864][00497] Updated weights for policy 0, policy_version 15789 (0.0022) [2024-03-29 13:53:53,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 258834432. Throughput: 0: 41758.7. Samples: 141022280. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 13:53:53,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 13:53:54,134][00497] Updated weights for policy 0, policy_version 15799 (0.0026) [2024-03-29 13:53:55,428][00476] Signal inference workers to stop experience collection... (5100 times) [2024-03-29 13:53:55,461][00497] InferenceWorker_p0-w0: stopping experience collection (5100 times) [2024-03-29 13:53:55,641][00476] Signal inference workers to resume experience collection... (5100 times) [2024-03-29 13:53:55,642][00497] InferenceWorker_p0-w0: resuming experience collection (5100 times) [2024-03-29 13:53:58,074][00497] Updated weights for policy 0, policy_version 15809 (0.0018) [2024-03-29 13:53:58,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 259031040. Throughput: 0: 41456.5. Samples: 141271980. 
Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 13:53:58,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:54:01,720][00497] Updated weights for policy 0, policy_version 15819 (0.0020) [2024-03-29 13:54:03,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 259244032. Throughput: 0: 41744.0. Samples: 141398540. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 13:54:03,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 13:54:06,174][00497] Updated weights for policy 0, policy_version 15829 (0.0021) [2024-03-29 13:54:08,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 259457024. Throughput: 0: 41906.7. Samples: 141662880. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 13:54:08,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 13:54:09,697][00497] Updated weights for policy 0, policy_version 15839 (0.0022) [2024-03-29 13:54:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 259653632. Throughput: 0: 41468.4. Samples: 141896540. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 13:54:13,840][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 13:54:14,103][00497] Updated weights for policy 0, policy_version 15849 (0.0021) [2024-03-29 13:54:17,580][00497] Updated weights for policy 0, policy_version 15859 (0.0019) [2024-03-29 13:54:18,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 259883008. Throughput: 0: 41936.3. Samples: 142024960. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 13:54:18,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 13:54:21,715][00497] Updated weights for policy 0, policy_version 15869 (0.0018) [2024-03-29 13:54:23,839][00126] Fps is (10 sec: 42598.7, 60 sec: 40960.1, 300 sec: 41543.2). Total num frames: 260079616. Throughput: 0: 42042.2. Samples: 142286620. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 13:54:23,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 13:54:25,498][00497] Updated weights for policy 0, policy_version 15879 (0.0029) [2024-03-29 13:54:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 260292608. Throughput: 0: 41800.8. Samples: 142528500. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:54:28,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:54:29,423][00497] Updated weights for policy 0, policy_version 15889 (0.0034) [2024-03-29 13:54:31,986][00476] Signal inference workers to stop experience collection... (5150 times) [2024-03-29 13:54:32,020][00497] InferenceWorker_p0-w0: stopping experience collection (5150 times) [2024-03-29 13:54:32,199][00476] Signal inference workers to resume experience collection... (5150 times) [2024-03-29 13:54:32,200][00497] InferenceWorker_p0-w0: resuming experience collection (5150 times) [2024-03-29 13:54:33,014][00497] Updated weights for policy 0, policy_version 15899 (0.0024) [2024-03-29 13:54:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 260521984. Throughput: 0: 42199.1. Samples: 142655880. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:54:33,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 13:54:37,572][00497] Updated weights for policy 0, policy_version 15909 (0.0031) [2024-03-29 13:54:38,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41598.7). 
Total num frames: 260702208. Throughput: 0: 41958.7. Samples: 142910420. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 13:54:38,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 13:54:41,205][00497] Updated weights for policy 0, policy_version 15919 (0.0023) [2024-03-29 13:54:43,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 260915200. Throughput: 0: 41506.2. Samples: 143139760. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 13:54:43,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:54:45,360][00497] Updated weights for policy 0, policy_version 15929 (0.0023) [2024-03-29 13:54:48,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 261144576. Throughput: 0: 41708.1. Samples: 143275400. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 13:54:48,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:54:48,840][00497] Updated weights for policy 0, policy_version 15939 (0.0028) [2024-03-29 13:54:53,482][00497] Updated weights for policy 0, policy_version 15949 (0.0018) [2024-03-29 13:54:53,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 261308416. Throughput: 0: 41475.6. Samples: 143529280. Policy #0 lag: (min: 2.0, avg: 21.9, max: 42.0) [2024-03-29 13:54:53,840][00126] Avg episode reward: [(0, '0.300')] [2024-03-29 13:54:56,871][00497] Updated weights for policy 0, policy_version 15959 (0.0029) [2024-03-29 13:54:58,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 261554176. Throughput: 0: 41502.7. Samples: 143764160. Policy #0 lag: (min: 2.0, avg: 21.9, max: 42.0) [2024-03-29 13:54:58,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 13:55:00,952][00497] Updated weights for policy 0, policy_version 15969 (0.0022) [2024-03-29 13:55:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 261750784. Throughput: 0: 41746.8. Samples: 143903560. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:55:03,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:55:04,187][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015978_261783552.pth... [2024-03-29 13:55:04,521][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015368_251789312.pth [2024-03-29 13:55:04,550][00476] Signal inference workers to stop experience collection... (5200 times) [2024-03-29 13:55:04,590][00497] InferenceWorker_p0-w0: stopping experience collection (5200 times) [2024-03-29 13:55:04,785][00476] Signal inference workers to resume experience collection... (5200 times) [2024-03-29 13:55:04,786][00497] InferenceWorker_p0-w0: resuming experience collection (5200 times) [2024-03-29 13:55:04,790][00497] Updated weights for policy 0, policy_version 15979 (0.0023) [2024-03-29 13:55:08,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 41543.1). Total num frames: 261947392. Throughput: 0: 41580.3. Samples: 144157740. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:55:08,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 13:55:09,025][00497] Updated weights for policy 0, policy_version 15989 (0.0021) [2024-03-29 13:55:12,495][00497] Updated weights for policy 0, policy_version 15999 (0.0024) [2024-03-29 13:55:13,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 262193152. 
Throughput: 0: 41526.6. Samples: 144397200. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:55:13,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 13:55:16,407][00497] Updated weights for policy 0, policy_version 16009 (0.0026) [2024-03-29 13:55:18,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 262389760. Throughput: 0: 41800.3. Samples: 144536900. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:55:18,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 13:55:19,930][00497] Updated weights for policy 0, policy_version 16019 (0.0034) [2024-03-29 13:55:23,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 262569984. Throughput: 0: 41662.2. Samples: 144785220. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:55:23,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 13:55:24,695][00497] Updated weights for policy 0, policy_version 16029 (0.0018) [2024-03-29 13:55:27,983][00497] Updated weights for policy 0, policy_version 16039 (0.0022) [2024-03-29 13:55:28,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 262815744. Throughput: 0: 42198.3. Samples: 145038680. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:55:28,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:55:31,995][00497] Updated weights for policy 0, policy_version 16049 (0.0019) [2024-03-29 13:55:33,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 263012352. Throughput: 0: 42057.6. Samples: 145168000. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:55:33,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 13:55:35,668][00497] Updated weights for policy 0, policy_version 16059 (0.0021) [2024-03-29 13:55:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 263208960. Throughput: 0: 41815.6. Samples: 145410980. Policy #0 lag: (min: 2.0, avg: 22.3, max: 41.0) [2024-03-29 13:55:38,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 13:55:40,336][00497] Updated weights for policy 0, policy_version 16069 (0.0021) [2024-03-29 13:55:41,458][00476] Signal inference workers to stop experience collection... (5250 times) [2024-03-29 13:55:41,524][00497] InferenceWorker_p0-w0: stopping experience collection (5250 times) [2024-03-29 13:55:41,620][00476] Signal inference workers to resume experience collection... (5250 times) [2024-03-29 13:55:41,620][00497] InferenceWorker_p0-w0: resuming experience collection (5250 times) [2024-03-29 13:55:43,610][00497] Updated weights for policy 0, policy_version 16079 (0.0029) [2024-03-29 13:55:43,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41654.3). Total num frames: 263438336. Throughput: 0: 42227.6. Samples: 145664400. Policy #0 lag: (min: 2.0, avg: 22.3, max: 41.0) [2024-03-29 13:55:43,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:55:47,524][00497] Updated weights for policy 0, policy_version 16089 (0.0018) [2024-03-29 13:55:48,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.1, 300 sec: 41598.7). Total num frames: 263651328. Throughput: 0: 42088.0. Samples: 145797520. 
Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 13:55:48,840][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 13:55:51,176][00497] Updated weights for policy 0, policy_version 16099 (0.0020) [2024-03-29 13:55:53,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42325.3, 300 sec: 41654.2). Total num frames: 263847936. Throughput: 0: 41996.5. Samples: 146047580. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 13:55:53,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:55:55,551][00497] Updated weights for policy 0, policy_version 16109 (0.0019) [2024-03-29 13:55:58,823][00497] Updated weights for policy 0, policy_version 16119 (0.0022) [2024-03-29 13:55:58,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 264093696. Throughput: 0: 42457.0. Samples: 146307760. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 13:55:58,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:56:02,829][00497] Updated weights for policy 0, policy_version 16129 (0.0022) [2024-03-29 13:56:03,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.4, 300 sec: 41598.7). Total num frames: 264290304. Throughput: 0: 42220.6. Samples: 146436820. Policy #0 lag: (min: 1.0, avg: 23.1, max: 43.0) [2024-03-29 13:56:03,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 13:56:06,467][00497] Updated weights for policy 0, policy_version 16139 (0.0021) [2024-03-29 13:56:08,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42598.4, 300 sec: 41765.3). Total num frames: 264503296. Throughput: 0: 42552.8. Samples: 146700100. Policy #0 lag: (min: 1.0, avg: 23.1, max: 43.0) [2024-03-29 13:56:08,841][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 13:56:10,695][00497] Updated weights for policy 0, policy_version 16149 (0.0030) [2024-03-29 13:56:13,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 264732672. Throughput: 0: 42527.0. Samples: 146952400. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 13:56:13,840][00126] Avg episode reward: [(0, '0.293')] [2024-03-29 13:56:14,085][00497] Updated weights for policy 0, policy_version 16159 (0.0019) [2024-03-29 13:56:14,574][00476] Signal inference workers to stop experience collection... (5300 times) [2024-03-29 13:56:14,607][00497] InferenceWorker_p0-w0: stopping experience collection (5300 times) [2024-03-29 13:56:14,767][00476] Signal inference workers to resume experience collection... (5300 times) [2024-03-29 13:56:14,768][00497] InferenceWorker_p0-w0: resuming experience collection (5300 times) [2024-03-29 13:56:18,304][00497] Updated weights for policy 0, policy_version 16169 (0.0027) [2024-03-29 13:56:18,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.4, 300 sec: 41654.2). Total num frames: 264929280. Throughput: 0: 42240.9. Samples: 147068840. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 13:56:18,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 13:56:21,853][00497] Updated weights for policy 0, policy_version 16179 (0.0018) [2024-03-29 13:56:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42871.5, 300 sec: 41820.8). Total num frames: 265142272. Throughput: 0: 42822.6. Samples: 147338000. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 13:56:23,840][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 13:56:26,161][00497] Updated weights for policy 0, policy_version 16189 (0.0024) [2024-03-29 13:56:28,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42598.4, 300 sec: 41709.8). 
Total num frames: 265371648. Throughput: 0: 42874.6. Samples: 147593760. Policy #0 lag: (min: 0.0, avg: 20.7, max: 43.0) [2024-03-29 13:56:28,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:56:29,414][00497] Updated weights for policy 0, policy_version 16199 (0.0031) [2024-03-29 13:56:33,556][00497] Updated weights for policy 0, policy_version 16209 (0.0019) [2024-03-29 13:56:33,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42871.4, 300 sec: 41820.8). Total num frames: 265584640. Throughput: 0: 42555.9. Samples: 147712540. Policy #0 lag: (min: 0.0, avg: 20.7, max: 43.0) [2024-03-29 13:56:33,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 13:56:37,291][00497] Updated weights for policy 0, policy_version 16219 (0.0027) [2024-03-29 13:56:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 43144.5, 300 sec: 41931.9). Total num frames: 265797632. Throughput: 0: 42739.6. Samples: 147970860. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 13:56:38,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 13:56:41,695][00497] Updated weights for policy 0, policy_version 16229 (0.0023) [2024-03-29 13:56:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42598.3, 300 sec: 41765.3). Total num frames: 265994240. Throughput: 0: 42810.2. Samples: 148234220. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 13:56:43,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 13:56:44,875][00497] Updated weights for policy 0, policy_version 16239 (0.0042) [2024-03-29 13:56:48,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 266207232. Throughput: 0: 42394.6. Samples: 148344580. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:56:48,840][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 13:56:49,003][00497] Updated weights for policy 0, policy_version 16249 (0.0017) [2024-03-29 13:56:51,690][00476] Signal inference workers to stop experience collection... (5350 times) [2024-03-29 13:56:51,772][00497] InferenceWorker_p0-w0: stopping experience collection (5350 times) [2024-03-29 13:56:51,775][00476] Signal inference workers to resume experience collection... (5350 times) [2024-03-29 13:56:51,798][00497] InferenceWorker_p0-w0: resuming experience collection (5350 times) [2024-03-29 13:56:52,687][00497] Updated weights for policy 0, policy_version 16259 (0.0018) [2024-03-29 13:56:53,839][00126] Fps is (10 sec: 44237.0, 60 sec: 43144.6, 300 sec: 42043.0). Total num frames: 266436608. Throughput: 0: 42610.4. Samples: 148617560. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:56:53,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:56:56,831][00497] Updated weights for policy 0, policy_version 16269 (0.0023) [2024-03-29 13:56:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.2, 300 sec: 41876.4). Total num frames: 266633216. Throughput: 0: 42832.9. Samples: 148879880. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:56:58,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:57:00,212][00497] Updated weights for policy 0, policy_version 16279 (0.0034) [2024-03-29 13:57:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42598.4, 300 sec: 41932.0). Total num frames: 266846208. Throughput: 0: 42513.3. Samples: 148981940. 
Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 13:57:03,841][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 13:57:03,860][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000016287_266846208.pth... [2024-03-29 13:57:04,196][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015670_256737280.pth [2024-03-29 13:57:04,723][00497] Updated weights for policy 0, policy_version 16289 (0.0019) [2024-03-29 13:57:08,290][00497] Updated weights for policy 0, policy_version 16299 (0.0020) [2024-03-29 13:57:08,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 267059200. Throughput: 0: 42432.0. Samples: 149247440. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 13:57:08,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 13:57:12,401][00497] Updated weights for policy 0, policy_version 16309 (0.0031) [2024-03-29 13:57:13,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 267255808. Throughput: 0: 42765.3. Samples: 149518200. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 13:57:13,840][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 13:57:15,810][00497] Updated weights for policy 0, policy_version 16319 (0.0025) [2024-03-29 13:57:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 267485184. Throughput: 0: 42308.0. Samples: 149616400. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 13:57:18,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 13:57:20,159][00497] Updated weights for policy 0, policy_version 16329 (0.0022) [2024-03-29 13:57:23,827][00497] Updated weights for policy 0, policy_version 16339 (0.0019) [2024-03-29 13:57:23,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 267698176. Throughput: 0: 42561.8. Samples: 149886140. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 13:57:23,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:57:27,926][00497] Updated weights for policy 0, policy_version 16349 (0.0022) [2024-03-29 13:57:28,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 267878400. Throughput: 0: 42497.4. Samples: 150146600. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:57:28,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:57:29,693][00476] Signal inference workers to stop experience collection... (5400 times) [2024-03-29 13:57:29,763][00497] InferenceWorker_p0-w0: stopping experience collection (5400 times) [2024-03-29 13:57:29,777][00476] Signal inference workers to resume experience collection... (5400 times) [2024-03-29 13:57:29,795][00497] InferenceWorker_p0-w0: resuming experience collection (5400 times) [2024-03-29 13:57:31,220][00497] Updated weights for policy 0, policy_version 16359 (0.0023) [2024-03-29 13:57:33,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 268124160. Throughput: 0: 42625.4. Samples: 150262720. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:57:33,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 13:57:35,560][00497] Updated weights for policy 0, policy_version 16369 (0.0026) [2024-03-29 13:57:38,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 268337152. Throughput: 0: 42250.6. Samples: 150518840. 
Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:57:38,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:57:39,058][00497] Updated weights for policy 0, policy_version 16379 (0.0023) [2024-03-29 13:57:43,426][00497] Updated weights for policy 0, policy_version 16389 (0.0026) [2024-03-29 13:57:43,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 268517376. Throughput: 0: 42230.8. Samples: 150780260. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:57:43,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 13:57:46,733][00497] Updated weights for policy 0, policy_version 16399 (0.0024) [2024-03-29 13:57:48,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 268763136. Throughput: 0: 42523.4. Samples: 150895500. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 13:57:48,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:57:51,116][00497] Updated weights for policy 0, policy_version 16409 (0.0017) [2024-03-29 13:57:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 268959744. Throughput: 0: 42286.8. Samples: 151150340. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 13:57:53,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:57:54,564][00497] Updated weights for policy 0, policy_version 16419 (0.0018) [2024-03-29 13:57:58,839][00126] Fps is (10 sec: 39322.2, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 269156352. Throughput: 0: 42028.1. Samples: 151409460. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 13:57:58,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:57:58,913][00497] Updated weights for policy 0, policy_version 16429 (0.0036) [2024-03-29 13:58:02,288][00476] Signal inference workers to stop experience collection... (5450 times) [2024-03-29 13:58:02,339][00497] InferenceWorker_p0-w0: stopping experience collection (5450 times) [2024-03-29 13:58:02,369][00476] Signal inference workers to resume experience collection... (5450 times) [2024-03-29 13:58:02,372][00497] InferenceWorker_p0-w0: resuming experience collection (5450 times) [2024-03-29 13:58:02,378][00497] Updated weights for policy 0, policy_version 16439 (0.0027) [2024-03-29 13:58:03,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 269402112. Throughput: 0: 42601.3. Samples: 151533460. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 13:58:03,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 13:58:06,542][00497] Updated weights for policy 0, policy_version 16449 (0.0018) [2024-03-29 13:58:08,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 269582336. Throughput: 0: 42197.8. Samples: 151785040. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 13:58:08,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 13:58:10,252][00497] Updated weights for policy 0, policy_version 16459 (0.0028) [2024-03-29 13:58:13,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 269795328. Throughput: 0: 42012.9. Samples: 152037180. 
Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:58:13,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 13:58:14,682][00497] Updated weights for policy 0, policy_version 16469 (0.0027) [2024-03-29 13:58:17,972][00497] Updated weights for policy 0, policy_version 16479 (0.0028) [2024-03-29 13:58:18,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42598.5, 300 sec: 42098.6). Total num frames: 270041088. Throughput: 0: 42295.6. Samples: 152166020. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:58:18,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 13:58:22,203][00497] Updated weights for policy 0, policy_version 16489 (0.0018) [2024-03-29 13:58:23,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 270221312. Throughput: 0: 42104.9. Samples: 152413560. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:58:23,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:58:25,896][00497] Updated weights for policy 0, policy_version 16499 (0.0023) [2024-03-29 13:58:28,839][00126] Fps is (10 sec: 37683.3, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 270417920. Throughput: 0: 42048.1. Samples: 152672420. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 13:58:28,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:58:30,353][00497] Updated weights for policy 0, policy_version 16509 (0.0019) [2024-03-29 13:58:33,355][00497] Updated weights for policy 0, policy_version 16519 (0.0024) [2024-03-29 13:58:33,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 270663680. Throughput: 0: 42409.4. Samples: 152803920. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 13:58:33,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 13:58:37,399][00497] Updated weights for policy 0, policy_version 16529 (0.0017) [2024-03-29 13:58:38,839][00126] Fps is (10 sec: 45874.0, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 270876672. Throughput: 0: 42408.7. Samples: 153058740. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:58:38,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:58:39,959][00476] Signal inference workers to stop experience collection... (5500 times) [2024-03-29 13:58:39,989][00497] InferenceWorker_p0-w0: stopping experience collection (5500 times) [2024-03-29 13:58:40,171][00476] Signal inference workers to resume experience collection... (5500 times) [2024-03-29 13:58:40,171][00497] InferenceWorker_p0-w0: resuming experience collection (5500 times) [2024-03-29 13:58:41,182][00497] Updated weights for policy 0, policy_version 16539 (0.0019) [2024-03-29 13:58:43,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42598.3, 300 sec: 42265.1). Total num frames: 271073280. Throughput: 0: 42287.0. Samples: 153312380. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:58:43,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:58:45,532][00497] Updated weights for policy 0, policy_version 16549 (0.0017) [2024-03-29 13:58:48,638][00497] Updated weights for policy 0, policy_version 16559 (0.0023) [2024-03-29 13:58:48,839][00126] Fps is (10 sec: 42599.2, 60 sec: 42325.5, 300 sec: 42265.2). Total num frames: 271302656. Throughput: 0: 42531.2. Samples: 153447360. 
Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:58:48,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 13:58:52,884][00497] Updated weights for policy 0, policy_version 16569 (0.0032) [2024-03-29 13:58:53,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 271499264. Throughput: 0: 42358.2. Samples: 153691160. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:58:53,840][00126] Avg episode reward: [(0, '0.304')] [2024-03-29 13:58:56,468][00497] Updated weights for policy 0, policy_version 16579 (0.0020) [2024-03-29 13:58:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 271712256. Throughput: 0: 42493.3. Samples: 153949380. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:58:58,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 13:59:00,663][00497] Updated weights for policy 0, policy_version 16589 (0.0021) [2024-03-29 13:59:03,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 271941632. Throughput: 0: 42707.5. Samples: 154087860. Policy #0 lag: (min: 0.0, avg: 20.1, max: 43.0) [2024-03-29 13:59:03,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 13:59:04,122][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000016599_271958016.pth... [2024-03-29 13:59:04,137][00497] Updated weights for policy 0, policy_version 16599 (0.0027) [2024-03-29 13:59:04,469][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015978_261783552.pth [2024-03-29 13:59:08,263][00497] Updated weights for policy 0, policy_version 16609 (0.0024) [2024-03-29 13:59:08,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 272138240. Throughput: 0: 42427.2. Samples: 154322780. Policy #0 lag: (min: 0.0, avg: 20.1, max: 43.0) [2024-03-29 13:59:08,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 13:59:12,034][00497] Updated weights for policy 0, policy_version 16619 (0.0024) [2024-03-29 13:59:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 272351232. Throughput: 0: 42439.5. Samples: 154582200. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 13:59:13,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 13:59:16,227][00497] Updated weights for policy 0, policy_version 16629 (0.0023) [2024-03-29 13:59:18,074][00476] Signal inference workers to stop experience collection... (5550 times) [2024-03-29 13:59:18,105][00497] InferenceWorker_p0-w0: stopping experience collection (5550 times) [2024-03-29 13:59:18,303][00476] Signal inference workers to resume experience collection... (5550 times) [2024-03-29 13:59:18,303][00497] InferenceWorker_p0-w0: resuming experience collection (5550 times) [2024-03-29 13:59:18,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.2, 300 sec: 42376.2). Total num frames: 272580608. Throughput: 0: 42551.1. Samples: 154718720. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 13:59:18,840][00126] Avg episode reward: [(0, '0.416')] [2024-03-29 13:59:19,594][00497] Updated weights for policy 0, policy_version 16639 (0.0034) [2024-03-29 13:59:23,670][00497] Updated weights for policy 0, policy_version 16649 (0.0018) [2024-03-29 13:59:23,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42598.3, 300 sec: 42320.7). Total num frames: 272777216. Throughput: 0: 42244.0. Samples: 154959720. 
Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 13:59:23,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:59:27,527][00497] Updated weights for policy 0, policy_version 16659 (0.0024) [2024-03-29 13:59:28,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42871.4, 300 sec: 42265.2). Total num frames: 272990208. Throughput: 0: 42386.8. Samples: 155219780. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 13:59:28,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:59:31,767][00497] Updated weights for policy 0, policy_version 16669 (0.0023) [2024-03-29 13:59:33,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 273203200. Throughput: 0: 42385.7. Samples: 155354720. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 13:59:33,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:59:34,834][00497] Updated weights for policy 0, policy_version 16679 (0.0021) [2024-03-29 13:59:38,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.4, 300 sec: 42376.2). Total num frames: 273416192. Throughput: 0: 42323.0. Samples: 155595700. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 13:59:38,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 13:59:38,920][00497] Updated weights for policy 0, policy_version 16689 (0.0030) [2024-03-29 13:59:43,090][00497] Updated weights for policy 0, policy_version 16699 (0.0041) [2024-03-29 13:59:43,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 273629184. Throughput: 0: 42275.1. Samples: 155851760. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 13:59:43,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 13:59:47,213][00497] Updated weights for policy 0, policy_version 16709 (0.0024) [2024-03-29 13:59:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 273825792. Throughput: 0: 42185.8. Samples: 155986220. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 13:59:48,840][00126] Avg episode reward: [(0, '0.316')] [2024-03-29 13:59:50,540][00497] Updated weights for policy 0, policy_version 16719 (0.0026) [2024-03-29 13:59:53,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42598.5, 300 sec: 42376.3). Total num frames: 274055168. Throughput: 0: 42395.2. Samples: 156230560. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 13:59:53,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:59:54,595][00497] Updated weights for policy 0, policy_version 16729 (0.0028) [2024-03-29 13:59:55,051][00476] Signal inference workers to stop experience collection... (5600 times) [2024-03-29 13:59:55,090][00497] InferenceWorker_p0-w0: stopping experience collection (5600 times) [2024-03-29 13:59:55,277][00476] Signal inference workers to resume experience collection... (5600 times) [2024-03-29 13:59:55,277][00497] InferenceWorker_p0-w0: resuming experience collection (5600 times) [2024-03-29 13:59:58,520][00497] Updated weights for policy 0, policy_version 16739 (0.0020) [2024-03-29 13:59:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 274268160. Throughput: 0: 42346.2. Samples: 156487780. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 13:59:58,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:00:02,767][00497] Updated weights for policy 0, policy_version 16749 (0.0021) [2024-03-29 14:00:03,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.2, 300 sec: 42376.3). 
Total num frames: 274448384. Throughput: 0: 41944.1. Samples: 156606200. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 14:00:03,840][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 14:00:06,148][00497] Updated weights for policy 0, policy_version 16759 (0.0018) [2024-03-29 14:00:08,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 274694144. Throughput: 0: 42283.6. Samples: 156862480. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 14:00:08,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 14:00:10,221][00497] Updated weights for policy 0, policy_version 16769 (0.0019) [2024-03-29 14:00:13,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42376.3). Total num frames: 274890752. Throughput: 0: 42411.1. Samples: 157128280. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 14:00:13,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:00:14,063][00497] Updated weights for policy 0, policy_version 16779 (0.0024) [2024-03-29 14:00:18,230][00497] Updated weights for policy 0, policy_version 16789 (0.0022) [2024-03-29 14:00:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 42431.8). Total num frames: 275087360. Throughput: 0: 42243.1. Samples: 157255660. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 14:00:18,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:00:21,362][00497] Updated weights for policy 0, policy_version 16799 (0.0036) [2024-03-29 14:00:23,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42871.6, 300 sec: 42487.3). Total num frames: 275349504. Throughput: 0: 42469.9. Samples: 157506840. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 14:00:23,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 14:00:25,520][00497] Updated weights for policy 0, policy_version 16809 (0.0024) [2024-03-29 14:00:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.2, 300 sec: 42431.8). Total num frames: 275529728. Throughput: 0: 42536.3. Samples: 157765900. Policy #0 lag: (min: 1.0, avg: 21.2, max: 42.0) [2024-03-29 14:00:28,841][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 14:00:29,331][00497] Updated weights for policy 0, policy_version 16819 (0.0022) [2024-03-29 14:00:30,127][00476] Signal inference workers to stop experience collection... (5650 times) [2024-03-29 14:00:30,183][00497] InferenceWorker_p0-w0: stopping experience collection (5650 times) [2024-03-29 14:00:30,221][00476] Signal inference workers to resume experience collection... (5650 times) [2024-03-29 14:00:30,223][00497] InferenceWorker_p0-w0: resuming experience collection (5650 times) [2024-03-29 14:00:33,620][00497] Updated weights for policy 0, policy_version 16829 (0.0017) [2024-03-29 14:00:33,839][00126] Fps is (10 sec: 37682.7, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 275726336. Throughput: 0: 42370.1. Samples: 157892880. Policy #0 lag: (min: 1.0, avg: 21.2, max: 42.0) [2024-03-29 14:00:33,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 14:00:36,815][00497] Updated weights for policy 0, policy_version 16839 (0.0022) [2024-03-29 14:00:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 42431.8). Total num frames: 275955712. Throughput: 0: 42262.5. Samples: 158132380. 
Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 14:00:38,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 14:00:40,942][00497] Updated weights for policy 0, policy_version 16849 (0.0034) [2024-03-29 14:00:43,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.2, 300 sec: 42376.2). Total num frames: 276152320. Throughput: 0: 42456.9. Samples: 158398340. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 14:00:43,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 14:00:45,022][00497] Updated weights for policy 0, policy_version 16859 (0.0021) [2024-03-29 14:00:48,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42325.4, 300 sec: 42431.8). Total num frames: 276365312. Throughput: 0: 42380.5. Samples: 158513320. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 14:00:48,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:00:49,065][00497] Updated weights for policy 0, policy_version 16869 (0.0018) [2024-03-29 14:00:52,473][00497] Updated weights for policy 0, policy_version 16879 (0.0018) [2024-03-29 14:00:53,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.2, 300 sec: 42376.2). Total num frames: 276594688. Throughput: 0: 42429.8. Samples: 158771820. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 14:00:53,840][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 14:00:56,440][00497] Updated weights for policy 0, policy_version 16889 (0.0021) [2024-03-29 14:00:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 42320.7). Total num frames: 276774912. Throughput: 0: 42225.3. Samples: 159028420. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 14:00:58,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 14:01:00,399][00497] Updated weights for policy 0, policy_version 16899 (0.0030) [2024-03-29 14:01:03,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42598.4, 300 sec: 42376.3). Total num frames: 277004288. Throughput: 0: 42148.9. Samples: 159152360. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 14:01:03,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 14:01:04,105][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000016908_277020672.pth... [2024-03-29 14:01:04,451][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000016287_266846208.pth [2024-03-29 14:01:04,726][00497] Updated weights for policy 0, policy_version 16909 (0.0031) [2024-03-29 14:01:08,047][00497] Updated weights for policy 0, policy_version 16919 (0.0023) [2024-03-29 14:01:08,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 277233664. Throughput: 0: 42328.3. Samples: 159411620. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 14:01:08,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 14:01:12,291][00497] Updated weights for policy 0, policy_version 16929 (0.0017) [2024-03-29 14:01:12,470][00476] Signal inference workers to stop experience collection... (5700 times) [2024-03-29 14:01:12,495][00497] InferenceWorker_p0-w0: stopping experience collection (5700 times) [2024-03-29 14:01:12,690][00476] Signal inference workers to resume experience collection... (5700 times) [2024-03-29 14:01:12,691][00497] InferenceWorker_p0-w0: resuming experience collection (5700 times) [2024-03-29 14:01:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 277413888. Throughput: 0: 41967.2. Samples: 159654420. 
Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 14:01:13,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:01:16,024][00497] Updated weights for policy 0, policy_version 16939 (0.0031) [2024-03-29 14:01:18,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42598.5, 300 sec: 42376.3). Total num frames: 277643264. Throughput: 0: 42082.8. Samples: 159786600. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:01:18,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 14:01:20,150][00497] Updated weights for policy 0, policy_version 16949 (0.0025) [2024-03-29 14:01:23,582][00497] Updated weights for policy 0, policy_version 16959 (0.0028) [2024-03-29 14:01:23,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41779.1, 300 sec: 42320.7). Total num frames: 277856256. Throughput: 0: 42524.9. Samples: 160046000. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:01:23,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 14:01:27,788][00497] Updated weights for policy 0, policy_version 16969 (0.0019) [2024-03-29 14:01:28,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 278069248. Throughput: 0: 42175.6. Samples: 160296240. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 14:01:28,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 14:01:31,527][00497] Updated weights for policy 0, policy_version 16979 (0.0018) [2024-03-29 14:01:33,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 278265856. Throughput: 0: 42367.5. Samples: 160419860. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 14:01:33,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 14:01:35,645][00497] Updated weights for policy 0, policy_version 16989 (0.0023) [2024-03-29 14:01:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 42376.3). Total num frames: 278495232. Throughput: 0: 42425.5. Samples: 160680960. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 14:01:38,840][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 14:01:38,983][00497] Updated weights for policy 0, policy_version 16999 (0.0024) [2024-03-29 14:01:43,308][00497] Updated weights for policy 0, policy_version 17009 (0.0023) [2024-03-29 14:01:43,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 278691840. Throughput: 0: 42114.1. Samples: 160923560. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 14:01:43,841][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 14:01:47,137][00497] Updated weights for policy 0, policy_version 17019 (0.0024) [2024-03-29 14:01:48,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 278904832. Throughput: 0: 42320.0. Samples: 161056760. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 14:01:48,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 14:01:50,943][00476] Signal inference workers to stop experience collection... (5750 times) [2024-03-29 14:01:50,983][00497] InferenceWorker_p0-w0: stopping experience collection (5750 times) [2024-03-29 14:01:51,024][00476] Signal inference workers to resume experience collection... (5750 times) [2024-03-29 14:01:51,024][00497] InferenceWorker_p0-w0: resuming experience collection (5750 times) [2024-03-29 14:01:51,030][00497] Updated weights for policy 0, policy_version 17029 (0.0031) [2024-03-29 14:01:53,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.4, 300 sec: 42376.3). 
Total num frames: 279134208. Throughput: 0: 42288.9. Samples: 161314620. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 14:01:53,841][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:01:54,259][00497] Updated weights for policy 0, policy_version 17039 (0.0028) [2024-03-29 14:01:58,719][00497] Updated weights for policy 0, policy_version 17049 (0.0018) [2024-03-29 14:01:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 42320.7). Total num frames: 279330816. Throughput: 0: 42460.8. Samples: 161565160. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 14:01:58,840][00126] Avg episode reward: [(0, '0.446')] [2024-03-29 14:02:02,448][00497] Updated weights for policy 0, policy_version 17059 (0.0018) [2024-03-29 14:02:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 279543808. Throughput: 0: 42420.8. Samples: 161695540. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 14:02:03,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 14:02:06,348][00497] Updated weights for policy 0, policy_version 17069 (0.0020) [2024-03-29 14:02:08,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.4, 300 sec: 42431.8). Total num frames: 279773184. Throughput: 0: 42416.5. Samples: 161954740. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 14:02:08,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 14:02:09,684][00497] Updated weights for policy 0, policy_version 17079 (0.0021) [2024-03-29 14:02:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.3, 300 sec: 42320.7). Total num frames: 279969792. Throughput: 0: 42434.1. Samples: 162205780. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 14:02:13,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 14:02:14,123][00497] Updated weights for policy 0, policy_version 17089 (0.0025) [2024-03-29 14:02:17,822][00497] Updated weights for policy 0, policy_version 17099 (0.0034) [2024-03-29 14:02:18,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 280182784. Throughput: 0: 42613.4. Samples: 162337460. Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0) [2024-03-29 14:02:18,840][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 14:02:21,907][00497] Updated weights for policy 0, policy_version 17109 (0.0022) [2024-03-29 14:02:23,839][00126] Fps is (10 sec: 40961.0, 60 sec: 42052.4, 300 sec: 42376.3). Total num frames: 280379392. Throughput: 0: 42466.8. Samples: 162591960. Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0) [2024-03-29 14:02:23,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:02:25,238][00497] Updated weights for policy 0, policy_version 17119 (0.0021) [2024-03-29 14:02:26,218][00476] Signal inference workers to stop experience collection... (5800 times) [2024-03-29 14:02:26,271][00497] InferenceWorker_p0-w0: stopping experience collection (5800 times) [2024-03-29 14:02:26,305][00476] Signal inference workers to resume experience collection... (5800 times) [2024-03-29 14:02:26,307][00497] InferenceWorker_p0-w0: resuming experience collection (5800 times) [2024-03-29 14:02:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 280608768. Throughput: 0: 42505.0. Samples: 162836280. 
Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0) [2024-03-29 14:02:28,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 14:02:29,499][00497] Updated weights for policy 0, policy_version 17129 (0.0024) [2024-03-29 14:02:33,223][00497] Updated weights for policy 0, policy_version 17139 (0.0028) [2024-03-29 14:02:33,839][00126] Fps is (10 sec: 44235.8, 60 sec: 42598.3, 300 sec: 42320.7). Total num frames: 280821760. Throughput: 0: 42618.2. Samples: 162974580. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 14:02:33,840][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 14:02:37,315][00497] Updated weights for policy 0, policy_version 17149 (0.0019) [2024-03-29 14:02:38,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.2, 300 sec: 42431.8). Total num frames: 281034752. Throughput: 0: 42506.2. Samples: 163227400. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 14:02:38,840][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 14:02:40,539][00497] Updated weights for policy 0, policy_version 17159 (0.0026) [2024-03-29 14:02:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 281247744. Throughput: 0: 42175.6. Samples: 163463060. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 14:02:43,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 14:02:45,178][00497] Updated weights for policy 0, policy_version 17169 (0.0022) [2024-03-29 14:02:48,815][00497] Updated weights for policy 0, policy_version 17179 (0.0023) [2024-03-29 14:02:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 281460736. Throughput: 0: 42280.9. Samples: 163598180. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 14:02:48,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 14:02:52,655][00497] Updated weights for policy 0, policy_version 17189 (0.0024) [2024-03-29 14:02:53,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.4, 300 sec: 42431.8). Total num frames: 281673728. Throughput: 0: 42364.9. Samples: 163861160. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 14:02:53,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 14:02:55,941][00497] Updated weights for policy 0, policy_version 17199 (0.0025) [2024-03-29 14:02:58,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 281886720. Throughput: 0: 42146.7. Samples: 164102380. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 14:02:58,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 14:03:00,549][00497] Updated weights for policy 0, policy_version 17209 (0.0019) [2024-03-29 14:03:03,478][00476] Signal inference workers to stop experience collection... (5850 times) [2024-03-29 14:03:03,513][00497] InferenceWorker_p0-w0: stopping experience collection (5850 times) [2024-03-29 14:03:03,691][00476] Signal inference workers to resume experience collection... (5850 times) [2024-03-29 14:03:03,691][00497] InferenceWorker_p0-w0: resuming experience collection (5850 times) [2024-03-29 14:03:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 42376.2). Total num frames: 282083328. Throughput: 0: 42234.2. Samples: 164238000. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 14:03:03,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 14:03:03,982][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000017218_282099712.pth... 
[2024-03-29 14:03:04,314][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000016599_271958016.pth [2024-03-29 14:03:04,592][00497] Updated weights for policy 0, policy_version 17219 (0.0022) [2024-03-29 14:03:08,317][00497] Updated weights for policy 0, policy_version 17229 (0.0019) [2024-03-29 14:03:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.2, 300 sec: 42376.2). Total num frames: 282296320. Throughput: 0: 42153.6. Samples: 164488880. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 14:03:08,840][00126] Avg episode reward: [(0, '0.304')] [2024-03-29 14:03:11,753][00497] Updated weights for policy 0, policy_version 17239 (0.0032) [2024-03-29 14:03:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 282509312. Throughput: 0: 41980.4. Samples: 164725400. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 14:03:13,840][00126] Avg episode reward: [(0, '0.458')] [2024-03-29 14:03:16,122][00497] Updated weights for policy 0, policy_version 17249 (0.0025) [2024-03-29 14:03:18,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 282705920. Throughput: 0: 41957.0. Samples: 164862640. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 14:03:18,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 14:03:19,949][00497] Updated weights for policy 0, policy_version 17259 (0.0021) [2024-03-29 14:03:23,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.3, 300 sec: 42431.8). Total num frames: 282935296. Throughput: 0: 42332.6. Samples: 165132360. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:03:23,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:03:23,857][00497] Updated weights for policy 0, policy_version 17269 (0.0016) [2024-03-29 14:03:27,113][00497] Updated weights for policy 0, policy_version 17279 (0.0020) [2024-03-29 14:03:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 283148288. Throughput: 0: 42205.5. Samples: 165362300. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:03:28,840][00126] Avg episode reward: [(0, '0.315')] [2024-03-29 14:03:31,603][00497] Updated weights for policy 0, policy_version 17289 (0.0018) [2024-03-29 14:03:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.4, 300 sec: 42265.2). Total num frames: 283344896. Throughput: 0: 42153.5. Samples: 165495080. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 14:03:33,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 14:03:35,683][00497] Updated weights for policy 0, policy_version 17299 (0.0027) [2024-03-29 14:03:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 42376.3). Total num frames: 283574272. Throughput: 0: 42079.1. Samples: 165754720. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 14:03:38,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:03:39,356][00497] Updated weights for policy 0, policy_version 17309 (0.0023) [2024-03-29 14:03:41,999][00476] Signal inference workers to stop experience collection... (5900 times) [2024-03-29 14:03:42,019][00497] InferenceWorker_p0-w0: stopping experience collection (5900 times) [2024-03-29 14:03:42,209][00476] Signal inference workers to resume experience collection... 
(5900 times) [2024-03-29 14:03:42,210][00497] InferenceWorker_p0-w0: resuming experience collection (5900 times) [2024-03-29 14:03:42,712][00497] Updated weights for policy 0, policy_version 17319 (0.0024) [2024-03-29 14:03:43,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.4, 300 sec: 42265.2). Total num frames: 283770880. Throughput: 0: 41713.9. Samples: 165979500. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 14:03:43,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 14:03:47,242][00497] Updated weights for policy 0, policy_version 17329 (0.0029) [2024-03-29 14:03:48,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 283983872. Throughput: 0: 41912.8. Samples: 166124080. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 14:03:48,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 14:03:51,330][00497] Updated weights for policy 0, policy_version 17339 (0.0023) [2024-03-29 14:03:53,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.3, 300 sec: 42265.2). Total num frames: 284180480. Throughput: 0: 41985.5. Samples: 166378220. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 14:03:53,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:03:54,871][00497] Updated weights for policy 0, policy_version 17349 (0.0027) [2024-03-29 14:03:58,253][00497] Updated weights for policy 0, policy_version 17359 (0.0026) [2024-03-29 14:03:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 284426240. Throughput: 0: 42215.4. Samples: 166625100. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 14:03:58,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 14:04:02,843][00497] Updated weights for policy 0, policy_version 17369 (0.0021) [2024-03-29 14:04:03,839][00126] Fps is (10 sec: 44235.8, 60 sec: 42325.2, 300 sec: 42320.7). Total num frames: 284622848. Throughput: 0: 42150.1. Samples: 166759400. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 14:04:03,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:04:06,731][00497] Updated weights for policy 0, policy_version 17379 (0.0024) [2024-03-29 14:04:08,839][00126] Fps is (10 sec: 39322.3, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 284819456. Throughput: 0: 41833.3. Samples: 167014860. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 14:04:08,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 14:04:09,099][00476] Saving new best policy, reward=0.545! [2024-03-29 14:04:10,460][00497] Updated weights for policy 0, policy_version 17389 (0.0019) [2024-03-29 14:04:13,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 285048832. Throughput: 0: 42168.0. Samples: 167259860. Policy #0 lag: (min: 2.0, avg: 21.3, max: 42.0) [2024-03-29 14:04:13,841][00126] Avg episode reward: [(0, '0.289')] [2024-03-29 14:04:14,074][00497] Updated weights for policy 0, policy_version 17399 (0.0032) [2024-03-29 14:04:18,470][00497] Updated weights for policy 0, policy_version 17409 (0.0021) [2024-03-29 14:04:18,512][00476] Signal inference workers to stop experience collection... (5950 times) [2024-03-29 14:04:18,533][00497] InferenceWorker_p0-w0: stopping experience collection (5950 times) [2024-03-29 14:04:18,717][00476] Signal inference workers to resume experience collection... 
(5950 times) [2024-03-29 14:04:18,717][00497] InferenceWorker_p0-w0: resuming experience collection (5950 times) [2024-03-29 14:04:18,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 285245440. Throughput: 0: 42137.6. Samples: 167391280. Policy #0 lag: (min: 2.0, avg: 21.3, max: 42.0) [2024-03-29 14:04:18,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 14:04:22,361][00497] Updated weights for policy 0, policy_version 17419 (0.0024) [2024-03-29 14:04:23,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 285442048. Throughput: 0: 42157.3. Samples: 167651800. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:04:23,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:04:26,242][00497] Updated weights for policy 0, policy_version 17429 (0.0021) [2024-03-29 14:04:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42265.2). Total num frames: 285671424. Throughput: 0: 42588.8. Samples: 167896000. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:04:28,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 14:04:29,562][00497] Updated weights for policy 0, policy_version 17439 (0.0022) [2024-03-29 14:04:33,775][00497] Updated weights for policy 0, policy_version 17449 (0.0020) [2024-03-29 14:04:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 285884416. Throughput: 0: 42131.2. Samples: 168019980. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:04:33,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 14:04:37,893][00497] Updated weights for policy 0, policy_version 17459 (0.0019) [2024-03-29 14:04:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.1, 300 sec: 42209.6). Total num frames: 286081024. Throughput: 0: 42475.8. Samples: 168289640. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 14:04:38,840][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 14:04:41,451][00497] Updated weights for policy 0, policy_version 17469 (0.0023) [2024-03-29 14:04:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 286310400. Throughput: 0: 42449.5. Samples: 168535320. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 14:04:43,840][00126] Avg episode reward: [(0, '0.416')] [2024-03-29 14:04:45,156][00497] Updated weights for policy 0, policy_version 17479 (0.0020) [2024-03-29 14:04:48,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 286507008. Throughput: 0: 41924.5. Samples: 168646000. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:04:48,842][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:04:49,591][00497] Updated weights for policy 0, policy_version 17489 (0.0029) [2024-03-29 14:04:52,881][00476] Signal inference workers to stop experience collection... (6000 times) [2024-03-29 14:04:52,955][00497] InferenceWorker_p0-w0: stopping experience collection (6000 times) [2024-03-29 14:04:52,959][00476] Signal inference workers to resume experience collection... (6000 times) [2024-03-29 14:04:52,981][00497] InferenceWorker_p0-w0: resuming experience collection (6000 times) [2024-03-29 14:04:53,558][00497] Updated weights for policy 0, policy_version 17499 (0.0019) [2024-03-29 14:04:53,839][00126] Fps is (10 sec: 39320.9, 60 sec: 42052.1, 300 sec: 42154.1). Total num frames: 286703616. Throughput: 0: 42257.6. Samples: 168916460. 
Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:04:53,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 14:04:57,035][00497] Updated weights for policy 0, policy_version 17509 (0.0024) [2024-03-29 14:04:58,839][00126] Fps is (10 sec: 40960.8, 60 sec: 41506.3, 300 sec: 42265.2). Total num frames: 286916608. Throughput: 0: 42090.8. Samples: 169153940. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:04:58,840][00126] Avg episode reward: [(0, '0.318')] [2024-03-29 14:05:00,941][00497] Updated weights for policy 0, policy_version 17519 (0.0025) [2024-03-29 14:05:03,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41779.3, 300 sec: 42154.1). Total num frames: 287129600. Throughput: 0: 41666.3. Samples: 169266260. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 14:05:03,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 14:05:04,221][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000017527_287162368.pth... [2024-03-29 14:05:04,550][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000016908_277020672.pth [2024-03-29 14:05:05,281][00497] Updated weights for policy 0, policy_version 17529 (0.0023) [2024-03-29 14:05:08,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 287326208. Throughput: 0: 41898.2. Samples: 169537220. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 14:05:08,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:05:09,194][00497] Updated weights for policy 0, policy_version 17539 (0.0020) [2024-03-29 14:05:12,844][00497] Updated weights for policy 0, policy_version 17549 (0.0029) [2024-03-29 14:05:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 42265.2). Total num frames: 287555584. Throughput: 0: 41837.9. Samples: 169778700. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 14:05:13,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 14:05:16,587][00497] Updated weights for policy 0, policy_version 17559 (0.0023) [2024-03-29 14:05:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 287752192. Throughput: 0: 41904.5. Samples: 169905680. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 14:05:18,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 14:05:20,803][00497] Updated weights for policy 0, policy_version 17569 (0.0033) [2024-03-29 14:05:23,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.1, 300 sec: 42098.6). Total num frames: 287948800. Throughput: 0: 41688.9. Samples: 170165640. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 14:05:23,841][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 14:05:24,741][00497] Updated weights for policy 0, policy_version 17579 (0.0023) [2024-03-29 14:05:25,524][00476] Signal inference workers to stop experience collection... (6050 times) [2024-03-29 14:05:25,525][00476] Signal inference workers to resume experience collection... (6050 times) [2024-03-29 14:05:25,568][00497] InferenceWorker_p0-w0: stopping experience collection (6050 times) [2024-03-29 14:05:25,568][00497] InferenceWorker_p0-w0: resuming experience collection (6050 times) [2024-03-29 14:05:28,497][00497] Updated weights for policy 0, policy_version 17589 (0.0023) [2024-03-29 14:05:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 288194560. Throughput: 0: 41806.6. Samples: 170416620. 
Policy #0 lag: (min: 1.0, avg: 21.0, max: 43.0) [2024-03-29 14:05:28,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:05:32,257][00497] Updated weights for policy 0, policy_version 17599 (0.0018) [2024-03-29 14:05:33,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 288391168. Throughput: 0: 42063.2. Samples: 170538840. Policy #0 lag: (min: 1.0, avg: 21.0, max: 43.0) [2024-03-29 14:05:33,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 14:05:35,966][00497] Updated weights for policy 0, policy_version 17609 (0.0028) [2024-03-29 14:05:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.3, 300 sec: 42154.1). Total num frames: 288587776. Throughput: 0: 41878.8. Samples: 170801000. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 14:05:38,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 14:05:40,034][00497] Updated weights for policy 0, policy_version 17619 (0.0023) [2024-03-29 14:05:43,719][00497] Updated weights for policy 0, policy_version 17629 (0.0024) [2024-03-29 14:05:43,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.2, 300 sec: 42265.1). Total num frames: 288833536. Throughput: 0: 42206.0. Samples: 171053220. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 14:05:43,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:05:47,586][00497] Updated weights for policy 0, policy_version 17639 (0.0034) [2024-03-29 14:05:48,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 289030144. Throughput: 0: 42484.8. Samples: 171178080. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 14:05:48,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 14:05:51,712][00497] Updated weights for policy 0, policy_version 17649 (0.0018) [2024-03-29 14:05:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 289226752. Throughput: 0: 42165.7. Samples: 171434680. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:05:53,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:05:55,643][00497] Updated weights for policy 0, policy_version 17659 (0.0019) [2024-03-29 14:05:58,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 289456128. Throughput: 0: 42256.0. Samples: 171680220. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:05:58,841][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 14:05:59,283][00497] Updated weights for policy 0, policy_version 17669 (0.0023) [2024-03-29 14:06:00,806][00476] Signal inference workers to stop experience collection... (6100 times) [2024-03-29 14:06:00,809][00476] Signal inference workers to resume experience collection... (6100 times) [2024-03-29 14:06:00,857][00497] InferenceWorker_p0-w0: stopping experience collection (6100 times) [2024-03-29 14:06:00,857][00497] InferenceWorker_p0-w0: resuming experience collection (6100 times) [2024-03-29 14:06:03,276][00497] Updated weights for policy 0, policy_version 17679 (0.0020) [2024-03-29 14:06:03,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 289669120. Throughput: 0: 42325.3. Samples: 171810320. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 14:06:03,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 14:06:07,354][00497] Updated weights for policy 0, policy_version 17689 (0.0018) [2024-03-29 14:06:08,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 42209.6). 
Total num frames: 289865728. Throughput: 0: 42202.7. Samples: 172064760. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 14:06:08,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 14:06:11,430][00497] Updated weights for policy 0, policy_version 17699 (0.0021) [2024-03-29 14:06:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 290078720. Throughput: 0: 41976.0. Samples: 172305540. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 14:06:13,840][00126] Avg episode reward: [(0, '0.290')] [2024-03-29 14:06:15,111][00497] Updated weights for policy 0, policy_version 17709 (0.0025) [2024-03-29 14:06:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 290291712. Throughput: 0: 42218.3. Samples: 172438660. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 14:06:18,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 14:06:18,951][00497] Updated weights for policy 0, policy_version 17719 (0.0033) [2024-03-29 14:06:22,849][00497] Updated weights for policy 0, policy_version 17729 (0.0018) [2024-03-29 14:06:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 290504704. Throughput: 0: 41950.6. Samples: 172688780. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 14:06:23,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 14:06:27,092][00497] Updated weights for policy 0, policy_version 17739 (0.0020) [2024-03-29 14:06:28,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 290701312. Throughput: 0: 41932.9. Samples: 172940200. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 14:06:28,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 14:06:30,715][00497] Updated weights for policy 0, policy_version 17749 (0.0030) [2024-03-29 14:06:32,539][00476] Signal inference workers to stop experience collection... (6150 times) [2024-03-29 14:06:32,546][00476] Signal inference workers to resume experience collection... (6150 times) [2024-03-29 14:06:32,586][00497] InferenceWorker_p0-w0: stopping experience collection (6150 times) [2024-03-29 14:06:32,586][00497] InferenceWorker_p0-w0: resuming experience collection (6150 times) [2024-03-29 14:06:33,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 290914304. Throughput: 0: 41776.8. Samples: 173058040. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 14:06:33,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 14:06:34,666][00497] Updated weights for policy 0, policy_version 17759 (0.0026) [2024-03-29 14:06:38,750][00497] Updated weights for policy 0, policy_version 17769 (0.0021) [2024-03-29 14:06:38,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.2, 300 sec: 42154.1). Total num frames: 291127296. Throughput: 0: 41676.9. Samples: 173310140. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 14:06:38,840][00126] Avg episode reward: [(0, '0.319')] [2024-03-29 14:06:42,833][00497] Updated weights for policy 0, policy_version 17779 (0.0024) [2024-03-29 14:06:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.2, 300 sec: 42098.5). Total num frames: 291323904. Throughput: 0: 41745.7. Samples: 173558780. 
Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 14:06:43,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:06:46,404][00497] Updated weights for policy 0, policy_version 17789 (0.0028) [2024-03-29 14:06:48,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 291536896. Throughput: 0: 41484.9. Samples: 173677140. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 14:06:48,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 14:06:50,478][00497] Updated weights for policy 0, policy_version 17799 (0.0020) [2024-03-29 14:06:53,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 291733504. Throughput: 0: 41561.4. Samples: 173935020. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 14:06:53,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:06:54,762][00497] Updated weights for policy 0, policy_version 17809 (0.0026) [2024-03-29 14:06:58,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 291930112. Throughput: 0: 41812.0. Samples: 174187080. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 14:06:58,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 14:06:58,969][00497] Updated weights for policy 0, policy_version 17819 (0.0028) [2024-03-29 14:07:01,970][00476] Signal inference workers to stop experience collection... (6200 times) [2024-03-29 14:07:02,009][00497] InferenceWorker_p0-w0: stopping experience collection (6200 times) [2024-03-29 14:07:02,167][00476] Signal inference workers to resume experience collection... (6200 times) [2024-03-29 14:07:02,168][00497] InferenceWorker_p0-w0: resuming experience collection (6200 times) [2024-03-29 14:07:02,470][00497] Updated weights for policy 0, policy_version 17829 (0.0026) [2024-03-29 14:07:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 292143104. Throughput: 0: 41377.3. Samples: 174300640. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 14:07:03,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:07:03,917][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000017832_292159488.pth... [2024-03-29 14:07:04,221][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000017218_282099712.pth [2024-03-29 14:07:06,691][00497] Updated weights for policy 0, policy_version 17839 (0.0025) [2024-03-29 14:07:08,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 292339712. Throughput: 0: 40887.5. Samples: 174528720. Policy #0 lag: (min: 1.0, avg: 22.9, max: 42.0) [2024-03-29 14:07:08,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 14:07:10,854][00497] Updated weights for policy 0, policy_version 17849 (0.0019) [2024-03-29 14:07:13,839][00126] Fps is (10 sec: 37682.8, 60 sec: 40686.8, 300 sec: 41820.8). Total num frames: 292519936. Throughput: 0: 41201.7. Samples: 174794280. Policy #0 lag: (min: 1.0, avg: 22.9, max: 42.0) [2024-03-29 14:07:13,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 14:07:15,074][00497] Updated weights for policy 0, policy_version 17859 (0.0027) [2024-03-29 14:07:18,509][00497] Updated weights for policy 0, policy_version 17869 (0.0029) [2024-03-29 14:07:18,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.0, 300 sec: 41987.5). Total num frames: 292765696. Throughput: 0: 41315.2. Samples: 174917220. 
Policy #0 lag: (min: 1.0, avg: 22.9, max: 42.0)
[2024-03-29 14:07:18,840][00126] Avg episode reward: [(0, '0.450')]
[2024-03-29 14:07:22,345][00497] Updated weights for policy 0, policy_version 17879 (0.0024)
[2024-03-29 14:07:23,839][00126] Fps is (10 sec: 44237.5, 60 sec: 40960.0, 300 sec: 41876.4). Total num frames: 292962304. Throughput: 0: 41248.1. Samples: 175166300. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0)
[2024-03-29 14:07:23,840][00126] Avg episode reward: [(0, '0.406')]
[2024-03-29 14:07:26,355][00497] Updated weights for policy 0, policy_version 17889 (0.0018)
[2024-03-29 14:07:28,839][00126] Fps is (10 sec: 39321.5, 60 sec: 40960.1, 300 sec: 41820.9). Total num frames: 293158912. Throughput: 0: 41299.6. Samples: 175417260. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0)
[2024-03-29 14:07:28,840][00126] Avg episode reward: [(0, '0.417')]
[2024-03-29 14:07:30,632][00497] Updated weights for policy 0, policy_version 17899 (0.0025)
[2024-03-29 14:07:32,810][00476] Signal inference workers to stop experience collection... (6250 times)
[2024-03-29 14:07:32,811][00476] Signal inference workers to resume experience collection... (6250 times)
[2024-03-29 14:07:32,861][00497] InferenceWorker_p0-w0: stopping experience collection (6250 times)
[2024-03-29 14:07:32,861][00497] InferenceWorker_p0-w0: resuming experience collection (6250 times)
[2024-03-29 14:07:33,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 293404672. Throughput: 0: 41431.5. Samples: 175541560. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0)
[2024-03-29 14:07:33,840][00126] Avg episode reward: [(0, '0.318')]
[2024-03-29 14:07:33,983][00497] Updated weights for policy 0, policy_version 17909 (0.0022)
[2024-03-29 14:07:37,977][00497] Updated weights for policy 0, policy_version 17919 (0.0018)
[2024-03-29 14:07:38,839][00126] Fps is (10 sec: 45875.2, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 293617664. Throughput: 0: 41300.0. Samples: 175793520. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0)
[2024-03-29 14:07:38,840][00126] Avg episode reward: [(0, '0.319')]
[2024-03-29 14:07:41,876][00497] Updated weights for policy 0, policy_version 17929 (0.0030)
[2024-03-29 14:07:43,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 293814272. Throughput: 0: 41377.7. Samples: 176049080. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0)
[2024-03-29 14:07:43,840][00126] Avg episode reward: [(0, '0.311')]
[2024-03-29 14:07:46,254][00497] Updated weights for policy 0, policy_version 17939 (0.0020)
[2024-03-29 14:07:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 294027264. Throughput: 0: 41598.2. Samples: 176172560. Policy #0 lag: (min: 2.0, avg: 19.1, max: 41.0)
[2024-03-29 14:07:48,840][00126] Avg episode reward: [(0, '0.391')]
[2024-03-29 14:07:49,558][00497] Updated weights for policy 0, policy_version 17949 (0.0028)
[2024-03-29 14:07:53,721][00497] Updated weights for policy 0, policy_version 17959 (0.0021)
[2024-03-29 14:07:53,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 294240256. Throughput: 0: 41954.7. Samples: 176416680. Policy #0 lag: (min: 2.0, avg: 19.1, max: 41.0)
[2024-03-29 14:07:53,840][00126] Avg episode reward: [(0, '0.358')]
[2024-03-29 14:07:57,756][00497] Updated weights for policy 0, policy_version 17969 (0.0019)
[2024-03-29 14:07:58,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.0, 300 sec: 41820.8).
Total num frames: 294420480. Throughput: 0: 41516.0. Samples: 176662500. Policy #0 lag: (min: 2.0, avg: 19.1, max: 41.0)
[2024-03-29 14:07:58,841][00126] Avg episode reward: [(0, '0.447')]
[2024-03-29 14:08:02,079][00497] Updated weights for policy 0, policy_version 17979 (0.0021)
[2024-03-29 14:08:03,226][00476] Signal inference workers to stop experience collection... (6300 times)
[2024-03-29 14:08:03,245][00497] InferenceWorker_p0-w0: stopping experience collection (6300 times)
[2024-03-29 14:08:03,432][00476] Signal inference workers to resume experience collection... (6300 times)
[2024-03-29 14:08:03,433][00497] InferenceWorker_p0-w0: resuming experience collection (6300 times)
[2024-03-29 14:08:03,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 294633472. Throughput: 0: 41917.2. Samples: 176803500. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0)
[2024-03-29 14:08:03,840][00126] Avg episode reward: [(0, '0.438')]
[2024-03-29 14:08:05,483][00497] Updated weights for policy 0, policy_version 17989 (0.0035)
[2024-03-29 14:08:08,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 294846464. Throughput: 0: 41521.3. Samples: 177034760. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0)
[2024-03-29 14:08:08,841][00126] Avg episode reward: [(0, '0.472')]
[2024-03-29 14:08:09,701][00497] Updated weights for policy 0, policy_version 17999 (0.0025)
[2024-03-29 14:08:13,388][00497] Updated weights for policy 0, policy_version 18009 (0.0021)
[2024-03-29 14:08:13,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.4, 300 sec: 41931.9). Total num frames: 295075840. Throughput: 0: 41629.7. Samples: 177290600. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0)
[2024-03-29 14:08:13,840][00126] Avg episode reward: [(0, '0.353')]
[2024-03-29 14:08:17,989][00497] Updated weights for policy 0, policy_version 18019 (0.0023)
[2024-03-29 14:08:18,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 295239680. Throughput: 0: 41887.0. Samples: 177426480. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0)
[2024-03-29 14:08:18,840][00126] Avg episode reward: [(0, '0.298')]
[2024-03-29 14:08:21,412][00497] Updated weights for policy 0, policy_version 18029 (0.0027)
[2024-03-29 14:08:23,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 295469056. Throughput: 0: 41262.6. Samples: 177650340. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0)
[2024-03-29 14:08:23,840][00126] Avg episode reward: [(0, '0.312')]
[2024-03-29 14:08:25,542][00497] Updated weights for policy 0, policy_version 18039 (0.0024)
[2024-03-29 14:08:28,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.3, 300 sec: 41820.8). Total num frames: 295682048. Throughput: 0: 41442.8. Samples: 177914000. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0)
[2024-03-29 14:08:28,840][00126] Avg episode reward: [(0, '0.397')]
[2024-03-29 14:08:29,232][00497] Updated weights for policy 0, policy_version 18049 (0.0026)
[2024-03-29 14:08:32,862][00476] Signal inference workers to stop experience collection... (6350 times)
[2024-03-29 14:08:32,905][00497] InferenceWorker_p0-w0: stopping experience collection (6350 times)
[2024-03-29 14:08:33,077][00476] Signal inference workers to resume experience collection...
(6350 times)
[2024-03-29 14:08:33,078][00497] InferenceWorker_p0-w0: resuming experience collection (6350 times)
[2024-03-29 14:08:33,794][00497] Updated weights for policy 0, policy_version 18059 (0.0018)
[2024-03-29 14:08:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 295878656. Throughput: 0: 41491.5. Samples: 178039680. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0)
[2024-03-29 14:08:33,840][00126] Avg episode reward: [(0, '0.326')]
[2024-03-29 14:08:37,315][00497] Updated weights for policy 0, policy_version 18069 (0.0030)
[2024-03-29 14:08:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 296108032. Throughput: 0: 41420.0. Samples: 178280580. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0)
[2024-03-29 14:08:38,840][00126] Avg episode reward: [(0, '0.422')]
[2024-03-29 14:08:41,216][00497] Updated weights for policy 0, policy_version 18079 (0.0024)
[2024-03-29 14:08:43,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 296304640. Throughput: 0: 41559.6. Samples: 178532680. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0)
[2024-03-29 14:08:43,840][00126] Avg episode reward: [(0, '0.443')]
[2024-03-29 14:08:45,021][00497] Updated weights for policy 0, policy_version 18089 (0.0038)
[2024-03-29 14:08:48,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41233.2, 300 sec: 41765.3). Total num frames: 296501248. Throughput: 0: 41238.0. Samples: 178659200. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0)
[2024-03-29 14:08:48,840][00126] Avg episode reward: [(0, '0.375')]
[2024-03-29 14:08:49,308][00497] Updated weights for policy 0, policy_version 18099 (0.0025)
[2024-03-29 14:08:52,745][00497] Updated weights for policy 0, policy_version 18109 (0.0032)
[2024-03-29 14:08:53,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.0, 300 sec: 41709.8). Total num frames: 296730624. Throughput: 0: 41591.8. Samples: 178906400. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0)
[2024-03-29 14:08:53,840][00126] Avg episode reward: [(0, '0.373')]
[2024-03-29 14:08:56,732][00497] Updated weights for policy 0, policy_version 18119 (0.0022)
[2024-03-29 14:08:58,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 296960000. Throughput: 0: 41763.7. Samples: 179169960. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0)
[2024-03-29 14:08:58,840][00126] Avg episode reward: [(0, '0.320')]
[2024-03-29 14:09:00,422][00497] Updated weights for policy 0, policy_version 18129 (0.0032)
[2024-03-29 14:09:03,839][00126] Fps is (10 sec: 39322.3, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 297123840. Throughput: 0: 41443.2. Samples: 179291420. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0)
[2024-03-29 14:09:03,840][00126] Avg episode reward: [(0, '0.386')]
[2024-03-29 14:09:04,177][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000018137_297156608.pth...
[2024-03-29 14:09:04,506][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000017527_287162368.pth
[2024-03-29 14:09:05,098][00497] Updated weights for policy 0, policy_version 18139 (0.0028)
[2024-03-29 14:09:06,965][00476] Signal inference workers to stop experience collection... (6400 times)
[2024-03-29 14:09:07,045][00497] InferenceWorker_p0-w0: stopping experience collection (6400 times)
[2024-03-29 14:09:07,134][00476] Signal inference workers to resume experience collection...
(6400 times)
[2024-03-29 14:09:07,134][00497] InferenceWorker_p0-w0: resuming experience collection (6400 times)
[2024-03-29 14:09:08,652][00497] Updated weights for policy 0, policy_version 18149 (0.0024)
[2024-03-29 14:09:08,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 297353216. Throughput: 0: 41971.6. Samples: 179539060. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0)
[2024-03-29 14:09:08,840][00126] Avg episode reward: [(0, '0.453')]
[2024-03-29 14:09:12,689][00497] Updated weights for policy 0, policy_version 18159 (0.0017)
[2024-03-29 14:09:13,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 297566208. Throughput: 0: 41629.3. Samples: 179787320. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0)
[2024-03-29 14:09:13,841][00126] Avg episode reward: [(0, '0.427')]
[2024-03-29 14:09:16,208][00497] Updated weights for policy 0, policy_version 18169 (0.0027)
[2024-03-29 14:09:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 297746432. Throughput: 0: 41574.3. Samples: 179910520. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0)
[2024-03-29 14:09:18,840][00126] Avg episode reward: [(0, '0.410')]
[2024-03-29 14:09:20,915][00497] Updated weights for policy 0, policy_version 18179 (0.0029)
[2024-03-29 14:09:23,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 297975808. Throughput: 0: 42178.6. Samples: 180178620. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0)
[2024-03-29 14:09:23,840][00126] Avg episode reward: [(0, '0.396')]
[2024-03-29 14:09:24,445][00497] Updated weights for policy 0, policy_version 18189 (0.0031)
[2024-03-29 14:09:28,275][00497] Updated weights for policy 0, policy_version 18199 (0.0034)
[2024-03-29 14:09:28,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 298188800. Throughput: 0: 41924.1. Samples: 180419260. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0)
[2024-03-29 14:09:28,840][00126] Avg episode reward: [(0, '0.393')]
[2024-03-29 14:09:31,968][00497] Updated weights for policy 0, policy_version 18209 (0.0028)
[2024-03-29 14:09:33,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 298385408. Throughput: 0: 41803.9. Samples: 180540380. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0)
[2024-03-29 14:09:33,841][00126] Avg episode reward: [(0, '0.413')]
[2024-03-29 14:09:36,649][00497] Updated weights for policy 0, policy_version 18219 (0.0022)
[2024-03-29 14:09:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 298598400. Throughput: 0: 42087.3. Samples: 180800320. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0)
[2024-03-29 14:09:38,840][00126] Avg episode reward: [(0, '0.442')]
[2024-03-29 14:09:40,216][00497] Updated weights for policy 0, policy_version 18229 (0.0035)
[2024-03-29 14:09:43,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 298795008. Throughput: 0: 41255.9. Samples: 181026480. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0)
[2024-03-29 14:09:43,840][00126] Avg episode reward: [(0, '0.446')]
[2024-03-29 14:09:43,904][00476] Signal inference workers to stop experience collection... (6450 times)
[2024-03-29 14:09:43,904][00476] Signal inference workers to resume experience collection...
(6450 times)
[2024-03-29 14:09:43,943][00497] InferenceWorker_p0-w0: stopping experience collection (6450 times)
[2024-03-29 14:09:43,943][00497] InferenceWorker_p0-w0: resuming experience collection (6450 times)
[2024-03-29 14:09:44,169][00497] Updated weights for policy 0, policy_version 18239 (0.0021)
[2024-03-29 14:09:47,946][00497] Updated weights for policy 0, policy_version 18249 (0.0028)
[2024-03-29 14:09:48,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 299008000. Throughput: 0: 41487.1. Samples: 181158340. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0)
[2024-03-29 14:09:48,841][00126] Avg episode reward: [(0, '0.448')]
[2024-03-29 14:09:52,544][00497] Updated weights for policy 0, policy_version 18259 (0.0027)
[2024-03-29 14:09:53,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 299204608. Throughput: 0: 41726.6. Samples: 181416760. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0)
[2024-03-29 14:09:53,840][00126] Avg episode reward: [(0, '0.299')]
[2024-03-29 14:09:55,950][00497] Updated weights for policy 0, policy_version 18269 (0.0024)
[2024-03-29 14:09:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 299433984. Throughput: 0: 41618.7. Samples: 181660160. Policy #0 lag: (min: 1.0, avg: 21.9, max: 43.0)
[2024-03-29 14:09:58,840][00126] Avg episode reward: [(0, '0.294')]
[2024-03-29 14:09:59,669][00497] Updated weights for policy 0, policy_version 18279 (0.0018)
[2024-03-29 14:10:03,433][00497] Updated weights for policy 0, policy_version 18289 (0.0021)
[2024-03-29 14:10:03,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 299663360. Throughput: 0: 41875.6. Samples: 181794920. Policy #0 lag: (min: 1.0, avg: 21.9, max: 43.0)
[2024-03-29 14:10:03,840][00126] Avg episode reward: [(0, '0.364')]
[2024-03-29 14:10:07,778][00497] Updated weights for policy 0, policy_version 18299 (0.0022)
[2024-03-29 14:10:08,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 299843584. Throughput: 0: 41849.4. Samples: 182061840. Policy #0 lag: (min: 1.0, avg: 21.9, max: 43.0)
[2024-03-29 14:10:08,841][00126] Avg episode reward: [(0, '0.374')]
[2024-03-29 14:10:11,293][00497] Updated weights for policy 0, policy_version 18309 (0.0017)
[2024-03-29 14:10:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41820.8). Total num frames: 300089344. Throughput: 0: 41684.8. Samples: 182295080. Policy #0 lag: (min: 2.0, avg: 19.9, max: 43.0)
[2024-03-29 14:10:13,840][00126] Avg episode reward: [(0, '0.408')]
[2024-03-29 14:10:14,738][00476] Signal inference workers to stop experience collection... (6500 times)
[2024-03-29 14:10:14,774][00497] InferenceWorker_p0-w0: stopping experience collection (6500 times)
[2024-03-29 14:10:14,925][00476] Signal inference workers to resume experience collection... (6500 times)
[2024-03-29 14:10:14,926][00497] InferenceWorker_p0-w0: resuming experience collection (6500 times)
[2024-03-29 14:10:15,182][00497] Updated weights for policy 0, policy_version 18319 (0.0029)
[2024-03-29 14:10:18,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 300285952. Throughput: 0: 41851.1. Samples: 182423680.
Policy #0 lag: (min: 2.0, avg: 19.9, max: 43.0)
[2024-03-29 14:10:18,840][00126] Avg episode reward: [(0, '0.409')]
[2024-03-29 14:10:18,928][00497] Updated weights for policy 0, policy_version 18329 (0.0024)
[2024-03-29 14:10:23,393][00497] Updated weights for policy 0, policy_version 18339 (0.0024)
[2024-03-29 14:10:23,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 300482560. Throughput: 0: 42107.9. Samples: 182695180. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0)
[2024-03-29 14:10:23,840][00126] Avg episode reward: [(0, '0.399')]
[2024-03-29 14:10:26,940][00497] Updated weights for policy 0, policy_version 18349 (0.0019)
[2024-03-29 14:10:28,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 300711936. Throughput: 0: 42110.7. Samples: 182921460. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0)
[2024-03-29 14:10:28,840][00126] Avg episode reward: [(0, '0.451')]
[2024-03-29 14:10:30,806][00497] Updated weights for policy 0, policy_version 18359 (0.0025)
[2024-03-29 14:10:33,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 300908544. Throughput: 0: 42177.8. Samples: 183056340. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0)
[2024-03-29 14:10:33,840][00126] Avg episode reward: [(0, '0.369')]
[2024-03-29 14:10:34,461][00497] Updated weights for policy 0, policy_version 18369 (0.0023)
[2024-03-29 14:10:38,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.0, 300 sec: 41543.2). Total num frames: 301088768. Throughput: 0: 42201.8. Samples: 183315840. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0)
[2024-03-29 14:10:38,840][00126] Avg episode reward: [(0, '0.343')]
[2024-03-29 14:10:39,245][00497] Updated weights for policy 0, policy_version 18379 (0.0018)
[2024-03-29 14:10:42,557][00497] Updated weights for policy 0, policy_version 18389 (0.0027)
[2024-03-29 14:10:43,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42598.5, 300 sec: 41765.3). Total num frames: 301350912. Throughput: 0: 42122.6. Samples: 183555680. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0)
[2024-03-29 14:10:43,840][00126] Avg episode reward: [(0, '0.306')]
[2024-03-29 14:10:46,323][00497] Updated weights for policy 0, policy_version 18399 (0.0023)
[2024-03-29 14:10:48,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 301547520. Throughput: 0: 42105.3. Samples: 183689660. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0)
[2024-03-29 14:10:48,840][00126] Avg episode reward: [(0, '0.499')]
[2024-03-29 14:10:49,351][00476] Signal inference workers to stop experience collection... (6550 times)
[2024-03-29 14:10:49,351][00476] Signal inference workers to resume experience collection... (6550 times)
[2024-03-29 14:10:49,397][00497] InferenceWorker_p0-w0: stopping experience collection (6550 times)
[2024-03-29 14:10:49,397][00497] InferenceWorker_p0-w0: resuming experience collection (6550 times)
[2024-03-29 14:10:50,092][00497] Updated weights for policy 0, policy_version 18409 (0.0020)
[2024-03-29 14:10:53,839][00126] Fps is (10 sec: 37682.9, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 301727744. Throughput: 0: 41868.5. Samples: 183945920.
Policy #0 lag: (min: 0.0, avg: 22.8, max: 43.0)
[2024-03-29 14:10:53,840][00126] Avg episode reward: [(0, '0.402')]
[2024-03-29 14:10:54,726][00497] Updated weights for policy 0, policy_version 18419 (0.0018)
[2024-03-29 14:10:57,862][00497] Updated weights for policy 0, policy_version 18429 (0.0024)
[2024-03-29 14:10:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.4, 300 sec: 41765.3). Total num frames: 301989888. Throughput: 0: 42274.7. Samples: 184197440. Policy #0 lag: (min: 0.0, avg: 22.8, max: 43.0)
[2024-03-29 14:10:58,840][00126] Avg episode reward: [(0, '0.291')]
[2024-03-29 14:11:01,716][00497] Updated weights for policy 0, policy_version 18439 (0.0022)
[2024-03-29 14:11:03,839][00126] Fps is (10 sec: 47513.3, 60 sec: 42325.2, 300 sec: 41820.8). Total num frames: 302202880. Throughput: 0: 42358.6. Samples: 184329820. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0)
[2024-03-29 14:11:03,840][00126] Avg episode reward: [(0, '0.395')]
[2024-03-29 14:11:04,000][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000018446_302219264.pth...
[2024-03-29 14:11:04,293][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000017832_292159488.pth
[2024-03-29 14:11:05,490][00497] Updated weights for policy 0, policy_version 18449 (0.0017)
[2024-03-29 14:11:08,839][00126] Fps is (10 sec: 37683.0, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 302366720. Throughput: 0: 42159.1. Samples: 184592340. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0)
[2024-03-29 14:11:08,841][00126] Avg episode reward: [(0, '0.408')]
[2024-03-29 14:11:09,922][00497] Updated weights for policy 0, policy_version 18459 (0.0018)
[2024-03-29 14:11:13,301][00497] Updated weights for policy 0, policy_version 18469 (0.0025)
[2024-03-29 14:11:13,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 302612480. Throughput: 0: 42606.3. Samples: 184838740. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0)
[2024-03-29 14:11:13,840][00126] Avg episode reward: [(0, '0.478')]
[2024-03-29 14:11:17,205][00497] Updated weights for policy 0, policy_version 18479 (0.0018)
[2024-03-29 14:11:18,839][00126] Fps is (10 sec: 47513.4, 60 sec: 42598.4, 300 sec: 41820.8). Total num frames: 302841856. Throughput: 0: 42443.0. Samples: 184966280. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0)
[2024-03-29 14:11:18,840][00126] Avg episode reward: [(0, '0.339')]
[2024-03-29 14:11:20,802][00497] Updated weights for policy 0, policy_version 18489 (0.0019)
[2024-03-29 14:11:23,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42052.4, 300 sec: 41709.8). Total num frames: 303005696. Throughput: 0: 42529.1. Samples: 185229640. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0)
[2024-03-29 14:11:23,840][00126] Avg episode reward: [(0, '0.406')]
[2024-03-29 14:11:25,409][00497] Updated weights for policy 0, policy_version 18499 (0.0026)
[2024-03-29 14:11:25,793][00476] Signal inference workers to stop experience collection... (6600 times)
[2024-03-29 14:11:25,816][00497] InferenceWorker_p0-w0: stopping experience collection (6600 times)
[2024-03-29 14:11:25,988][00476] Signal inference workers to resume experience collection... (6600 times)
[2024-03-29 14:11:25,988][00497] InferenceWorker_p0-w0: resuming experience collection (6600 times)
[2024-03-29 14:11:28,791][00497] Updated weights for policy 0, policy_version 18509 (0.0019)
[2024-03-29 14:11:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 41820.9).
Total num frames: 303251456. Throughput: 0: 42764.4. Samples: 185480080. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0)
[2024-03-29 14:11:28,840][00126] Avg episode reward: [(0, '0.392')]
[2024-03-29 14:11:32,629][00497] Updated weights for policy 0, policy_version 18519 (0.0029)
[2024-03-29 14:11:33,839][00126] Fps is (10 sec: 45874.4, 60 sec: 42598.3, 300 sec: 41820.9). Total num frames: 303464448. Throughput: 0: 42410.6. Samples: 185598140. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0)
[2024-03-29 14:11:33,840][00126] Avg episode reward: [(0, '0.467')]
[2024-03-29 14:11:36,389][00497] Updated weights for policy 0, policy_version 18529 (0.0032)
[2024-03-29 14:11:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42871.5, 300 sec: 41820.8). Total num frames: 303661056. Throughput: 0: 42356.9. Samples: 185851980. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0)
[2024-03-29 14:11:38,841][00126] Avg episode reward: [(0, '0.445')]
[2024-03-29 14:11:41,069][00497] Updated weights for policy 0, policy_version 18539 (0.0027)
[2024-03-29 14:11:43,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 303874048. Throughput: 0: 42606.1. Samples: 186114720. Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0)
[2024-03-29 14:11:43,840][00126] Avg episode reward: [(0, '0.359')]
[2024-03-29 14:11:44,411][00497] Updated weights for policy 0, policy_version 18549 (0.0021)
[2024-03-29 14:11:48,284][00497] Updated weights for policy 0, policy_version 18559 (0.0019)
[2024-03-29 14:11:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 304087040. Throughput: 0: 42069.9. Samples: 186222960. Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0)
[2024-03-29 14:11:48,840][00126] Avg episode reward: [(0, '0.367')]
[2024-03-29 14:11:52,122][00497] Updated weights for policy 0, policy_version 18569 (0.0024)
[2024-03-29 14:11:53,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42871.5, 300 sec: 41931.9). Total num frames: 304300032. Throughput: 0: 42099.6. Samples: 186486820. Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0)
[2024-03-29 14:11:53,840][00126] Avg episode reward: [(0, '0.326')]
[2024-03-29 14:11:56,510][00497] Updated weights for policy 0, policy_version 18579 (0.0026)
[2024-03-29 14:11:58,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 304513024. Throughput: 0: 42592.3. Samples: 186755400. Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0)
[2024-03-29 14:11:58,840][00126] Avg episode reward: [(0, '0.322')]
[2024-03-29 14:11:59,740][00497] Updated weights for policy 0, policy_version 18589 (0.0026)
[2024-03-29 14:12:02,191][00476] Signal inference workers to stop experience collection... (6650 times)
[2024-03-29 14:12:02,238][00497] InferenceWorker_p0-w0: stopping experience collection (6650 times)
[2024-03-29 14:12:02,415][00476] Signal inference workers to resume experience collection... (6650 times)
[2024-03-29 14:12:02,415][00497] InferenceWorker_p0-w0: resuming experience collection (6650 times)
[2024-03-29 14:12:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 304709632. Throughput: 0: 42038.3. Samples: 186858000.
Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0)
[2024-03-29 14:12:03,840][00126] Avg episode reward: [(0, '0.381')]
[2024-03-29 14:12:03,905][00497] Updated weights for policy 0, policy_version 18599 (0.0022)
[2024-03-29 14:12:07,738][00497] Updated weights for policy 0, policy_version 18609 (0.0025)
[2024-03-29 14:12:08,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 304922624. Throughput: 0: 41940.3. Samples: 187116960. Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0)
[2024-03-29 14:12:08,841][00126] Avg episode reward: [(0, '0.324')]
[2024-03-29 14:12:12,032][00497] Updated weights for policy 0, policy_version 18619 (0.0024)
[2024-03-29 14:12:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 305135616. Throughput: 0: 42281.4. Samples: 187382740. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0)
[2024-03-29 14:12:13,840][00126] Avg episode reward: [(0, '0.323')]
[2024-03-29 14:12:15,267][00497] Updated weights for policy 0, policy_version 18629 (0.0034)
[2024-03-29 14:12:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 305348608. Throughput: 0: 42064.1. Samples: 187491020. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0)
[2024-03-29 14:12:18,840][00126] Avg episode reward: [(0, '0.378')]
[2024-03-29 14:12:19,398][00497] Updated weights for policy 0, policy_version 18639 (0.0019)
[2024-03-29 14:12:23,267][00497] Updated weights for policy 0, policy_version 18649 (0.0030)
[2024-03-29 14:12:23,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 305561600. Throughput: 0: 42004.5. Samples: 187742180. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0)
[2024-03-29 14:12:23,840][00126] Avg episode reward: [(0, '0.441')]
[2024-03-29 14:12:27,645][00497] Updated weights for policy 0, policy_version 18659 (0.0018)
[2024-03-29 14:12:28,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 305758208. Throughput: 0: 42287.2. Samples: 188017640. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0)
[2024-03-29 14:12:28,840][00126] Avg episode reward: [(0, '0.392')]
[2024-03-29 14:12:30,852][00497] Updated weights for policy 0, policy_version 18669 (0.0035)
[2024-03-29 14:12:33,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 306003968. Throughput: 0: 42356.9. Samples: 188129020. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0)
[2024-03-29 14:12:33,840][00126] Avg episode reward: [(0, '0.358')]
[2024-03-29 14:12:34,809][00497] Updated weights for policy 0, policy_version 18679 (0.0025)
[2024-03-29 14:12:38,750][00497] Updated weights for policy 0, policy_version 18689 (0.0027)
[2024-03-29 14:12:38,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 306200576. Throughput: 0: 41899.6. Samples: 188372300. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0)
[2024-03-29 14:12:38,840][00126] Avg episode reward: [(0, '0.348')]
[2024-03-29 14:12:41,888][00476] Signal inference workers to stop experience collection... (6700 times)
[2024-03-29 14:12:41,931][00497] InferenceWorker_p0-w0: stopping experience collection (6700 times)
[2024-03-29 14:12:42,083][00476] Signal inference workers to resume experience collection...
(6700 times)
[2024-03-29 14:12:42,084][00497] InferenceWorker_p0-w0: resuming experience collection (6700 times)
[2024-03-29 14:12:43,360][00497] Updated weights for policy 0, policy_version 18699 (0.0023)
[2024-03-29 14:12:43,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 306380800. Throughput: 0: 41981.4. Samples: 188644560. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0)
[2024-03-29 14:12:43,840][00126] Avg episode reward: [(0, '0.367')]
[2024-03-29 14:12:46,585][00497] Updated weights for policy 0, policy_version 18709 (0.0027)
[2024-03-29 14:12:48,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 306626560. Throughput: 0: 42041.7. Samples: 188749880. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0)
[2024-03-29 14:12:48,840][00126] Avg episode reward: [(0, '0.432')]
[2024-03-29 14:12:50,772][00497] Updated weights for policy 0, policy_version 18719 (0.0022)
[2024-03-29 14:12:53,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 306823168. Throughput: 0: 41809.3. Samples: 188998380. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0)
[2024-03-29 14:12:53,840][00126] Avg episode reward: [(0, '0.399')]
[2024-03-29 14:12:54,529][00497] Updated weights for policy 0, policy_version 18729 (0.0022)
[2024-03-29 14:12:58,703][00497] Updated weights for policy 0, policy_version 18739 (0.0018)
[2024-03-29 14:12:58,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 307019776. Throughput: 0: 42108.0. Samples: 189277600. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0)
[2024-03-29 14:12:58,840][00126] Avg episode reward: [(0, '0.356')]
[2024-03-29 14:13:01,823][00497] Updated weights for policy 0, policy_version 18749 (0.0027)
[2024-03-29 14:13:03,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 307265536. Throughput: 0: 42450.2. Samples: 189401280. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0)
[2024-03-29 14:13:03,840][00126] Avg episode reward: [(0, '0.415')]
[2024-03-29 14:13:04,138][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000018755_307281920.pth...
[2024-03-29 14:13:04,483][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000018137_297156608.pth
[2024-03-29 14:13:06,243][00497] Updated weights for policy 0, policy_version 18759 (0.0021)
[2024-03-29 14:13:08,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 307478528. Throughput: 0: 42075.6. Samples: 189635580. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0)
[2024-03-29 14:13:08,841][00126] Avg episode reward: [(0, '0.402')]
[2024-03-29 14:13:10,366][00497] Updated weights for policy 0, policy_version 18769 (0.0031)
[2024-03-29 14:13:13,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 307642368. Throughput: 0: 41824.5. Samples: 189899740. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0)
[2024-03-29 14:13:13,840][00126] Avg episode reward: [(0, '0.368')]
[2024-03-29 14:13:14,384][00497] Updated weights for policy 0, policy_version 18779 (0.0022)
[2024-03-29 14:13:14,760][00476] Signal inference workers to stop experience collection... (6750 times)
[2024-03-29 14:13:14,839][00497] InferenceWorker_p0-w0: stopping experience collection (6750 times)
[2024-03-29 14:13:14,840][00476] Signal inference workers to resume experience collection...
(6750 times)
[2024-03-29 14:13:14,863][00497] InferenceWorker_p0-w0: resuming experience collection (6750 times)
[2024-03-29 14:13:17,603][00497] Updated weights for policy 0, policy_version 18789 (0.0026)
[2024-03-29 14:13:18,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 307888128. Throughput: 0: 42215.9. Samples: 190028740. Policy #0 lag: (min: 0.0, avg: 19.4, max: 43.0)
[2024-03-29 14:13:18,840][00126] Avg episode reward: [(0, '0.410')]
[2024-03-29 14:13:21,761][00497] Updated weights for policy 0, policy_version 18799 (0.0029)
[2024-03-29 14:13:23,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 308101120. Throughput: 0: 42359.5. Samples: 190278480. Policy #0 lag: (min: 0.0, avg: 19.4, max: 43.0)
[2024-03-29 14:13:23,840][00126] Avg episode reward: [(0, '0.341')]
[2024-03-29 14:13:26,027][00497] Updated weights for policy 0, policy_version 18809 (0.0018)
[2024-03-29 14:13:28,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 308264960. Throughput: 0: 41806.3. Samples: 190525840. Policy #0 lag: (min: 0.0, avg: 19.4, max: 43.0)
[2024-03-29 14:13:28,840][00126] Avg episode reward: [(0, '0.450')]
[2024-03-29 14:13:29,905][00497] Updated weights for policy 0, policy_version 18819 (0.0022)
[2024-03-29 14:13:33,356][00497] Updated weights for policy 0, policy_version 18829 (0.0019)
[2024-03-29 14:13:33,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 308510720. Throughput: 0: 42291.2. Samples: 190652980. Policy #0 lag: (min: 1.0, avg: 19.3, max: 42.0)
[2024-03-29 14:13:33,840][00126] Avg episode reward: [(0, '0.431')]
[2024-03-29 14:13:37,261][00497] Updated weights for policy 0, policy_version 18839 (0.0022)
[2024-03-29 14:13:38,839][00126] Fps is (10 sec: 45874.4, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 308723712. Throughput: 0: 42292.9. Samples: 190901560. Policy #0 lag: (min: 1.0, avg: 19.3, max: 42.0)
[2024-03-29 14:13:38,840][00126] Avg episode reward: [(0, '0.361')]
[2024-03-29 14:13:41,504][00497] Updated weights for policy 0, policy_version 18849 (0.0020)
[2024-03-29 14:13:43,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 308903936. Throughput: 0: 41776.9. Samples: 191157560. Policy #0 lag: (min: 0.0, avg: 19.5, max: 40.0)
[2024-03-29 14:13:43,840][00126] Avg episode reward: [(0, '0.438')]
[2024-03-29 14:13:45,666][00497] Updated weights for policy 0, policy_version 18859 (0.0022)
[2024-03-29 14:13:48,311][00476] Signal inference workers to stop experience collection... (6800 times)
[2024-03-29 14:13:48,340][00497] InferenceWorker_p0-w0: stopping experience collection (6800 times)
[2024-03-29 14:13:48,495][00476] Signal inference workers to resume experience collection... (6800 times)
[2024-03-29 14:13:48,496][00497] InferenceWorker_p0-w0: resuming experience collection (6800 times)
[2024-03-29 14:13:48,801][00497] Updated weights for policy 0, policy_version 18869 (0.0024)
[2024-03-29 14:13:48,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 309149696. Throughput: 0: 41992.5. Samples: 191290940. Policy #0 lag: (min: 0.0, avg: 19.5, max: 40.0)
[2024-03-29 14:13:48,840][00126] Avg episode reward: [(0, '0.288')]
[2024-03-29 14:13:52,849][00497] Updated weights for policy 0, policy_version 18879 (0.0029)
[2024-03-29 14:13:53,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.3, 300 sec: 41987.5).
Total num frames: 309346304. Throughput: 0: 42016.8. Samples: 191526340. Policy #0 lag: (min: 0.0, avg: 19.5, max: 40.0)
[2024-03-29 14:13:53,840][00126] Avg episode reward: [(0, '0.361')]
[2024-03-29 14:13:57,032][00497] Updated weights for policy 0, policy_version 18889 (0.0022)
[2024-03-29 14:13:58,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 309542912. Throughput: 0: 41970.2. Samples: 191788400. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0)
[2024-03-29 14:13:58,840][00126] Avg episode reward: [(0, '0.308')]
[2024-03-29 14:14:01,067][00497] Updated weights for policy 0, policy_version 18899 (0.0017)
[2024-03-29 14:14:03,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 42098.5). Total num frames: 309772288. Throughput: 0: 42194.1. Samples: 191927480. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0)
[2024-03-29 14:14:03,840][00126] Avg episode reward: [(0, '0.411')]
[2024-03-29 14:14:04,206][00497] Updated weights for policy 0, policy_version 18909 (0.0025)
[2024-03-29 14:14:08,371][00497] Updated weights for policy 0, policy_version 18919 (0.0023)
[2024-03-29 14:14:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 309985280. Throughput: 0: 41948.9. Samples: 192166180. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0)
[2024-03-29 14:14:08,840][00126] Avg episode reward: [(0, '0.329')]
[2024-03-29 14:14:12,555][00497] Updated weights for policy 0, policy_version 18929 (0.0027)
[2024-03-29 14:14:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 310181888. Throughput: 0: 42201.2. Samples: 192424900. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0)
[2024-03-29 14:14:13,840][00126] Avg episode reward: [(0, '0.380')]
[2024-03-29 14:14:16,471][00497] Updated weights for policy 0, policy_version 18939 (0.0021)
[2024-03-29 14:14:18,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 42098.6). Total num frames: 310394880. Throughput: 0: 42394.7. Samples: 192560740. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0)
[2024-03-29 14:14:18,840][00126] Avg episode reward: [(0, '0.453')]
[2024-03-29 14:14:19,938][00497] Updated weights for policy 0, policy_version 18949 (0.0033)
[2024-03-29 14:14:23,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 310624256. Throughput: 0: 42055.3. Samples: 192794040. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0)
[2024-03-29 14:14:23,840][00126] Avg episode reward: [(0, '0.412')]
[2024-03-29 14:14:23,844][00497] Updated weights for policy 0, policy_version 18959 (0.0018)
[2024-03-29 14:14:25,081][00476] Signal inference workers to stop experience collection... (6850 times)
[2024-03-29 14:14:25,122][00497] InferenceWorker_p0-w0: stopping experience collection (6850 times)
[2024-03-29 14:14:25,304][00476] Signal inference workers to resume experience collection... (6850 times)
[2024-03-29 14:14:25,305][00497] InferenceWorker_p0-w0: resuming experience collection (6850 times)
[2024-03-29 14:14:27,932][00497] Updated weights for policy 0, policy_version 18969 (0.0022)
[2024-03-29 14:14:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 310804480. Throughput: 0: 42276.9. Samples: 193060020.
Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0)
[2024-03-29 14:14:28,840][00126] Avg episode reward: [(0, '0.481')]
[2024-03-29 14:14:31,972][00497] Updated weights for policy 0, policy_version 18979 (0.0023)
[2024-03-29 14:14:33,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 311033856. Throughput: 0: 42367.5. Samples: 193197480. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0)
[2024-03-29 14:14:33,840][00126] Avg episode reward: [(0, '0.429')]
[2024-03-29 14:14:35,330][00497] Updated weights for policy 0, policy_version 18989 (0.0022)
[2024-03-29 14:14:38,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.4, 300 sec: 42209.6). Total num frames: 311246848. Throughput: 0: 42258.3. Samples: 193427960. Policy #0 lag: (min: 1.0, avg: 22.6, max: 41.0)
[2024-03-29 14:14:38,840][00126] Avg episode reward: [(0, '0.345')]
[2024-03-29 14:14:39,226][00497] Updated weights for policy 0, policy_version 18999 (0.0023)
[2024-03-29 14:14:42,998][00497] Updated weights for policy 0, policy_version 19009 (0.0026)
[2024-03-29 14:14:43,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 311459840. Throughput: 0: 42467.0. Samples: 193699420. Policy #0 lag: (min: 1.0, avg: 22.6, max: 41.0)
[2024-03-29 14:14:43,840][00126] Avg episode reward: [(0, '0.378')]
[2024-03-29 14:14:47,511][00497] Updated weights for policy 0, policy_version 19019 (0.0018)
[2024-03-29 14:14:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 311672832. Throughput: 0: 42292.1. Samples: 193830620. Policy #0 lag: (min: 1.0, avg: 22.6, max: 41.0)
[2024-03-29 14:14:48,840][00126] Avg episode reward: [(0, '0.418')]
[2024-03-29 14:14:50,633][00497] Updated weights for policy 0, policy_version 19029 (0.0024)
[2024-03-29 14:14:53,841][00126] Fps is (10 sec: 42593.0, 60 sec: 42324.4, 300 sec: 42209.4). Total num frames: 311885824. Throughput: 0: 42396.5. Samples: 194074080. Policy #0 lag: (min: 1.0, avg: 21.5, max: 43.0)
[2024-03-29 14:14:53,842][00126] Avg episode reward: [(0, '0.300')]
[2024-03-29 14:14:54,488][00497] Updated weights for policy 0, policy_version 19039 (0.0024)
[2024-03-29 14:14:58,696][00497] Updated weights for policy 0, policy_version 19049 (0.0023)
[2024-03-29 14:14:58,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 312098816. Throughput: 0: 42297.4. Samples: 194328280. Policy #0 lag: (min: 1.0, avg: 21.5, max: 43.0)
[2024-03-29 14:14:58,840][00126] Avg episode reward: [(0, '0.371')]
[2024-03-29 14:14:59,928][00476] Signal inference workers to stop experience collection... (6900 times)
[2024-03-29 14:14:59,982][00497] InferenceWorker_p0-w0: stopping experience collection (6900 times)
[2024-03-29 14:15:00,116][00476] Signal inference workers to resume experience collection... (6900 times)
[2024-03-29 14:15:00,117][00497] InferenceWorker_p0-w0: resuming experience collection (6900 times)
[2024-03-29 14:15:03,134][00497] Updated weights for policy 0, policy_version 19059 (0.0019)
[2024-03-29 14:15:03,839][00126] Fps is (10 sec: 40965.7, 60 sec: 42052.4, 300 sec: 42209.6). Total num frames: 312295424. Throughput: 0: 42351.5. Samples: 194466560. Policy #0 lag: (min: 1.0, avg: 19.1, max: 41.0)
[2024-03-29 14:15:03,840][00126] Avg episode reward: [(0, '0.359')]
[2024-03-29 14:15:03,969][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019062_312311808.pth...
[2024-03-29 14:15:04,361][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000018446_302219264.pth
[2024-03-29 14:15:06,390][00497] Updated weights for policy 0, policy_version 19069 (0.0024)
[2024-03-29 14:15:08,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 312524800. Throughput: 0: 41940.0. Samples: 194681340. Policy #0 lag: (min: 1.0, avg: 19.1, max: 41.0)
[2024-03-29 14:15:08,840][00126] Avg episode reward: [(0, '0.389')]
[2024-03-29 14:15:10,563][00497] Updated weights for policy 0, policy_version 19079 (0.0023)
[2024-03-29 14:15:13,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 312721408. Throughput: 0: 41555.5. Samples: 194930020. Policy #0 lag: (min: 1.0, avg: 19.1, max: 41.0)
[2024-03-29 14:15:13,840][00126] Avg episode reward: [(0, '0.398')]
[2024-03-29 14:15:14,842][00497] Updated weights for policy 0, policy_version 19089 (0.0023)
[2024-03-29 14:15:18,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 312901632. Throughput: 0: 41781.8. Samples: 195077660. Policy #0 lag: (min: 0.0, avg: 20.1, max: 43.0)
[2024-03-29 14:15:18,842][00126] Avg episode reward: [(0, '0.405')]
[2024-03-29 14:15:19,130][00497] Updated weights for policy 0, policy_version 19099 (0.0018)
[2024-03-29 14:15:22,068][00497] Updated weights for policy 0, policy_version 19109 (0.0020)
[2024-03-29 14:15:23,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.2, 300 sec: 42209.6). Total num frames: 313163776. Throughput: 0: 42093.7. Samples: 195322180. Policy #0 lag: (min: 0.0, avg: 20.1, max: 43.0)
[2024-03-29 14:15:23,840][00126] Avg episode reward: [(0, '0.307')]
[2024-03-29 14:15:26,304][00497] Updated weights for policy 0, policy_version 19119 (0.0029)
[2024-03-29 14:15:28,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 313360384. Throughput: 0: 41234.7. Samples: 195554980. Policy #0 lag: (min: 0.0, avg: 20.1, max: 43.0)
[2024-03-29 14:15:28,840][00126] Avg episode reward: [(0, '0.446')]
[2024-03-29 14:15:30,544][00497] Updated weights for policy 0, policy_version 19129 (0.0020)
[2024-03-29 14:15:33,679][00476] Signal inference workers to stop experience collection... (6950 times)
[2024-03-29 14:15:33,715][00497] InferenceWorker_p0-w0: stopping experience collection (6950 times)
[2024-03-29 14:15:33,839][00126] Fps is (10 sec: 36045.3, 60 sec: 41506.2, 300 sec: 42154.1). Total num frames: 313524224. Throughput: 0: 41453.8. Samples: 195696040. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0)
[2024-03-29 14:15:33,840][00126] Avg episode reward: [(0, '0.388')]
[2024-03-29 14:15:33,871][00476] Signal inference workers to resume experience collection... (6950 times)
[2024-03-29 14:15:33,872][00497] InferenceWorker_p0-w0: resuming experience collection (6950 times)
[2024-03-29 14:15:34,806][00497] Updated weights for policy 0, policy_version 19139 (0.0023)
[2024-03-29 14:15:37,957][00497] Updated weights for policy 0, policy_version 19149 (0.0032)
[2024-03-29 14:15:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 313769984. Throughput: 0: 41541.6. Samples: 195943400.
Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0)
[2024-03-29 14:15:38,840][00126] Avg episode reward: [(0, '0.330')]
[2024-03-29 14:15:41,929][00497] Updated weights for policy 0, policy_version 19159 (0.0023)
[2024-03-29 14:15:43,839][00126] Fps is (10 sec: 47513.4, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 313999360. Throughput: 0: 41598.3. Samples: 196200200. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0)
[2024-03-29 14:15:43,840][00126] Avg episode reward: [(0, '0.413')]
[2024-03-29 14:15:46,131][00497] Updated weights for policy 0, policy_version 19169 (0.0026)
[2024-03-29 14:15:48,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41233.1, 300 sec: 42098.6). Total num frames: 314146816. Throughput: 0: 41379.1. Samples: 196328620. Policy #0 lag: (min: 2.0, avg: 23.3, max: 43.0)
[2024-03-29 14:15:48,840][00126] Avg episode reward: [(0, '0.483')]
[2024-03-29 14:15:50,522][00497] Updated weights for policy 0, policy_version 19179 (0.0029)
[2024-03-29 14:15:53,565][00497] Updated weights for policy 0, policy_version 19189 (0.0025)
[2024-03-29 14:15:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41780.1, 300 sec: 42043.0). Total num frames: 314392576. Throughput: 0: 42102.7. Samples: 196575960. Policy #0 lag: (min: 2.0, avg: 23.3, max: 43.0)
[2024-03-29 14:15:53,840][00126] Avg episode reward: [(0, '0.405')]
[2024-03-29 14:15:57,651][00497] Updated weights for policy 0, policy_version 19199 (0.0028)
[2024-03-29 14:15:58,839][00126] Fps is (10 sec: 45874.8, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 314605568. Throughput: 0: 42037.3. Samples: 196821700. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0)
[2024-03-29 14:15:58,840][00126] Avg episode reward: [(0, '0.467')]
[2024-03-29 14:16:01,981][00497] Updated weights for policy 0, policy_version 19209 (0.0024)
[2024-03-29 14:16:03,159][00476] Signal inference workers to stop experience collection... (7000 times)
[2024-03-29 14:16:03,160][00476] Signal inference workers to resume experience collection... (7000 times)
[2024-03-29 14:16:03,198][00497] InferenceWorker_p0-w0: stopping experience collection (7000 times)
[2024-03-29 14:16:03,198][00497] InferenceWorker_p0-w0: resuming experience collection (7000 times)
[2024-03-29 14:16:03,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41506.0, 300 sec: 42098.5). Total num frames: 314785792. Throughput: 0: 41567.0. Samples: 196948180. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0)
[2024-03-29 14:16:03,840][00126] Avg episode reward: [(0, '0.391')]
[2024-03-29 14:16:06,133][00497] Updated weights for policy 0, policy_version 19219 (0.0025)
[2024-03-29 14:16:08,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 315015168. Throughput: 0: 41757.0. Samples: 197201240. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0)
[2024-03-29 14:16:08,840][00126] Avg episode reward: [(0, '0.352')]
[2024-03-29 14:16:09,188][00497] Updated weights for policy 0, policy_version 19229 (0.0033)
[2024-03-29 14:16:13,352][00497] Updated weights for policy 0, policy_version 19239 (0.0027)
[2024-03-29 14:16:13,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 315228160. Throughput: 0: 42164.8. Samples: 197452400. Policy #0 lag: (min: 0.0, avg: 23.3, max: 42.0)
[2024-03-29 14:16:13,840][00126] Avg episode reward: [(0, '0.401')]
[2024-03-29 14:16:17,816][00497] Updated weights for policy 0, policy_version 19249 (0.0032)
[2024-03-29 14:16:18,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 42043.0).
Total num frames: 315408384. Throughput: 0: 41561.6. Samples: 197566320. Policy #0 lag: (min: 0.0, avg: 23.3, max: 42.0)
[2024-03-29 14:16:18,840][00126] Avg episode reward: [(0, '0.405')]
[2024-03-29 14:16:21,701][00497] Updated weights for policy 0, policy_version 19259 (0.0019)
[2024-03-29 14:16:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 315637760. Throughput: 0: 42163.1. Samples: 197840740. Policy #0 lag: (min: 0.0, avg: 23.3, max: 42.0)
[2024-03-29 14:16:23,842][00126] Avg episode reward: [(0, '0.430')]
[2024-03-29 14:16:24,982][00497] Updated weights for policy 0, policy_version 19269 (0.0020)
[2024-03-29 14:16:28,839][00126] Fps is (10 sec: 44237.6, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 315850752. Throughput: 0: 41829.8. Samples: 198082540. Policy #0 lag: (min: 0.0, avg: 21.9, max: 44.0)
[2024-03-29 14:16:28,840][00126] Avg episode reward: [(0, '0.385')]
[2024-03-29 14:16:28,916][00497] Updated weights for policy 0, policy_version 19279 (0.0021)
[2024-03-29 14:16:33,230][00497] Updated weights for policy 0, policy_version 19289 (0.0027)
[2024-03-29 14:16:33,552][00476] Signal inference workers to stop experience collection... (7050 times)
[2024-03-29 14:16:33,552][00476] Signal inference workers to resume experience collection... (7050 times)
[2024-03-29 14:16:33,594][00497] InferenceWorker_p0-w0: stopping experience collection (7050 times)
[2024-03-29 14:16:33,594][00497] InferenceWorker_p0-w0: resuming experience collection (7050 times)
[2024-03-29 14:16:33,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 316063744. Throughput: 0: 41496.5. Samples: 198195960. Policy #0 lag: (min: 0.0, avg: 21.9, max: 44.0)
[2024-03-29 14:16:33,840][00126] Avg episode reward: [(0, '0.404')]
[2024-03-29 14:16:37,096][00497] Updated weights for policy 0, policy_version 19299 (0.0027)
[2024-03-29 14:16:38,839][00126] Fps is (10 sec: 40959.2, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 316260352. Throughput: 0: 42044.7. Samples: 198467980. Policy #0 lag: (min: 1.0, avg: 20.5, max: 43.0)
[2024-03-29 14:16:38,840][00126] Avg episode reward: [(0, '0.407')]
[2024-03-29 14:16:40,752][00497] Updated weights for policy 0, policy_version 19309 (0.0031)
[2024-03-29 14:16:43,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 316489728. Throughput: 0: 41849.8. Samples: 198704940. Policy #0 lag: (min: 1.0, avg: 20.5, max: 43.0)
[2024-03-29 14:16:43,840][00126] Avg episode reward: [(0, '0.422')]
[2024-03-29 14:16:44,520][00497] Updated weights for policy 0, policy_version 19319 (0.0021)
[2024-03-29 14:16:48,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 316669952. Throughput: 0: 41981.3. Samples: 198837340. Policy #0 lag: (min: 1.0, avg: 20.5, max: 43.0)
[2024-03-29 14:16:48,841][00126] Avg episode reward: [(0, '0.463')]
[2024-03-29 14:16:48,989][00497] Updated weights for policy 0, policy_version 19329 (0.0021)
[2024-03-29 14:16:52,814][00497] Updated weights for policy 0, policy_version 19339 (0.0018)
[2024-03-29 14:16:53,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 316899328. Throughput: 0: 42264.5. Samples: 199103140.
Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0)
[2024-03-29 14:16:53,840][00126] Avg episode reward: [(0, '0.389')]
[2024-03-29 14:16:56,145][00497] Updated weights for policy 0, policy_version 19349 (0.0025)
[2024-03-29 14:16:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 317112320. Throughput: 0: 41797.4. Samples: 199333280. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0)
[2024-03-29 14:16:58,840][00126] Avg episode reward: [(0, '0.380')]
[2024-03-29 14:17:00,078][00497] Updated weights for policy 0, policy_version 19359 (0.0021)
[2024-03-29 14:17:03,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 317308928. Throughput: 0: 42445.9. Samples: 199476380. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0)
[2024-03-29 14:17:03,840][00126] Avg episode reward: [(0, '0.269')]
[2024-03-29 14:17:03,858][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019367_317308928.pth...
[2024-03-29 14:17:04,161][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000018755_307281920.pth
[2024-03-29 14:17:04,762][00497] Updated weights for policy 0, policy_version 19369 (0.0022)
[2024-03-29 14:17:05,054][00476] Signal inference workers to stop experience collection... (7100 times)
[2024-03-29 14:17:05,054][00476] Signal inference workers to resume experience collection... (7100 times)
[2024-03-29 14:17:05,092][00497] InferenceWorker_p0-w0: stopping experience collection (7100 times)
[2024-03-29 14:17:05,092][00497] InferenceWorker_p0-w0: resuming experience collection (7100 times)
[2024-03-29 14:17:08,303][00497] Updated weights for policy 0, policy_version 19379 (0.0022)
[2024-03-29 14:17:08,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 317521920. Throughput: 0: 42073.0. Samples: 199734020. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0)
[2024-03-29 14:17:08,840][00126] Avg episode reward: [(0, '0.373')]
[2024-03-29 14:17:11,793][00497] Updated weights for policy 0, policy_version 19389 (0.0024)
[2024-03-29 14:17:13,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 317751296. Throughput: 0: 41470.6. Samples: 199948720. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0)
[2024-03-29 14:17:13,840][00126] Avg episode reward: [(0, '0.374')]
[2024-03-29 14:17:16,094][00497] Updated weights for policy 0, policy_version 19399 (0.0028)
[2024-03-29 14:17:18,839][00126] Fps is (10 sec: 44235.9, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 317964288. Throughput: 0: 42089.1. Samples: 200089980. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0)
[2024-03-29 14:17:18,840][00126] Avg episode reward: [(0, '0.386')]
[2024-03-29 14:17:20,498][00497] Updated weights for policy 0, policy_version 19409 (0.0024)
[2024-03-29 14:17:23,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 318128128. Throughput: 0: 41936.5. Samples: 200355120. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0)
[2024-03-29 14:17:23,840][00126] Avg episode reward: [(0, '0.425')]
[2024-03-29 14:17:24,193][00497] Updated weights for policy 0, policy_version 19419 (0.0021)
[2024-03-29 14:17:27,630][00497] Updated weights for policy 0, policy_version 19429 (0.0028)
[2024-03-29 14:17:28,839][00126] Fps is (10 sec: 42599.2, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 318390272. Throughput: 0: 41660.9. Samples: 200579680.
Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:17:28,840][00126] Avg episode reward: [(0, '0.318')] [2024-03-29 14:17:31,743][00497] Updated weights for policy 0, policy_version 19439 (0.0022) [2024-03-29 14:17:33,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 318586880. Throughput: 0: 41604.0. Samples: 200709520. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:17:33,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 14:17:36,135][00497] Updated weights for policy 0, policy_version 19449 (0.0022) [2024-03-29 14:17:37,406][00476] Signal inference workers to stop experience collection... (7150 times) [2024-03-29 14:17:37,406][00476] Signal inference workers to resume experience collection... (7150 times) [2024-03-29 14:17:37,450][00497] InferenceWorker_p0-w0: stopping experience collection (7150 times) [2024-03-29 14:17:37,450][00497] InferenceWorker_p0-w0: resuming experience collection (7150 times) [2024-03-29 14:17:38,839][00126] Fps is (10 sec: 36044.4, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 318750720. Throughput: 0: 41609.6. Samples: 200975580. Policy #0 lag: (min: 0.0, avg: 22.8, max: 42.0) [2024-03-29 14:17:38,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 14:17:39,902][00497] Updated weights for policy 0, policy_version 19459 (0.0023) [2024-03-29 14:17:43,441][00497] Updated weights for policy 0, policy_version 19469 (0.0020) [2024-03-29 14:17:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 318996480. Throughput: 0: 41663.2. Samples: 201208120. Policy #0 lag: (min: 0.0, avg: 22.8, max: 42.0) [2024-03-29 14:17:43,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 14:17:47,301][00497] Updated weights for policy 0, policy_version 19479 (0.0025) [2024-03-29 14:17:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 319193088. Throughput: 0: 41231.5. Samples: 201331800. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 14:17:48,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 14:17:52,080][00497] Updated weights for policy 0, policy_version 19489 (0.0027) [2024-03-29 14:17:53,839][00126] Fps is (10 sec: 39320.9, 60 sec: 41506.0, 300 sec: 41931.9). Total num frames: 319389696. Throughput: 0: 41562.5. Samples: 201604340. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 14:17:53,840][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 14:17:55,622][00497] Updated weights for policy 0, policy_version 19499 (0.0027) [2024-03-29 14:17:58,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 319619072. Throughput: 0: 42018.2. Samples: 201839540. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 14:17:58,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:17:59,001][00497] Updated weights for policy 0, policy_version 19509 (0.0024) [2024-03-29 14:18:03,161][00497] Updated weights for policy 0, policy_version 19519 (0.0028) [2024-03-29 14:18:03,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 319815680. Throughput: 0: 41345.9. Samples: 201950540. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 14:18:03,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 14:18:05,934][00476] Signal inference workers to stop experience collection... 
(7200 times) [2024-03-29 14:18:06,010][00497] InferenceWorker_p0-w0: stopping experience collection (7200 times) [2024-03-29 14:18:06,011][00476] Signal inference workers to resume experience collection... (7200 times) [2024-03-29 14:18:06,033][00497] InferenceWorker_p0-w0: resuming experience collection (7200 times) [2024-03-29 14:18:08,149][00497] Updated weights for policy 0, policy_version 19529 (0.0024) [2024-03-29 14:18:08,839][00126] Fps is (10 sec: 37682.6, 60 sec: 41232.9, 300 sec: 41876.4). Total num frames: 319995904. Throughput: 0: 41270.6. Samples: 202212300. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 14:18:08,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:18:11,712][00497] Updated weights for policy 0, policy_version 19539 (0.0022) [2024-03-29 14:18:13,839][00126] Fps is (10 sec: 39321.8, 60 sec: 40960.0, 300 sec: 41765.3). Total num frames: 320208896. Throughput: 0: 41707.1. Samples: 202456500. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 14:18:13,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 14:18:14,874][00497] Updated weights for policy 0, policy_version 19549 (0.0030) [2024-03-29 14:18:18,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41233.1, 300 sec: 41820.8). Total num frames: 320438272. Throughput: 0: 41204.9. Samples: 202563740. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 14:18:18,841][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:18:18,903][00497] Updated weights for policy 0, policy_version 19559 (0.0020) [2024-03-29 14:18:23,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41233.1, 300 sec: 41820.8). Total num frames: 320602112. Throughput: 0: 41059.6. Samples: 202823260. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 14:18:23,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 14:18:23,896][00497] Updated weights for policy 0, policy_version 19569 (0.0023) [2024-03-29 14:18:27,386][00497] Updated weights for policy 0, policy_version 19579 (0.0029) [2024-03-29 14:18:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40959.9, 300 sec: 41820.8). Total num frames: 320847872. Throughput: 0: 41744.4. Samples: 203086620. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 14:18:28,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 14:18:30,739][00497] Updated weights for policy 0, policy_version 19589 (0.0032) [2024-03-29 14:18:33,839][00126] Fps is (10 sec: 45875.3, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 321060864. Throughput: 0: 41345.8. Samples: 203192360. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 14:18:33,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 14:18:34,593][00497] Updated weights for policy 0, policy_version 19599 (0.0020) [2024-03-29 14:18:36,354][00476] Signal inference workers to stop experience collection... (7250 times) [2024-03-29 14:18:36,399][00497] InferenceWorker_p0-w0: stopping experience collection (7250 times) [2024-03-29 14:18:36,577][00476] Signal inference workers to resume experience collection... (7250 times) [2024-03-29 14:18:36,578][00497] InferenceWorker_p0-w0: resuming experience collection (7250 times) [2024-03-29 14:18:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 41820.8). Total num frames: 321241088. Throughput: 0: 41049.9. Samples: 203451580. 
Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 14:18:38,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:18:39,400][00497] Updated weights for policy 0, policy_version 19609 (0.0026) [2024-03-29 14:18:43,086][00497] Updated weights for policy 0, policy_version 19619 (0.0021) [2024-03-29 14:18:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 321470464. Throughput: 0: 41790.7. Samples: 203720120. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 14:18:43,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 14:18:46,420][00497] Updated weights for policy 0, policy_version 19629 (0.0027) [2024-03-29 14:18:48,839][00126] Fps is (10 sec: 45875.4, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 321699840. Throughput: 0: 41576.4. Samples: 203821480. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 14:18:48,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 14:18:50,491][00497] Updated weights for policy 0, policy_version 19639 (0.0026) [2024-03-29 14:18:53,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 321896448. Throughput: 0: 41749.9. Samples: 204091040. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 14:18:53,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:18:55,397][00497] Updated weights for policy 0, policy_version 19649 (0.0021) [2024-03-29 14:18:58,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40960.0, 300 sec: 41709.8). Total num frames: 322076672. Throughput: 0: 41797.3. Samples: 204337380. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 14:18:58,840][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 14:18:58,939][00497] Updated weights for policy 0, policy_version 19659 (0.0030) [2024-03-29 14:19:02,040][00497] Updated weights for policy 0, policy_version 19669 (0.0022) [2024-03-29 14:19:03,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 322322432. Throughput: 0: 42181.3. Samples: 204461900. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 14:19:03,841][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 14:19:03,948][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019674_322338816.pth... [2024-03-29 14:19:04,269][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019062_312311808.pth [2024-03-29 14:19:06,191][00497] Updated weights for policy 0, policy_version 19679 (0.0021) [2024-03-29 14:19:08,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 322519040. Throughput: 0: 41859.0. Samples: 204706920. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 14:19:08,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 14:19:11,050][00476] Signal inference workers to stop experience collection... (7300 times) [2024-03-29 14:19:11,093][00497] InferenceWorker_p0-w0: stopping experience collection (7300 times) [2024-03-29 14:19:11,276][00476] Signal inference workers to resume experience collection... (7300 times) [2024-03-29 14:19:11,277][00497] InferenceWorker_p0-w0: resuming experience collection (7300 times) [2024-03-29 14:19:11,280][00497] Updated weights for policy 0, policy_version 19689 (0.0029) [2024-03-29 14:19:13,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.0, 300 sec: 41709.8). Total num frames: 322699264. Throughput: 0: 41810.2. Samples: 204968080. 
Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 14:19:13,842][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 14:19:14,692][00497] Updated weights for policy 0, policy_version 19699 (0.0025) [2024-03-29 14:19:17,861][00497] Updated weights for policy 0, policy_version 19709 (0.0024) [2024-03-29 14:19:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 322961408. Throughput: 0: 42003.5. Samples: 205082520. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 14:19:18,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 14:19:21,953][00497] Updated weights for policy 0, policy_version 19719 (0.0020) [2024-03-29 14:19:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 323141632. Throughput: 0: 41538.2. Samples: 205320800. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 14:19:23,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 14:19:26,817][00497] Updated weights for policy 0, policy_version 19729 (0.0017) [2024-03-29 14:19:28,839][00126] Fps is (10 sec: 36045.2, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 323321856. Throughput: 0: 41592.0. Samples: 205591760. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 14:19:28,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 14:19:30,333][00497] Updated weights for policy 0, policy_version 19739 (0.0017) [2024-03-29 14:19:33,556][00497] Updated weights for policy 0, policy_version 19749 (0.0020) [2024-03-29 14:19:33,839][00126] Fps is (10 sec: 44237.6, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 323584000. Throughput: 0: 41849.0. Samples: 205704680. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 14:19:33,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 14:19:37,595][00497] Updated weights for policy 0, policy_version 19759 (0.0024) [2024-03-29 14:19:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 323764224. Throughput: 0: 41284.0. Samples: 205948820. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 14:19:38,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 14:19:42,157][00497] Updated weights for policy 0, policy_version 19769 (0.0021) [2024-03-29 14:19:43,839][00126] Fps is (10 sec: 36044.5, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 323944448. Throughput: 0: 41817.7. Samples: 206219180. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:19:43,840][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 14:19:45,155][00476] Signal inference workers to stop experience collection... (7350 times) [2024-03-29 14:19:45,203][00497] InferenceWorker_p0-w0: stopping experience collection (7350 times) [2024-03-29 14:19:45,240][00476] Signal inference workers to resume experience collection... (7350 times) [2024-03-29 14:19:45,243][00497] InferenceWorker_p0-w0: resuming experience collection (7350 times) [2024-03-29 14:19:45,799][00497] Updated weights for policy 0, policy_version 19779 (0.0025) [2024-03-29 14:19:48,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.2, 300 sec: 41765.5). Total num frames: 324206592. Throughput: 0: 41804.9. Samples: 206343120. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:19:48,842][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 14:19:49,045][00497] Updated weights for policy 0, policy_version 19789 (0.0024) [2024-03-29 14:19:53,162][00497] Updated weights for policy 0, policy_version 19799 (0.0026) [2024-03-29 14:19:53,839][00126] Fps is (10 sec: 44237.3, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 324386816. Throughput: 0: 41645.5. Samples: 206580960. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 14:19:53,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 14:19:58,052][00497] Updated weights for policy 0, policy_version 19809 (0.0020) [2024-03-29 14:19:58,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 324583424. Throughput: 0: 41817.4. Samples: 206849860. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 14:19:58,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 14:20:01,537][00497] Updated weights for policy 0, policy_version 19819 (0.0026) [2024-03-29 14:20:03,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 324812800. Throughput: 0: 41977.5. Samples: 206971500. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 14:20:03,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 14:20:04,751][00497] Updated weights for policy 0, policy_version 19829 (0.0022) [2024-03-29 14:20:08,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 325025792. Throughput: 0: 41720.0. Samples: 207198200. Policy #0 lag: (min: 0.0, avg: 23.5, max: 41.0) [2024-03-29 14:20:08,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 14:20:09,028][00497] Updated weights for policy 0, policy_version 19839 (0.0019) [2024-03-29 14:20:13,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 325189632. Throughput: 0: 41665.3. Samples: 207466700. Policy #0 lag: (min: 0.0, avg: 23.5, max: 41.0) [2024-03-29 14:20:13,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 14:20:13,968][00497] Updated weights for policy 0, policy_version 19849 (0.0029) [2024-03-29 14:20:17,314][00497] Updated weights for policy 0, policy_version 19859 (0.0024) [2024-03-29 14:20:18,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 325435392. Throughput: 0: 42122.6. Samples: 207600200. Policy #0 lag: (min: 0.0, avg: 23.5, max: 41.0) [2024-03-29 14:20:18,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:20:19,350][00476] Signal inference workers to stop experience collection... (7400 times) [2024-03-29 14:20:19,391][00497] InferenceWorker_p0-w0: stopping experience collection (7400 times) [2024-03-29 14:20:19,580][00476] Signal inference workers to resume experience collection... (7400 times) [2024-03-29 14:20:19,580][00497] InferenceWorker_p0-w0: resuming experience collection (7400 times) [2024-03-29 14:20:20,427][00497] Updated weights for policy 0, policy_version 19869 (0.0025) [2024-03-29 14:20:23,839][00126] Fps is (10 sec: 47513.9, 60 sec: 42052.4, 300 sec: 41709.8). Total num frames: 325664768. Throughput: 0: 41729.3. Samples: 207826640. Policy #0 lag: (min: 1.0, avg: 24.0, max: 43.0) [2024-03-29 14:20:23,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 14:20:24,378][00497] Updated weights for policy 0, policy_version 19879 (0.0022) [2024-03-29 14:20:28,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.2, 300 sec: 41765.3). 
Total num frames: 325844992. Throughput: 0: 41674.6. Samples: 208094540. Policy #0 lag: (min: 1.0, avg: 24.0, max: 43.0) [2024-03-29 14:20:28,841][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 14:20:29,311][00497] Updated weights for policy 0, policy_version 19889 (0.0022) [2024-03-29 14:20:32,947][00497] Updated weights for policy 0, policy_version 19899 (0.0024) [2024-03-29 14:20:33,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 326057984. Throughput: 0: 42112.0. Samples: 208238160. Policy #0 lag: (min: 1.0, avg: 24.0, max: 43.0) [2024-03-29 14:20:33,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 14:20:35,983][00497] Updated weights for policy 0, policy_version 19909 (0.0022) [2024-03-29 14:20:38,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 326303744. Throughput: 0: 41571.0. Samples: 208451660. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:20:38,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 14:20:40,036][00497] Updated weights for policy 0, policy_version 19919 (0.0028) [2024-03-29 14:20:43,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 326483968. Throughput: 0: 41439.6. Samples: 208714640. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:20:43,840][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 14:20:44,791][00497] Updated weights for policy 0, policy_version 19929 (0.0020) [2024-03-29 14:20:48,563][00497] Updated weights for policy 0, policy_version 19939 (0.0023) [2024-03-29 14:20:48,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 326680576. Throughput: 0: 41830.5. Samples: 208853880. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:20:48,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 14:20:51,216][00476] Signal inference workers to stop experience collection... (7450 times) [2024-03-29 14:20:51,251][00497] InferenceWorker_p0-w0: stopping experience collection (7450 times) [2024-03-29 14:20:51,432][00476] Signal inference workers to resume experience collection... (7450 times) [2024-03-29 14:20:51,432][00497] InferenceWorker_p0-w0: resuming experience collection (7450 times) [2024-03-29 14:20:51,690][00497] Updated weights for policy 0, policy_version 19949 (0.0028) [2024-03-29 14:20:53,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.3, 300 sec: 41820.9). Total num frames: 326942720. Throughput: 0: 41884.1. Samples: 209082980. Policy #0 lag: (min: 2.0, avg: 19.5, max: 41.0) [2024-03-29 14:20:53,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 14:20:55,684][00497] Updated weights for policy 0, policy_version 19959 (0.0018) [2024-03-29 14:20:58,839][00126] Fps is (10 sec: 45875.9, 60 sec: 42598.5, 300 sec: 41876.4). Total num frames: 327139328. Throughput: 0: 41972.9. Samples: 209355480. Policy #0 lag: (min: 2.0, avg: 19.5, max: 41.0) [2024-03-29 14:20:58,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 14:21:00,396][00497] Updated weights for policy 0, policy_version 19969 (0.0030) [2024-03-29 14:21:03,839][00126] Fps is (10 sec: 37682.7, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 327319552. Throughput: 0: 42019.0. Samples: 209491060. 
Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 14:21:03,841][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 14:21:04,044][00497] Updated weights for policy 0, policy_version 19979 (0.0022) [2024-03-29 14:21:04,331][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019980_327352320.pth... [2024-03-29 14:21:04,669][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019367_317308928.pth [2024-03-29 14:21:07,247][00497] Updated weights for policy 0, policy_version 19989 (0.0022) [2024-03-29 14:21:08,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 327565312. Throughput: 0: 42077.2. Samples: 209720120. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 14:21:08,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 14:21:11,496][00497] Updated weights for policy 0, policy_version 19999 (0.0027) [2024-03-29 14:21:13,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42598.4, 300 sec: 41820.9). Total num frames: 327745536. Throughput: 0: 41854.7. Samples: 209978000. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 14:21:13,840][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 14:21:16,219][00497] Updated weights for policy 0, policy_version 20009 (0.0031) [2024-03-29 14:21:18,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 327925760. Throughput: 0: 41684.5. Samples: 210113960. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 14:21:18,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 14:21:19,918][00497] Updated weights for policy 0, policy_version 20019 (0.0019) [2024-03-29 14:21:23,077][00476] Signal inference workers to stop experience collection... (7500 times) [2024-03-29 14:21:23,131][00497] InferenceWorker_p0-w0: stopping experience collection (7500 times) [2024-03-29 14:21:23,165][00476] Signal inference workers to resume experience collection... (7500 times) [2024-03-29 14:21:23,167][00497] InferenceWorker_p0-w0: resuming experience collection (7500 times) [2024-03-29 14:21:23,172][00497] Updated weights for policy 0, policy_version 20029 (0.0034) [2024-03-29 14:21:23,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 328187904. Throughput: 0: 42279.1. Samples: 210354220. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 14:21:23,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 14:21:27,422][00497] Updated weights for policy 0, policy_version 20039 (0.0022) [2024-03-29 14:21:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.4, 300 sec: 41709.8). Total num frames: 328368128. Throughput: 0: 41807.1. Samples: 210595960. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 14:21:28,841][00126] Avg episode reward: [(0, '0.459')] [2024-03-29 14:21:31,912][00497] Updated weights for policy 0, policy_version 20049 (0.0019) [2024-03-29 14:21:33,839][00126] Fps is (10 sec: 34406.1, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 328531968. Throughput: 0: 41507.5. Samples: 210721720. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 14:21:33,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 14:21:35,687][00497] Updated weights for policy 0, policy_version 20059 (0.0025) [2024-03-29 14:21:38,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 328794112. Throughput: 0: 41975.9. Samples: 210971900. 
Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 14:21:38,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:21:39,039][00497] Updated weights for policy 0, policy_version 20069 (0.0019) [2024-03-29 14:21:43,282][00497] Updated weights for policy 0, policy_version 20079 (0.0026) [2024-03-29 14:21:43,839][00126] Fps is (10 sec: 45875.9, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 328990720. Throughput: 0: 41283.1. Samples: 211213220. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 14:21:43,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:21:47,382][00497] Updated weights for policy 0, policy_version 20089 (0.0026) [2024-03-29 14:21:48,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 329187328. Throughput: 0: 40999.2. Samples: 211336020. Policy #0 lag: (min: 2.0, avg: 22.3, max: 43.0) [2024-03-29 14:21:48,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 14:21:51,437][00497] Updated weights for policy 0, policy_version 20099 (0.0023) [2024-03-29 14:21:53,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 329400320. Throughput: 0: 41796.9. Samples: 211600980. Policy #0 lag: (min: 2.0, avg: 22.3, max: 43.0) [2024-03-29 14:21:53,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:21:54,695][00497] Updated weights for policy 0, policy_version 20109 (0.0031) [2024-03-29 14:21:58,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 329613312. Throughput: 0: 41374.2. Samples: 211839840. Policy #0 lag: (min: 2.0, avg: 22.3, max: 43.0) [2024-03-29 14:21:58,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 14:21:58,950][00497] Updated weights for policy 0, policy_version 20119 (0.0019) [2024-03-29 14:21:59,467][00476] Signal inference workers to stop experience collection... (7550 times) [2024-03-29 14:21:59,510][00497] InferenceWorker_p0-w0: stopping experience collection (7550 times) [2024-03-29 14:21:59,547][00476] Signal inference workers to resume experience collection... (7550 times) [2024-03-29 14:21:59,575][00497] InferenceWorker_p0-w0: resuming experience collection (7550 times) [2024-03-29 14:22:03,151][00497] Updated weights for policy 0, policy_version 20129 (0.0026) [2024-03-29 14:22:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 329809920. Throughput: 0: 41197.7. Samples: 211967860. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 14:22:03,841][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:22:07,258][00497] Updated weights for policy 0, policy_version 20139 (0.0029) [2024-03-29 14:22:08,839][00126] Fps is (10 sec: 40960.3, 60 sec: 40960.1, 300 sec: 41598.7). Total num frames: 330022912. Throughput: 0: 41737.9. Samples: 212232420. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 14:22:08,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 14:22:10,454][00497] Updated weights for policy 0, policy_version 20149 (0.0030) [2024-03-29 14:22:13,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 41654.3). Total num frames: 330252288. Throughput: 0: 41320.4. Samples: 212455380. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 14:22:13,840][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 14:22:14,797][00497] Updated weights for policy 0, policy_version 20159 (0.0021) [2024-03-29 14:22:18,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41709.8). 
Total num frames: 330432512. Throughput: 0: 41494.0. Samples: 212588940. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 14:22:18,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 14:22:19,339][00497] Updated weights for policy 0, policy_version 20169 (0.0022) [2024-03-29 14:22:23,115][00497] Updated weights for policy 0, policy_version 20179 (0.0020) [2024-03-29 14:22:23,839][00126] Fps is (10 sec: 39321.5, 60 sec: 40960.0, 300 sec: 41543.1). Total num frames: 330645504. Throughput: 0: 41900.5. Samples: 212857420. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 14:22:23,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 14:22:26,083][00497] Updated weights for policy 0, policy_version 20189 (0.0028) [2024-03-29 14:22:28,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 330891264. Throughput: 0: 41447.5. Samples: 213078360. Policy #0 lag: (min: 0.0, avg: 22.9, max: 43.0) [2024-03-29 14:22:28,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 14:22:30,367][00497] Updated weights for policy 0, policy_version 20199 (0.0018) [2024-03-29 14:22:32,053][00476] Signal inference workers to stop experience collection... (7600 times) [2024-03-29 14:22:32,095][00497] InferenceWorker_p0-w0: stopping experience collection (7600 times) [2024-03-29 14:22:32,133][00476] Signal inference workers to resume experience collection... (7600 times) [2024-03-29 14:22:32,135][00497] InferenceWorker_p0-w0: resuming experience collection (7600 times) [2024-03-29 14:22:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 331071488. Throughput: 0: 41879.5. Samples: 213220600. Policy #0 lag: (min: 0.0, avg: 22.9, max: 43.0) [2024-03-29 14:22:33,840][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 14:22:34,652][00497] Updated weights for policy 0, policy_version 20209 (0.0032) [2024-03-29 14:22:38,594][00497] Updated weights for policy 0, policy_version 20219 (0.0019) [2024-03-29 14:22:38,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 331268096. Throughput: 0: 41940.9. Samples: 213488320. Policy #0 lag: (min: 0.0, avg: 22.9, max: 43.0) [2024-03-29 14:22:38,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 14:22:41,992][00497] Updated weights for policy 0, policy_version 20229 (0.0026) [2024-03-29 14:22:43,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 331530240. Throughput: 0: 41526.2. Samples: 213708520. Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 14:22:43,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 14:22:46,173][00497] Updated weights for policy 0, policy_version 20239 (0.0036) [2024-03-29 14:22:48,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 331726848. Throughput: 0: 41863.6. Samples: 213851720. Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 14:22:48,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:22:50,061][00497] Updated weights for policy 0, policy_version 20249 (0.0018) [2024-03-29 14:22:53,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 331907072. Throughput: 0: 42114.2. Samples: 214127560. 
Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 14:22:53,840][00126] Avg episode reward: [(0, '0.319')] [2024-03-29 14:22:54,025][00497] Updated weights for policy 0, policy_version 20259 (0.0022) [2024-03-29 14:22:57,279][00497] Updated weights for policy 0, policy_version 20269 (0.0019) [2024-03-29 14:22:58,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 332169216. Throughput: 0: 42249.7. Samples: 214356620. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 14:22:58,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:23:01,564][00497] Updated weights for policy 0, policy_version 20279 (0.0024) [2024-03-29 14:23:01,895][00476] Signal inference workers to stop experience collection... (7650 times) [2024-03-29 14:23:01,920][00497] InferenceWorker_p0-w0: stopping experience collection (7650 times) [2024-03-29 14:23:02,115][00476] Signal inference workers to resume experience collection... (7650 times) [2024-03-29 14:23:02,115][00497] InferenceWorker_p0-w0: resuming experience collection (7650 times) [2024-03-29 14:23:03,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 41932.0). Total num frames: 332365824. Throughput: 0: 42195.1. Samples: 214487720. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 14:23:03,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:23:04,087][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020287_332382208.pth... [2024-03-29 14:23:04,387][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019674_322338816.pth [2024-03-29 14:23:05,552][00497] Updated weights for policy 0, policy_version 20289 (0.0024) [2024-03-29 14:23:08,839][00126] Fps is (10 sec: 36045.2, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 332529664. Throughput: 0: 42188.5. Samples: 214755900. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 14:23:08,840][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 14:23:09,649][00497] Updated weights for policy 0, policy_version 20299 (0.0017) [2024-03-29 14:23:12,687][00497] Updated weights for policy 0, policy_version 20309 (0.0027) [2024-03-29 14:23:13,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 332791808. Throughput: 0: 42538.2. Samples: 214992580. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 14:23:13,840][00126] Avg episode reward: [(0, '0.286')] [2024-03-29 14:23:16,712][00497] Updated weights for policy 0, policy_version 20319 (0.0021) [2024-03-29 14:23:18,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 332988416. Throughput: 0: 42314.8. Samples: 215124760. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 14:23:18,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:23:21,259][00497] Updated weights for policy 0, policy_version 20329 (0.0023) [2024-03-29 14:23:23,839][00126] Fps is (10 sec: 36045.3, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 333152256. Throughput: 0: 42008.5. Samples: 215378700. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 14:23:23,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 14:23:25,444][00497] Updated weights for policy 0, policy_version 20339 (0.0028) [2024-03-29 14:23:28,383][00497] Updated weights for policy 0, policy_version 20349 (0.0033) [2024-03-29 14:23:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 41876.4). 
Total num frames: 333414400. Throughput: 0: 42308.5. Samples: 215612400. Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 14:23:28,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 14:23:32,107][00476] Signal inference workers to stop experience collection... (7700 times) [2024-03-29 14:23:32,185][00497] InferenceWorker_p0-w0: stopping experience collection (7700 times) [2024-03-29 14:23:32,273][00476] Signal inference workers to resume experience collection... (7700 times) [2024-03-29 14:23:32,274][00497] InferenceWorker_p0-w0: resuming experience collection (7700 times) [2024-03-29 14:23:32,582][00497] Updated weights for policy 0, policy_version 20359 (0.0025) [2024-03-29 14:23:33,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 333611008. Throughput: 0: 41860.4. Samples: 215735440. Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 14:23:33,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 14:23:37,007][00497] Updated weights for policy 0, policy_version 20369 (0.0023) [2024-03-29 14:23:38,839][00126] Fps is (10 sec: 36044.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 333774848. Throughput: 0: 41559.5. Samples: 215997740. Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 14:23:38,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 14:23:41,116][00497] Updated weights for policy 0, policy_version 20379 (0.0027) [2024-03-29 14:23:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 334020608. Throughput: 0: 42114.8. Samples: 216251780. Policy #0 lag: (min: 1.0, avg: 18.1, max: 42.0) [2024-03-29 14:23:43,840][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 14:23:44,182][00497] Updated weights for policy 0, policy_version 20389 (0.0029) [2024-03-29 14:23:48,255][00497] Updated weights for policy 0, policy_version 20399 (0.0031) [2024-03-29 14:23:48,839][00126] Fps is (10 sec: 45875.6, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 334233600. Throughput: 0: 41635.6. Samples: 216361320. Policy #0 lag: (min: 1.0, avg: 18.1, max: 42.0) [2024-03-29 14:23:48,840][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 14:23:52,447][00497] Updated weights for policy 0, policy_version 20409 (0.0023) [2024-03-29 14:23:53,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 334413824. Throughput: 0: 41678.2. Samples: 216631420. Policy #0 lag: (min: 1.0, avg: 18.1, max: 42.0) [2024-03-29 14:23:53,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 14:23:56,878][00497] Updated weights for policy 0, policy_version 20419 (0.0021) [2024-03-29 14:23:58,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 334643200. Throughput: 0: 42037.8. Samples: 216884280. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 14:23:58,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 14:23:59,899][00497] Updated weights for policy 0, policy_version 20429 (0.0030) [2024-03-29 14:24:03,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 334856192. Throughput: 0: 41167.9. Samples: 216977320. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 14:24:03,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 14:24:04,018][00497] Updated weights for policy 0, policy_version 20439 (0.0024) [2024-03-29 14:24:04,442][00476] Signal inference workers to stop experience collection... 
(7750 times) [2024-03-29 14:24:04,521][00497] InferenceWorker_p0-w0: stopping experience collection (7750 times) [2024-03-29 14:24:04,522][00476] Signal inference workers to resume experience collection... (7750 times) [2024-03-29 14:24:04,547][00497] InferenceWorker_p0-w0: resuming experience collection (7750 times) [2024-03-29 14:24:08,207][00497] Updated weights for policy 0, policy_version 20449 (0.0030) [2024-03-29 14:24:08,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.3, 300 sec: 41932.0). Total num frames: 335069184. Throughput: 0: 41523.1. Samples: 217247240. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:24:08,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 14:24:12,453][00497] Updated weights for policy 0, policy_version 20459 (0.0023) [2024-03-29 14:24:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 335265792. Throughput: 0: 42301.2. Samples: 217515960. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:24:13,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:24:15,430][00497] Updated weights for policy 0, policy_version 20469 (0.0020) [2024-03-29 14:24:18,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 335495168. Throughput: 0: 41870.7. Samples: 217619620. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:24:18,840][00126] Avg episode reward: [(0, '0.333')] [2024-03-29 14:24:19,574][00497] Updated weights for policy 0, policy_version 20479 (0.0025) [2024-03-29 14:24:23,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 335675392. Throughput: 0: 41617.4. Samples: 217870520. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:24:23,841][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 14:24:23,936][00497] Updated weights for policy 0, policy_version 20489 (0.0021) [2024-03-29 14:24:27,981][00497] Updated weights for policy 0, policy_version 20499 (0.0019) [2024-03-29 14:24:28,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 335888384. Throughput: 0: 42203.1. Samples: 218150920. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:24:28,840][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 14:24:30,900][00497] Updated weights for policy 0, policy_version 20509 (0.0023) [2024-03-29 14:24:31,592][00476] Signal inference workers to stop experience collection... (7800 times) [2024-03-29 14:24:31,656][00497] InferenceWorker_p0-w0: stopping experience collection (7800 times) [2024-03-29 14:24:31,754][00476] Signal inference workers to resume experience collection... (7800 times) [2024-03-29 14:24:31,755][00497] InferenceWorker_p0-w0: resuming experience collection (7800 times) [2024-03-29 14:24:33,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 336134144. Throughput: 0: 41990.6. Samples: 218250900. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:24:33,840][00126] Avg episode reward: [(0, '0.327')] [2024-03-29 14:24:35,078][00497] Updated weights for policy 0, policy_version 20519 (0.0020) [2024-03-29 14:24:38,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 336314368. Throughput: 0: 41696.8. Samples: 218507780. 
Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 14:24:38,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 14:24:39,300][00497] Updated weights for policy 0, policy_version 20529 (0.0034) [2024-03-29 14:24:43,658][00497] Updated weights for policy 0, policy_version 20539 (0.0029) [2024-03-29 14:24:43,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 336510976. Throughput: 0: 42188.0. Samples: 218782740. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 14:24:43,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 14:24:46,605][00497] Updated weights for policy 0, policy_version 20549 (0.0026) [2024-03-29 14:24:48,839][00126] Fps is (10 sec: 45875.9, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 336773120. Throughput: 0: 42475.7. Samples: 218888720. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 14:24:48,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 14:24:50,864][00497] Updated weights for policy 0, policy_version 20559 (0.0026) [2024-03-29 14:24:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 336953344. Throughput: 0: 41881.2. Samples: 219131900. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 14:24:53,841][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 14:24:55,064][00497] Updated weights for policy 0, policy_version 20569 (0.0023) [2024-03-29 14:24:58,839][00126] Fps is (10 sec: 36044.4, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 337133568. Throughput: 0: 42065.8. Samples: 219408920. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 14:24:58,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 14:24:59,377][00497] Updated weights for policy 0, policy_version 20579 (0.0022) [2024-03-29 14:25:02,275][00497] Updated weights for policy 0, policy_version 20589 (0.0030) [2024-03-29 14:25:03,704][00476] Signal inference workers to stop experience collection... (7850 times) [2024-03-29 14:25:03,704][00476] Signal inference workers to resume experience collection... (7850 times) [2024-03-29 14:25:03,738][00497] InferenceWorker_p0-w0: stopping experience collection (7850 times) [2024-03-29 14:25:03,738][00497] InferenceWorker_p0-w0: resuming experience collection (7850 times) [2024-03-29 14:25:03,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 337412096. Throughput: 0: 42192.4. Samples: 219518280. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 14:25:03,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 14:25:04,003][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020595_337428480.pth... [2024-03-29 14:25:04,316][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019980_327352320.pth [2024-03-29 14:25:06,440][00497] Updated weights for policy 0, policy_version 20599 (0.0028) [2024-03-29 14:25:08,839][00126] Fps is (10 sec: 47513.2, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 337608704. Throughput: 0: 42177.6. Samples: 219768520. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 14:25:08,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 14:25:10,571][00497] Updated weights for policy 0, policy_version 20609 (0.0024) [2024-03-29 14:25:13,839][00126] Fps is (10 sec: 34406.6, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 337756160. Throughput: 0: 41976.9. Samples: 220039880. 
Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 14:25:13,840][00126] Avg episode reward: [(0, '0.294')] [2024-03-29 14:25:15,019][00497] Updated weights for policy 0, policy_version 20619 (0.0033) [2024-03-29 14:25:18,102][00497] Updated weights for policy 0, policy_version 20629 (0.0021) [2024-03-29 14:25:18,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 338018304. Throughput: 0: 42191.1. Samples: 220149500. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 14:25:18,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:25:22,316][00497] Updated weights for policy 0, policy_version 20639 (0.0019) [2024-03-29 14:25:23,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42325.4, 300 sec: 41932.0). Total num frames: 338214912. Throughput: 0: 41765.9. Samples: 220387240. Policy #0 lag: (min: 0.0, avg: 23.4, max: 41.0) [2024-03-29 14:25:23,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 14:25:26,696][00497] Updated weights for policy 0, policy_version 20649 (0.0019) [2024-03-29 14:25:28,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 338378752. Throughput: 0: 41630.3. Samples: 220656100. Policy #0 lag: (min: 0.0, avg: 23.4, max: 41.0) [2024-03-29 14:25:28,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 14:25:30,690][00497] Updated weights for policy 0, policy_version 20659 (0.0020) [2024-03-29 14:25:33,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 338624512. Throughput: 0: 41952.0. Samples: 220776560. Policy #0 lag: (min: 0.0, avg: 23.4, max: 41.0) [2024-03-29 14:25:33,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:25:33,872][00497] Updated weights for policy 0, policy_version 20669 (0.0038) [2024-03-29 14:25:38,191][00497] Updated weights for policy 0, policy_version 20679 (0.0023) [2024-03-29 14:25:38,492][00476] Signal inference workers to stop experience collection... (7900 times) [2024-03-29 14:25:38,494][00476] Signal inference workers to resume experience collection... (7900 times) [2024-03-29 14:25:38,538][00497] InferenceWorker_p0-w0: stopping experience collection (7900 times) [2024-03-29 14:25:38,538][00497] InferenceWorker_p0-w0: resuming experience collection (7900 times) [2024-03-29 14:25:38,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 338837504. Throughput: 0: 41585.8. Samples: 221003260. Policy #0 lag: (min: 0.0, avg: 24.0, max: 43.0) [2024-03-29 14:25:38,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 14:25:42,360][00497] Updated weights for policy 0, policy_version 20689 (0.0022) [2024-03-29 14:25:43,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 339001344. Throughput: 0: 41388.5. Samples: 221271400. Policy #0 lag: (min: 0.0, avg: 24.0, max: 43.0) [2024-03-29 14:25:43,840][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 14:25:46,402][00497] Updated weights for policy 0, policy_version 20699 (0.0021) [2024-03-29 14:25:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 339247104. Throughput: 0: 41951.1. Samples: 221406080. Policy #0 lag: (min: 0.0, avg: 24.0, max: 43.0) [2024-03-29 14:25:48,840][00126] Avg episode reward: [(0, '0.235')] [2024-03-29 14:25:49,758][00497] Updated weights for policy 0, policy_version 20709 (0.0024) [2024-03-29 14:25:53,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41506.1, 300 sec: 41709.8). 
Total num frames: 339443712. Throughput: 0: 41040.0. Samples: 221615320. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 14:25:53,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:25:54,014][00497] Updated weights for policy 0, policy_version 20719 (0.0019) [2024-03-29 14:25:58,231][00497] Updated weights for policy 0, policy_version 20729 (0.0020) [2024-03-29 14:25:58,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 339640320. Throughput: 0: 41046.7. Samples: 221886980. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 14:25:58,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 14:26:02,377][00497] Updated weights for policy 0, policy_version 20739 (0.0022) [2024-03-29 14:26:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 40686.9, 300 sec: 41654.2). Total num frames: 339853312. Throughput: 0: 41697.3. Samples: 222025880. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 14:26:03,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 14:26:05,646][00497] Updated weights for policy 0, policy_version 20749 (0.0024) [2024-03-29 14:26:08,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 340082688. Throughput: 0: 41017.7. Samples: 222233040. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 14:26:08,841][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:26:09,673][00497] Updated weights for policy 0, policy_version 20759 (0.0031) [2024-03-29 14:26:13,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 340246528. Throughput: 0: 40903.5. Samples: 222496760. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 14:26:13,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:26:14,182][00497] Updated weights for policy 0, policy_version 20769 (0.0021) [2024-03-29 14:26:17,270][00476] Signal inference workers to stop experience collection... (7950 times) [2024-03-29 14:26:17,315][00497] InferenceWorker_p0-w0: stopping experience collection (7950 times) [2024-03-29 14:26:17,468][00476] Signal inference workers to resume experience collection... (7950 times) [2024-03-29 14:26:17,469][00497] InferenceWorker_p0-w0: resuming experience collection (7950 times) [2024-03-29 14:26:18,247][00497] Updated weights for policy 0, policy_version 20779 (0.0023) [2024-03-29 14:26:18,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40687.0, 300 sec: 41598.7). Total num frames: 340459520. Throughput: 0: 41474.2. Samples: 222642900. Policy #0 lag: (min: 2.0, avg: 19.4, max: 42.0) [2024-03-29 14:26:18,840][00126] Avg episode reward: [(0, '0.286')] [2024-03-29 14:26:21,403][00497] Updated weights for policy 0, policy_version 20789 (0.0025) [2024-03-29 14:26:23,839][00126] Fps is (10 sec: 45875.5, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 340705280. Throughput: 0: 40993.9. Samples: 222847980. Policy #0 lag: (min: 2.0, avg: 19.4, max: 42.0) [2024-03-29 14:26:23,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 14:26:25,627][00497] Updated weights for policy 0, policy_version 20799 (0.0033) [2024-03-29 14:26:28,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 340885504. Throughput: 0: 40682.1. Samples: 223102100. 
Policy #0 lag: (min: 2.0, avg: 19.4, max: 42.0) [2024-03-29 14:26:28,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:26:30,078][00497] Updated weights for policy 0, policy_version 20809 (0.0021) [2024-03-29 14:26:33,839][00126] Fps is (10 sec: 36044.8, 60 sec: 40686.9, 300 sec: 41598.7). Total num frames: 341065728. Throughput: 0: 41004.1. Samples: 223251260. Policy #0 lag: (min: 2.0, avg: 17.9, max: 42.0) [2024-03-29 14:26:33,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 14:26:34,252][00497] Updated weights for policy 0, policy_version 20819 (0.0020) [2024-03-29 14:26:37,372][00497] Updated weights for policy 0, policy_version 20829 (0.0029) [2024-03-29 14:26:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 341327872. Throughput: 0: 41366.3. Samples: 223476800. Policy #0 lag: (min: 2.0, avg: 17.9, max: 42.0) [2024-03-29 14:26:38,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 14:26:41,351][00497] Updated weights for policy 0, policy_version 20839 (0.0025) [2024-03-29 14:26:43,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 341524480. Throughput: 0: 41074.6. Samples: 223735340. Policy #0 lag: (min: 2.0, avg: 17.9, max: 42.0) [2024-03-29 14:26:43,841][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:26:45,780][00497] Updated weights for policy 0, policy_version 20849 (0.0018) [2024-03-29 14:26:48,839][00126] Fps is (10 sec: 36044.9, 60 sec: 40686.9, 300 sec: 41654.2). Total num frames: 341688320. Throughput: 0: 40872.0. Samples: 223865120. Policy #0 lag: (min: 0.0, avg: 16.6, max: 40.0) [2024-03-29 14:26:48,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:26:49,506][00476] Signal inference workers to stop experience collection... (8000 times) [2024-03-29 14:26:49,528][00497] InferenceWorker_p0-w0: stopping experience collection (8000 times) [2024-03-29 14:26:49,727][00476] Signal inference workers to resume experience collection... (8000 times) [2024-03-29 14:26:49,727][00497] InferenceWorker_p0-w0: resuming experience collection (8000 times) [2024-03-29 14:26:50,039][00497] Updated weights for policy 0, policy_version 20859 (0.0020) [2024-03-29 14:26:53,039][00497] Updated weights for policy 0, policy_version 20869 (0.0031) [2024-03-29 14:26:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 341950464. Throughput: 0: 41906.2. Samples: 224118820. Policy #0 lag: (min: 0.0, avg: 16.6, max: 40.0) [2024-03-29 14:26:53,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:26:57,147][00497] Updated weights for policy 0, policy_version 20879 (0.0019) [2024-03-29 14:26:58,839][00126] Fps is (10 sec: 45875.1, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 342147072. Throughput: 0: 41720.4. Samples: 224374180. Policy #0 lag: (min: 0.0, avg: 16.6, max: 40.0) [2024-03-29 14:26:58,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 14:27:01,068][00497] Updated weights for policy 0, policy_version 20889 (0.0017) [2024-03-29 14:27:03,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 342327296. Throughput: 0: 41334.2. Samples: 224502940. Policy #0 lag: (min: 0.0, avg: 17.3, max: 40.0) [2024-03-29 14:27:03,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:27:03,979][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020895_342343680.pth... 
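The recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...). Total num frames: ..." and "Avg episode reward: [(0, '...')]" entries in this log are easy to scrape for offline plotting of training progress. A minimal sketch, assuming the exact entry format shown above and a plain-text log file (the parse_progress helper below is illustrative only, not part of the trainer):

import re

# Regexes matching the entry formats visible in this log.
FPS_RE = re.compile(
    r"Fps is \(10 sec: ([\d.]+), 60 sec: ([\d.]+), 300 sec: ([\d.]+)\)\. "
    r"Total num frames: (\d+)"
)
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '([\d.]+)'\)\]")

def parse_progress(log_path):
    """Return (total_frames, fps_10s, avg_reward) lists scraped from the log."""
    frames, fps_10s, rewards = [], [], []
    with open(log_path) as f:
        text = f.read()
    for m in FPS_RE.finditer(text):
        fps_10s.append(float(m.group(1)))
        frames.append(int(m.group(4)))
    for m in REWARD_RE.finditer(text):
        rewards.append(float(m.group(1)))
    return frames, fps_10s, rewards

The three numbers after "Fps is" are throughput averaged over the last 10, 60, and 300 seconds respectively, so the 300-second column gives the smoothest curve to plot against total frames.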
[2024-03-29 14:27:04,437][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020287_332382208.pth [2024-03-29 14:27:05,589][00497] Updated weights for policy 0, policy_version 20899 (0.0023) [2024-03-29 14:27:08,662][00497] Updated weights for policy 0, policy_version 20909 (0.0026) [2024-03-29 14:27:08,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 342573056. Throughput: 0: 42357.0. Samples: 224754040. Policy #0 lag: (min: 0.0, avg: 17.3, max: 40.0) [2024-03-29 14:27:08,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:27:12,823][00497] Updated weights for policy 0, policy_version 20919 (0.0025) [2024-03-29 14:27:13,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 342753280. Throughput: 0: 41964.4. Samples: 224990500. Policy #0 lag: (min: 0.0, avg: 17.3, max: 40.0) [2024-03-29 14:27:13,840][00126] Avg episode reward: [(0, '0.370')] [2024-03-29 14:27:16,832][00497] Updated weights for policy 0, policy_version 20929 (0.0028) [2024-03-29 14:27:18,839][00126] Fps is (10 sec: 37682.7, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 342949888. Throughput: 0: 41668.4. Samples: 225126340. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 14:27:18,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 14:27:19,716][00476] Signal inference workers to stop experience collection... (8050 times) [2024-03-29 14:27:19,794][00497] InferenceWorker_p0-w0: stopping experience collection (8050 times) [2024-03-29 14:27:19,885][00476] Signal inference workers to resume experience collection... (8050 times) [2024-03-29 14:27:19,885][00497] InferenceWorker_p0-w0: resuming experience collection (8050 times) [2024-03-29 14:27:21,230][00497] Updated weights for policy 0, policy_version 20939 (0.0021) [2024-03-29 14:27:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.0, 300 sec: 41709.8). Total num frames: 343195648. Throughput: 0: 42462.2. Samples: 225387600. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 14:27:23,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:27:24,347][00497] Updated weights for policy 0, policy_version 20949 (0.0020) [2024-03-29 14:27:28,455][00497] Updated weights for policy 0, policy_version 20959 (0.0026) [2024-03-29 14:27:28,839][00126] Fps is (10 sec: 44237.0, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 343392256. Throughput: 0: 41739.7. Samples: 225613620. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 14:27:28,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:27:32,634][00497] Updated weights for policy 0, policy_version 20969 (0.0025) [2024-03-29 14:27:33,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.1, 300 sec: 41765.3). Total num frames: 343588864. Throughput: 0: 41745.6. Samples: 225743680. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 14:27:33,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 14:27:36,818][00497] Updated weights for policy 0, policy_version 20979 (0.0022) [2024-03-29 14:27:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 343801856. Throughput: 0: 42113.4. Samples: 226013920. 
Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 14:27:38,841][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:27:39,931][00497] Updated weights for policy 0, policy_version 20989 (0.0024) [2024-03-29 14:27:43,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 344014848. Throughput: 0: 41291.5. Samples: 226232300. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 14:27:43,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 14:27:44,149][00497] Updated weights for policy 0, policy_version 20999 (0.0021) [2024-03-29 14:27:48,117][00476] Signal inference workers to stop experience collection... (8100 times) [2024-03-29 14:27:48,150][00497] InferenceWorker_p0-w0: stopping experience collection (8100 times) [2024-03-29 14:27:48,326][00476] Signal inference workers to resume experience collection... (8100 times) [2024-03-29 14:27:48,326][00497] InferenceWorker_p0-w0: resuming experience collection (8100 times) [2024-03-29 14:27:48,329][00497] Updated weights for policy 0, policy_version 21009 (0.0029) [2024-03-29 14:27:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 344227840. Throughput: 0: 41570.7. Samples: 226373620. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 14:27:48,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 14:27:52,633][00497] Updated weights for policy 0, policy_version 21019 (0.0021) [2024-03-29 14:27:53,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40686.9, 300 sec: 41432.1). Total num frames: 344391680. Throughput: 0: 41698.5. Samples: 226630480. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 14:27:53,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 14:27:55,919][00497] Updated weights for policy 0, policy_version 21029 (0.0023) [2024-03-29 14:27:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 344653824. Throughput: 0: 41306.3. Samples: 226849280. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 14:27:58,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 14:28:00,020][00497] Updated weights for policy 0, policy_version 21039 (0.0025) [2024-03-29 14:28:03,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 344850432. Throughput: 0: 41578.3. Samples: 226997360. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:28:03,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:28:04,032][00497] Updated weights for policy 0, policy_version 21049 (0.0019) [2024-03-29 14:28:08,190][00497] Updated weights for policy 0, policy_version 21059 (0.0023) [2024-03-29 14:28:08,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.0, 300 sec: 41543.2). Total num frames: 345047040. Throughput: 0: 41650.3. Samples: 227261860. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:28:08,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:28:11,633][00497] Updated weights for policy 0, policy_version 21069 (0.0022) [2024-03-29 14:28:13,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 345292800. Throughput: 0: 41390.6. Samples: 227476200. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:28:13,841][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:28:15,793][00497] Updated weights for policy 0, policy_version 21079 (0.0024) [2024-03-29 14:28:18,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 41765.3). 
Total num frames: 345473024. Throughput: 0: 41714.8. Samples: 227620840. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 14:28:18,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 14:28:19,383][00476] Signal inference workers to stop experience collection... (8150 times) [2024-03-29 14:28:19,435][00497] InferenceWorker_p0-w0: stopping experience collection (8150 times) [2024-03-29 14:28:19,469][00476] Signal inference workers to resume experience collection... (8150 times) [2024-03-29 14:28:19,471][00497] InferenceWorker_p0-w0: resuming experience collection (8150 times) [2024-03-29 14:28:19,740][00497] Updated weights for policy 0, policy_version 21089 (0.0024) [2024-03-29 14:28:23,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41233.1, 300 sec: 41543.1). Total num frames: 345669632. Throughput: 0: 41767.5. Samples: 227893460. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 14:28:23,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 14:28:24,012][00497] Updated weights for policy 0, policy_version 21099 (0.0018) [2024-03-29 14:28:27,198][00497] Updated weights for policy 0, policy_version 21109 (0.0026) [2024-03-29 14:28:28,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 345915392. Throughput: 0: 41476.0. Samples: 228098720. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 14:28:28,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 14:28:31,608][00497] Updated weights for policy 0, policy_version 21119 (0.0023) [2024-03-29 14:28:33,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 346095616. Throughput: 0: 41607.4. Samples: 228245960. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:28:33,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 14:28:35,417][00497] Updated weights for policy 0, policy_version 21129 (0.0023) [2024-03-29 14:28:38,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 346292224. Throughput: 0: 41903.2. Samples: 228516120. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:28:38,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 14:28:39,609][00497] Updated weights for policy 0, policy_version 21139 (0.0019) [2024-03-29 14:28:42,813][00497] Updated weights for policy 0, policy_version 21149 (0.0022) [2024-03-29 14:28:43,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 346554368. Throughput: 0: 41901.2. Samples: 228734840. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:28:43,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:28:47,130][00497] Updated weights for policy 0, policy_version 21159 (0.0022) [2024-03-29 14:28:48,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 346734592. Throughput: 0: 41514.7. Samples: 228865520. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 14:28:48,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 14:28:50,972][00497] Updated weights for policy 0, policy_version 21169 (0.0026) [2024-03-29 14:28:53,839][00126] Fps is (10 sec: 37683.4, 60 sec: 42325.4, 300 sec: 41654.2). Total num frames: 346931200. Throughput: 0: 41828.4. Samples: 229144140. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 14:28:53,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 14:28:54,021][00476] Signal inference workers to stop experience collection... 
(8200 times) [2024-03-29 14:28:54,062][00497] InferenceWorker_p0-w0: stopping experience collection (8200 times) [2024-03-29 14:28:54,101][00476] Signal inference workers to resume experience collection... (8200 times) [2024-03-29 14:28:54,102][00497] InferenceWorker_p0-w0: resuming experience collection (8200 times) [2024-03-29 14:28:55,092][00497] Updated weights for policy 0, policy_version 21179 (0.0028) [2024-03-29 14:28:58,463][00497] Updated weights for policy 0, policy_version 21189 (0.0025) [2024-03-29 14:28:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 347176960. Throughput: 0: 42205.8. Samples: 229375460. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 14:28:58,840][00126] Avg episode reward: [(0, '0.280')] [2024-03-29 14:29:02,688][00497] Updated weights for policy 0, policy_version 21199 (0.0026) [2024-03-29 14:29:03,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 347357184. Throughput: 0: 41746.6. Samples: 229499440. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:29:03,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:29:04,172][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000021203_347389952.pth... [2024-03-29 14:29:04,483][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020595_337428480.pth [2024-03-29 14:29:06,633][00497] Updated weights for policy 0, policy_version 21209 (0.0023) [2024-03-29 14:29:08,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41779.2, 300 sec: 41654.3). Total num frames: 347553792. Throughput: 0: 41618.8. Samples: 229766300. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:29:08,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 14:29:10,756][00497] Updated weights for policy 0, policy_version 21219 (0.0024) [2024-03-29 14:29:13,839][00126] Fps is (10 sec: 44237.3, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 347799552. Throughput: 0: 42649.0. Samples: 230017920. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:29:13,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:29:14,028][00497] Updated weights for policy 0, policy_version 21229 (0.0026) [2024-03-29 14:29:18,312][00497] Updated weights for policy 0, policy_version 21239 (0.0027) [2024-03-29 14:29:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 347996160. Throughput: 0: 41706.4. Samples: 230122740. Policy #0 lag: (min: 0.0, avg: 23.5, max: 42.0) [2024-03-29 14:29:18,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 14:29:22,458][00497] Updated weights for policy 0, policy_version 21249 (0.0022) [2024-03-29 14:29:23,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 348176384. Throughput: 0: 41780.4. Samples: 230396240. Policy #0 lag: (min: 0.0, avg: 23.5, max: 42.0) [2024-03-29 14:29:23,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 14:29:26,450][00497] Updated weights for policy 0, policy_version 21259 (0.0020) [2024-03-29 14:29:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 348422144. Throughput: 0: 42707.7. Samples: 230656680. Policy #0 lag: (min: 0.0, avg: 23.5, max: 42.0) [2024-03-29 14:29:28,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 14:29:29,057][00476] Signal inference workers to stop experience collection... 
(8250 times) [2024-03-29 14:29:29,102][00497] InferenceWorker_p0-w0: stopping experience collection (8250 times) [2024-03-29 14:29:29,137][00476] Signal inference workers to resume experience collection... (8250 times) [2024-03-29 14:29:29,140][00497] InferenceWorker_p0-w0: resuming experience collection (8250 times) [2024-03-29 14:29:29,758][00497] Updated weights for policy 0, policy_version 21269 (0.0031) [2024-03-29 14:29:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 348618752. Throughput: 0: 41871.0. Samples: 230749720. Policy #0 lag: (min: 0.0, avg: 23.1, max: 40.0) [2024-03-29 14:29:33,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 14:29:34,241][00497] Updated weights for policy 0, policy_version 21279 (0.0021) [2024-03-29 14:29:38,133][00497] Updated weights for policy 0, policy_version 21289 (0.0018) [2024-03-29 14:29:38,839][00126] Fps is (10 sec: 39321.1, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 348815360. Throughput: 0: 41587.5. Samples: 231015580. Policy #0 lag: (min: 0.0, avg: 23.1, max: 40.0) [2024-03-29 14:29:38,841][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:29:42,164][00497] Updated weights for policy 0, policy_version 21299 (0.0021) [2024-03-29 14:29:43,839][00126] Fps is (10 sec: 39321.4, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 349011968. Throughput: 0: 42096.8. Samples: 231269820. Policy #0 lag: (min: 0.0, avg: 23.1, max: 40.0) [2024-03-29 14:29:43,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:29:45,666][00497] Updated weights for policy 0, policy_version 21309 (0.0027) [2024-03-29 14:29:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 349257728. Throughput: 0: 41391.1. Samples: 231362040. Policy #0 lag: (min: 2.0, avg: 23.4, max: 43.0) [2024-03-29 14:29:48,842][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:29:50,075][00497] Updated weights for policy 0, policy_version 21319 (0.0019) [2024-03-29 14:29:53,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 349421568. Throughput: 0: 41540.8. Samples: 231635640. Policy #0 lag: (min: 2.0, avg: 23.4, max: 43.0) [2024-03-29 14:29:53,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 14:29:54,160][00497] Updated weights for policy 0, policy_version 21329 (0.0019) [2024-03-29 14:29:58,110][00497] Updated weights for policy 0, policy_version 21339 (0.0034) [2024-03-29 14:29:58,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40960.0, 300 sec: 41432.1). Total num frames: 349634560. Throughput: 0: 41473.7. Samples: 231884240. Policy #0 lag: (min: 2.0, avg: 23.4, max: 43.0) [2024-03-29 14:29:58,840][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 14:30:01,486][00476] Signal inference workers to stop experience collection... (8300 times) [2024-03-29 14:30:01,525][00497] InferenceWorker_p0-w0: stopping experience collection (8300 times) [2024-03-29 14:30:01,708][00476] Signal inference workers to resume experience collection... (8300 times) [2024-03-29 14:30:01,708][00497] InferenceWorker_p0-w0: resuming experience collection (8300 times) [2024-03-29 14:30:01,713][00497] Updated weights for policy 0, policy_version 21349 (0.0025) [2024-03-29 14:30:03,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 349863936. Throughput: 0: 41477.2. Samples: 231989220. 
Policy #0 lag: (min: 2.0, avg: 21.5, max: 44.0) [2024-03-29 14:30:03,840][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 14:30:06,146][00497] Updated weights for policy 0, policy_version 21359 (0.0022) [2024-03-29 14:30:08,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 350060544. Throughput: 0: 41268.9. Samples: 232253340. Policy #0 lag: (min: 2.0, avg: 21.5, max: 44.0) [2024-03-29 14:30:08,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 14:30:09,979][00497] Updated weights for policy 0, policy_version 21369 (0.0022) [2024-03-29 14:30:13,839][00126] Fps is (10 sec: 39322.3, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 350257152. Throughput: 0: 41052.0. Samples: 232504020. Policy #0 lag: (min: 2.0, avg: 21.5, max: 44.0) [2024-03-29 14:30:13,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 14:30:13,910][00497] Updated weights for policy 0, policy_version 21379 (0.0024) [2024-03-29 14:30:17,431][00497] Updated weights for policy 0, policy_version 21389 (0.0022) [2024-03-29 14:30:18,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41506.0, 300 sec: 41598.7). Total num frames: 350486528. Throughput: 0: 41565.7. Samples: 232620180. Policy #0 lag: (min: 2.0, avg: 19.2, max: 42.0) [2024-03-29 14:30:18,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:30:21,939][00497] Updated weights for policy 0, policy_version 21399 (0.0025) [2024-03-29 14:30:23,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 350666752. Throughput: 0: 41293.3. Samples: 232873780. Policy #0 lag: (min: 2.0, avg: 19.2, max: 42.0) [2024-03-29 14:30:23,842][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:30:25,908][00497] Updated weights for policy 0, policy_version 21409 (0.0032) [2024-03-29 14:30:28,839][00126] Fps is (10 sec: 39321.8, 60 sec: 40959.9, 300 sec: 41543.1). Total num frames: 350879744. Throughput: 0: 41476.5. Samples: 233136260. Policy #0 lag: (min: 2.0, avg: 19.2, max: 42.0) [2024-03-29 14:30:28,840][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 14:30:29,660][00497] Updated weights for policy 0, policy_version 21419 (0.0030) [2024-03-29 14:30:32,834][00476] Signal inference workers to stop experience collection... (8350 times) [2024-03-29 14:30:32,868][00497] InferenceWorker_p0-w0: stopping experience collection (8350 times) [2024-03-29 14:30:33,049][00476] Signal inference workers to resume experience collection... (8350 times) [2024-03-29 14:30:33,050][00497] InferenceWorker_p0-w0: resuming experience collection (8350 times) [2024-03-29 14:30:33,053][00497] Updated weights for policy 0, policy_version 21429 (0.0029) [2024-03-29 14:30:33,839][00126] Fps is (10 sec: 45875.1, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 351125504. Throughput: 0: 42154.2. Samples: 233258980. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 14:30:33,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 14:30:37,371][00497] Updated weights for policy 0, policy_version 21439 (0.0023) [2024-03-29 14:30:38,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 351305728. Throughput: 0: 41530.7. Samples: 233504520. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 14:30:38,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 14:30:41,560][00497] Updated weights for policy 0, policy_version 21449 (0.0018) [2024-03-29 14:30:43,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41506.2, 300 sec: 41543.2). 
Total num frames: 351502336. Throughput: 0: 41779.5. Samples: 233764320. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 14:30:43,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 14:30:45,366][00497] Updated weights for policy 0, policy_version 21459 (0.0019) [2024-03-29 14:30:48,816][00497] Updated weights for policy 0, policy_version 21469 (0.0024) [2024-03-29 14:30:48,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 351748096. Throughput: 0: 42125.0. Samples: 233884840. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 14:30:48,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:30:53,044][00497] Updated weights for policy 0, policy_version 21479 (0.0020) [2024-03-29 14:30:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 351928320. Throughput: 0: 41312.8. Samples: 234112420. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 14:30:53,840][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 14:30:57,263][00497] Updated weights for policy 0, policy_version 21489 (0.0018) [2024-03-29 14:30:58,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 352124928. Throughput: 0: 41860.8. Samples: 234387760. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 14:30:58,841][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 14:31:01,052][00497] Updated weights for policy 0, policy_version 21499 (0.0024) [2024-03-29 14:31:03,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 352354304. Throughput: 0: 42032.5. Samples: 234511640. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 14:31:03,840][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 14:31:04,026][00476] Signal inference workers to stop experience collection... (8400 times) [2024-03-29 14:31:04,076][00497] InferenceWorker_p0-w0: stopping experience collection (8400 times) [2024-03-29 14:31:04,209][00476] Signal inference workers to resume experience collection... (8400 times) [2024-03-29 14:31:04,210][00497] InferenceWorker_p0-w0: resuming experience collection (8400 times) [2024-03-29 14:31:04,211][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000021508_352387072.pth... [2024-03-29 14:31:04,520][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020895_342343680.pth [2024-03-29 14:31:04,787][00497] Updated weights for policy 0, policy_version 21509 (0.0040) [2024-03-29 14:31:08,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 352550912. Throughput: 0: 41303.6. Samples: 234732440. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 14:31:08,840][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 14:31:09,008][00497] Updated weights for policy 0, policy_version 21519 (0.0020) [2024-03-29 14:31:13,357][00497] Updated weights for policy 0, policy_version 21529 (0.0020) [2024-03-29 14:31:13,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 352747520. Throughput: 0: 41618.7. Samples: 235009100. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 14:31:13,840][00126] Avg episode reward: [(0, '0.280')] [2024-03-29 14:31:17,009][00497] Updated weights for policy 0, policy_version 21539 (0.0028) [2024-03-29 14:31:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 352976896. 
Throughput: 0: 41255.3. Samples: 235115460. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 14:31:18,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 14:31:20,317][00497] Updated weights for policy 0, policy_version 21549 (0.0025) [2024-03-29 14:31:23,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 353189888. Throughput: 0: 41035.0. Samples: 235351100. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 14:31:23,841][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 14:31:24,680][00497] Updated weights for policy 0, policy_version 21559 (0.0021) [2024-03-29 14:31:28,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 353353728. Throughput: 0: 41361.3. Samples: 235625580. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 14:31:28,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 14:31:29,337][00497] Updated weights for policy 0, policy_version 21569 (0.0029) [2024-03-29 14:31:32,793][00497] Updated weights for policy 0, policy_version 21579 (0.0026) [2024-03-29 14:31:33,839][00126] Fps is (10 sec: 39322.1, 60 sec: 40960.1, 300 sec: 41543.2). Total num frames: 353583104. Throughput: 0: 41250.7. Samples: 235741120. Policy #0 lag: (min: 0.0, avg: 18.7, max: 42.0) [2024-03-29 14:31:33,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 14:31:36,187][00497] Updated weights for policy 0, policy_version 21589 (0.0024) [2024-03-29 14:31:36,522][00476] Signal inference workers to stop experience collection... (8450 times) [2024-03-29 14:31:36,559][00497] InferenceWorker_p0-w0: stopping experience collection (8450 times) [2024-03-29 14:31:36,738][00476] Signal inference workers to resume experience collection... (8450 times) [2024-03-29 14:31:36,738][00497] InferenceWorker_p0-w0: resuming experience collection (8450 times) [2024-03-29 14:31:38,839][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 353812480. Throughput: 0: 41245.3. Samples: 235968460. Policy #0 lag: (min: 0.0, avg: 18.7, max: 42.0) [2024-03-29 14:31:38,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 14:31:40,489][00497] Updated weights for policy 0, policy_version 21599 (0.0031) [2024-03-29 14:31:43,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 353992704. Throughput: 0: 41236.8. Samples: 236243420. Policy #0 lag: (min: 0.0, avg: 18.7, max: 42.0) [2024-03-29 14:31:43,841][00126] Avg episode reward: [(0, '0.459')] [2024-03-29 14:31:45,131][00497] Updated weights for policy 0, policy_version 21609 (0.0019) [2024-03-29 14:31:48,716][00497] Updated weights for policy 0, policy_version 21619 (0.0030) [2024-03-29 14:31:48,839][00126] Fps is (10 sec: 39322.1, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 354205696. Throughput: 0: 41526.7. Samples: 236380340. Policy #0 lag: (min: 1.0, avg: 19.1, max: 43.0) [2024-03-29 14:31:48,840][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 14:31:52,100][00497] Updated weights for policy 0, policy_version 21629 (0.0027) [2024-03-29 14:31:53,839][00126] Fps is (10 sec: 44237.3, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 354435072. Throughput: 0: 41678.2. Samples: 236607960. 
Policy #0 lag: (min: 1.0, avg: 19.1, max: 43.0) [2024-03-29 14:31:53,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 14:31:56,390][00497] Updated weights for policy 0, policy_version 21639 (0.0028) [2024-03-29 14:31:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 354615296. Throughput: 0: 41347.6. Samples: 236869740. Policy #0 lag: (min: 1.0, avg: 19.1, max: 43.0) [2024-03-29 14:31:58,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:32:00,845][00497] Updated weights for policy 0, policy_version 21649 (0.0027) [2024-03-29 14:32:03,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 354844672. Throughput: 0: 42018.1. Samples: 237006280. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 14:32:03,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:32:04,317][00497] Updated weights for policy 0, policy_version 21659 (0.0028) [2024-03-29 14:32:07,572][00497] Updated weights for policy 0, policy_version 21669 (0.0022) [2024-03-29 14:32:08,839][00126] Fps is (10 sec: 47513.5, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 355090432. Throughput: 0: 42189.9. Samples: 237249640. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 14:32:08,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 14:32:11,821][00476] Signal inference workers to stop experience collection... (8500 times) [2024-03-29 14:32:11,878][00497] InferenceWorker_p0-w0: stopping experience collection (8500 times) [2024-03-29 14:32:11,985][00476] Signal inference workers to resume experience collection... (8500 times) [2024-03-29 14:32:11,986][00497] InferenceWorker_p0-w0: resuming experience collection (8500 times) [2024-03-29 14:32:11,989][00497] Updated weights for policy 0, policy_version 21679 (0.0028) [2024-03-29 14:32:13,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 355237888. Throughput: 0: 41600.1. Samples: 237497580. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 14:32:13,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 14:32:16,471][00497] Updated weights for policy 0, policy_version 21689 (0.0022) [2024-03-29 14:32:18,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 355467264. Throughput: 0: 42201.8. Samples: 237640200. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 14:32:18,840][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 14:32:19,802][00497] Updated weights for policy 0, policy_version 21699 (0.0021) [2024-03-29 14:32:23,304][00497] Updated weights for policy 0, policy_version 21709 (0.0020) [2024-03-29 14:32:23,839][00126] Fps is (10 sec: 45875.4, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 355696640. Throughput: 0: 42233.5. Samples: 237868960. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 14:32:23,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:32:27,687][00497] Updated weights for policy 0, policy_version 21719 (0.0022) [2024-03-29 14:32:28,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 355893248. Throughput: 0: 41728.9. Samples: 238121220. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 14:32:28,840][00126] Avg episode reward: [(0, '0.289')] [2024-03-29 14:32:32,179][00497] Updated weights for policy 0, policy_version 21729 (0.0029) [2024-03-29 14:32:33,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41779.1, 300 sec: 41654.2). 
Total num frames: 356089856. Throughput: 0: 41839.0. Samples: 238263100. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 14:32:33,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:32:35,744][00497] Updated weights for policy 0, policy_version 21739 (0.0021) [2024-03-29 14:32:38,839][00126] Fps is (10 sec: 40960.8, 60 sec: 41506.3, 300 sec: 41654.3). Total num frames: 356302848. Throughput: 0: 41921.4. Samples: 238494420. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 14:32:38,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:32:39,140][00497] Updated weights for policy 0, policy_version 21749 (0.0024) [2024-03-29 14:32:43,613][00497] Updated weights for policy 0, policy_version 21759 (0.0025) [2024-03-29 14:32:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 356499456. Throughput: 0: 41279.1. Samples: 238727300. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 14:32:43,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 14:32:47,717][00476] Signal inference workers to stop experience collection... (8550 times) [2024-03-29 14:32:47,760][00497] InferenceWorker_p0-w0: stopping experience collection (8550 times) [2024-03-29 14:32:47,797][00476] Signal inference workers to resume experience collection... (8550 times) [2024-03-29 14:32:47,799][00497] InferenceWorker_p0-w0: resuming experience collection (8550 times) [2024-03-29 14:32:48,049][00497] Updated weights for policy 0, policy_version 21769 (0.0026) [2024-03-29 14:32:48,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 356696064. Throughput: 0: 41401.0. Samples: 238869320. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 14:32:48,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:32:51,602][00497] Updated weights for policy 0, policy_version 21779 (0.0029) [2024-03-29 14:32:53,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 356925440. Throughput: 0: 41444.4. Samples: 239114640. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 14:32:53,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:32:54,961][00497] Updated weights for policy 0, policy_version 21789 (0.0029) [2024-03-29 14:32:58,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 357138432. Throughput: 0: 41255.9. Samples: 239354100. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 14:32:58,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 14:32:59,257][00497] Updated weights for policy 0, policy_version 21799 (0.0027) [2024-03-29 14:33:03,759][00497] Updated weights for policy 0, policy_version 21809 (0.0026) [2024-03-29 14:33:03,840][00126] Fps is (10 sec: 39318.3, 60 sec: 41232.5, 300 sec: 41598.6). Total num frames: 357318656. Throughput: 0: 41188.5. Samples: 239493720. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 14:33:03,841][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 14:33:04,282][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000021811_357351424.pth... [2024-03-29 14:33:04,616][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000021203_347389952.pth [2024-03-29 14:33:07,452][00497] Updated weights for policy 0, policy_version 21819 (0.0019) [2024-03-29 14:33:08,839][00126] Fps is (10 sec: 40960.3, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 357548032. 
Throughput: 0: 41400.4. Samples: 239731980. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 14:33:08,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:33:10,734][00497] Updated weights for policy 0, policy_version 21829 (0.0021) [2024-03-29 14:33:13,839][00126] Fps is (10 sec: 45879.1, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 357777408. Throughput: 0: 41182.3. Samples: 239974420. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 14:33:13,842][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 14:33:15,061][00497] Updated weights for policy 0, policy_version 21839 (0.0021) [2024-03-29 14:33:18,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 357941248. Throughput: 0: 41110.7. Samples: 240113080. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 14:33:18,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:33:19,570][00497] Updated weights for policy 0, policy_version 21849 (0.0025) [2024-03-29 14:33:21,167][00476] Signal inference workers to stop experience collection... (8600 times) [2024-03-29 14:33:21,169][00476] Signal inference workers to resume experience collection... (8600 times) [2024-03-29 14:33:21,214][00497] InferenceWorker_p0-w0: stopping experience collection (8600 times) [2024-03-29 14:33:21,214][00497] InferenceWorker_p0-w0: resuming experience collection (8600 times) [2024-03-29 14:33:22,715][00497] Updated weights for policy 0, policy_version 21859 (0.0022) [2024-03-29 14:33:23,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 358187008. Throughput: 0: 41619.5. Samples: 240367300. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 14:33:23,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 14:33:26,358][00497] Updated weights for policy 0, policy_version 21869 (0.0024) [2024-03-29 14:33:28,839][00126] Fps is (10 sec: 47513.7, 60 sec: 42052.4, 300 sec: 41765.3). Total num frames: 358416384. Throughput: 0: 41694.6. Samples: 240603560. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 14:33:28,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 14:33:30,598][00497] Updated weights for policy 0, policy_version 21879 (0.0030) [2024-03-29 14:33:33,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 358563840. Throughput: 0: 41710.7. Samples: 240746300. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 14:33:33,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 14:33:35,095][00497] Updated weights for policy 0, policy_version 21889 (0.0017) [2024-03-29 14:33:38,262][00497] Updated weights for policy 0, policy_version 21899 (0.0025) [2024-03-29 14:33:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 358809600. Throughput: 0: 41857.9. Samples: 240998240. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 14:33:38,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 14:33:41,998][00497] Updated weights for policy 0, policy_version 21909 (0.0031) [2024-03-29 14:33:43,839][00126] Fps is (10 sec: 47512.9, 60 sec: 42325.2, 300 sec: 41709.8). Total num frames: 359038976. Throughput: 0: 41728.0. Samples: 241231860. 
Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 14:33:43,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 14:33:46,161][00497] Updated weights for policy 0, policy_version 21919 (0.0023) [2024-03-29 14:33:48,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 359219200. Throughput: 0: 41841.3. Samples: 241376540. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 14:33:48,841][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 14:33:50,745][00497] Updated weights for policy 0, policy_version 21929 (0.0019) [2024-03-29 14:33:53,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41543.1). Total num frames: 359432192. Throughput: 0: 42141.7. Samples: 241628360. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 14:33:53,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:33:54,131][00476] Signal inference workers to stop experience collection... (8650 times) [2024-03-29 14:33:54,132][00476] Signal inference workers to resume experience collection... (8650 times) [2024-03-29 14:33:54,152][00497] Updated weights for policy 0, policy_version 21939 (0.0024) [2024-03-29 14:33:54,175][00497] InferenceWorker_p0-w0: stopping experience collection (8650 times) [2024-03-29 14:33:54,176][00497] InferenceWorker_p0-w0: resuming experience collection (8650 times) [2024-03-29 14:33:57,651][00497] Updated weights for policy 0, policy_version 21949 (0.0034) [2024-03-29 14:33:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 359661568. Throughput: 0: 41950.7. Samples: 241862200. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 14:33:58,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:34:01,590][00497] Updated weights for policy 0, policy_version 21959 (0.0029) [2024-03-29 14:34:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.8, 300 sec: 41654.2). Total num frames: 359841792. Throughput: 0: 41810.1. Samples: 241994540. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 14:34:03,840][00126] Avg episode reward: [(0, '0.315')] [2024-03-29 14:34:06,483][00497] Updated weights for policy 0, policy_version 21969 (0.0027) [2024-03-29 14:34:08,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 360054784. Throughput: 0: 42097.4. Samples: 242261680. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 14:34:08,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:34:09,763][00497] Updated weights for policy 0, policy_version 21979 (0.0023) [2024-03-29 14:34:13,431][00497] Updated weights for policy 0, policy_version 21989 (0.0023) [2024-03-29 14:34:13,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 360284160. Throughput: 0: 42125.8. Samples: 242499220. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 14:34:13,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 14:34:17,519][00497] Updated weights for policy 0, policy_version 21999 (0.0021) [2024-03-29 14:34:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 360480768. Throughput: 0: 41287.5. Samples: 242604240. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 14:34:18,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:34:22,333][00497] Updated weights for policy 0, policy_version 22009 (0.0019) [2024-03-29 14:34:23,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41543.1). 
Total num frames: 360677376. Throughput: 0: 42155.0. Samples: 242895220. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 14:34:23,843][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 14:34:25,416][00497] Updated weights for policy 0, policy_version 22019 (0.0018) [2024-03-29 14:34:26,643][00476] Signal inference workers to stop experience collection... (8700 times) [2024-03-29 14:34:26,671][00497] InferenceWorker_p0-w0: stopping experience collection (8700 times) [2024-03-29 14:34:26,835][00476] Signal inference workers to resume experience collection... (8700 times) [2024-03-29 14:34:26,836][00497] InferenceWorker_p0-w0: resuming experience collection (8700 times) [2024-03-29 14:34:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 360906752. Throughput: 0: 42211.1. Samples: 243131360. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 14:34:28,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 14:34:28,990][00497] Updated weights for policy 0, policy_version 22029 (0.0032) [2024-03-29 14:34:33,075][00497] Updated weights for policy 0, policy_version 22039 (0.0019) [2024-03-29 14:34:33,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.3, 300 sec: 41709.8). Total num frames: 361119744. Throughput: 0: 41503.5. Samples: 243244200. Policy #0 lag: (min: 0.0, avg: 22.2, max: 41.0) [2024-03-29 14:34:33,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:34:37,945][00497] Updated weights for policy 0, policy_version 22049 (0.0023) [2024-03-29 14:34:38,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41654.3). Total num frames: 361299968. Throughput: 0: 42054.3. Samples: 243520800. Policy #0 lag: (min: 0.0, avg: 22.2, max: 41.0) [2024-03-29 14:34:38,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 14:34:41,200][00497] Updated weights for policy 0, policy_version 22059 (0.0031) [2024-03-29 14:34:43,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 361512960. Throughput: 0: 41918.7. Samples: 243748540. Policy #0 lag: (min: 0.0, avg: 22.2, max: 41.0) [2024-03-29 14:34:43,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 14:34:44,800][00497] Updated weights for policy 0, policy_version 22069 (0.0026) [2024-03-29 14:34:48,824][00497] Updated weights for policy 0, policy_version 22079 (0.0019) [2024-03-29 14:34:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 361742336. Throughput: 0: 41604.6. Samples: 243866740. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:34:48,840][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 14:34:53,720][00497] Updated weights for policy 0, policy_version 22089 (0.0020) [2024-03-29 14:34:53,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 361906176. Throughput: 0: 41450.7. Samples: 244126960. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:34:53,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 14:34:57,024][00497] Updated weights for policy 0, policy_version 22099 (0.0020) [2024-03-29 14:34:58,575][00476] Signal inference workers to stop experience collection... (8750 times) [2024-03-29 14:34:58,626][00497] InferenceWorker_p0-w0: stopping experience collection (8750 times) [2024-03-29 14:34:58,738][00476] Signal inference workers to resume experience collection... 
(8750 times) [2024-03-29 14:34:58,739][00497] InferenceWorker_p0-w0: resuming experience collection (8750 times) [2024-03-29 14:34:58,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 362135552. Throughput: 0: 41627.0. Samples: 244372440. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:34:58,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 14:35:00,653][00497] Updated weights for policy 0, policy_version 22109 (0.0018) [2024-03-29 14:35:03,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 362364928. Throughput: 0: 41884.5. Samples: 244489040. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:35:03,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 14:35:03,905][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000022118_362381312.pth... [2024-03-29 14:35:04,238][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000021508_352387072.pth [2024-03-29 14:35:04,523][00497] Updated weights for policy 0, policy_version 22119 (0.0031) [2024-03-29 14:35:08,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 362528768. Throughput: 0: 41259.6. Samples: 244751900. Policy #0 lag: (min: 0.0, avg: 23.4, max: 44.0) [2024-03-29 14:35:08,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 14:35:09,402][00497] Updated weights for policy 0, policy_version 22129 (0.0027) [2024-03-29 14:35:12,502][00497] Updated weights for policy 0, policy_version 22139 (0.0021) [2024-03-29 14:35:13,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 362774528. Throughput: 0: 41433.3. Samples: 244995860. Policy #0 lag: (min: 0.0, avg: 23.4, max: 44.0) [2024-03-29 14:35:13,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 14:35:16,295][00497] Updated weights for policy 0, policy_version 22149 (0.0039) [2024-03-29 14:35:18,839][00126] Fps is (10 sec: 49151.8, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 363020288. Throughput: 0: 41801.4. Samples: 245125260. Policy #0 lag: (min: 0.0, avg: 23.4, max: 44.0) [2024-03-29 14:35:18,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:35:20,202][00497] Updated weights for policy 0, policy_version 22159 (0.0026) [2024-03-29 14:35:23,839][00126] Fps is (10 sec: 39322.5, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 363167744. Throughput: 0: 41305.4. Samples: 245379540. Policy #0 lag: (min: 0.0, avg: 23.2, max: 43.0) [2024-03-29 14:35:23,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 14:35:24,809][00497] Updated weights for policy 0, policy_version 22169 (0.0018) [2024-03-29 14:35:28,004][00497] Updated weights for policy 0, policy_version 22179 (0.0022) [2024-03-29 14:35:28,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 363413504. Throughput: 0: 41932.8. Samples: 245635520. Policy #0 lag: (min: 0.0, avg: 23.2, max: 43.0) [2024-03-29 14:35:28,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 14:35:31,698][00497] Updated weights for policy 0, policy_version 22189 (0.0019) [2024-03-29 14:35:33,361][00476] Signal inference workers to stop experience collection... (8800 times) [2024-03-29 14:35:33,417][00497] InferenceWorker_p0-w0: stopping experience collection (8800 times) [2024-03-29 14:35:33,452][00476] Signal inference workers to resume experience collection... 
(8800 times) [2024-03-29 14:35:33,455][00497] InferenceWorker_p0-w0: resuming experience collection (8800 times) [2024-03-29 14:35:33,839][00126] Fps is (10 sec: 47513.5, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 363642880. Throughput: 0: 42126.7. Samples: 245762440. Policy #0 lag: (min: 0.0, avg: 23.2, max: 43.0) [2024-03-29 14:35:33,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 14:35:35,563][00497] Updated weights for policy 0, policy_version 22199 (0.0033) [2024-03-29 14:35:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 363806720. Throughput: 0: 41918.6. Samples: 246013300. Policy #0 lag: (min: 0.0, avg: 22.7, max: 40.0) [2024-03-29 14:35:38,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 14:35:40,335][00497] Updated weights for policy 0, policy_version 22209 (0.0019) [2024-03-29 14:35:43,551][00497] Updated weights for policy 0, policy_version 22219 (0.0021) [2024-03-29 14:35:43,839][00126] Fps is (10 sec: 39320.9, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 364036096. Throughput: 0: 42059.1. Samples: 246265100. Policy #0 lag: (min: 0.0, avg: 22.7, max: 40.0) [2024-03-29 14:35:43,840][00126] Avg episode reward: [(0, '0.336')] [2024-03-29 14:35:47,423][00497] Updated weights for policy 0, policy_version 22229 (0.0025) [2024-03-29 14:35:48,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 364265472. Throughput: 0: 42184.0. Samples: 246387320. Policy #0 lag: (min: 0.0, avg: 22.7, max: 40.0) [2024-03-29 14:35:48,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 14:35:51,031][00497] Updated weights for policy 0, policy_version 22239 (0.0024) [2024-03-29 14:35:53,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 364445696. Throughput: 0: 42191.6. Samples: 246650520. Policy #0 lag: (min: 0.0, avg: 22.6, max: 40.0) [2024-03-29 14:35:53,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 14:35:55,880][00497] Updated weights for policy 0, policy_version 22249 (0.0019) [2024-03-29 14:35:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 364675072. Throughput: 0: 42385.9. Samples: 246903220. Policy #0 lag: (min: 0.0, avg: 22.6, max: 40.0) [2024-03-29 14:35:58,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 14:35:59,101][00497] Updated weights for policy 0, policy_version 22259 (0.0020) [2024-03-29 14:36:02,878][00497] Updated weights for policy 0, policy_version 22269 (0.0022) [2024-03-29 14:36:03,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 41820.8). Total num frames: 364888064. Throughput: 0: 42320.9. Samples: 247029700. Policy #0 lag: (min: 0.0, avg: 22.6, max: 40.0) [2024-03-29 14:36:03,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 14:36:06,703][00497] Updated weights for policy 0, policy_version 22279 (0.0019) [2024-03-29 14:36:08,841][00126] Fps is (10 sec: 40954.1, 60 sec: 42597.4, 300 sec: 41820.7). Total num frames: 365084672. Throughput: 0: 42153.3. Samples: 247276500. Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 14:36:08,841][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 14:36:11,398][00497] Updated weights for policy 0, policy_version 22289 (0.0024) [2024-03-29 14:36:13,558][00476] Signal inference workers to stop experience collection... 
(8850 times) [2024-03-29 14:36:13,598][00497] InferenceWorker_p0-w0: stopping experience collection (8850 times) [2024-03-29 14:36:13,783][00476] Signal inference workers to resume experience collection... (8850 times) [2024-03-29 14:36:13,784][00497] InferenceWorker_p0-w0: resuming experience collection (8850 times) [2024-03-29 14:36:13,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 365297664. Throughput: 0: 42117.8. Samples: 247530820. Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 14:36:13,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 14:36:14,847][00497] Updated weights for policy 0, policy_version 22299 (0.0025) [2024-03-29 14:36:18,676][00497] Updated weights for policy 0, policy_version 22309 (0.0020) [2024-03-29 14:36:18,839][00126] Fps is (10 sec: 42604.4, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 365510656. Throughput: 0: 41810.1. Samples: 247643900. Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 14:36:18,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 14:36:22,472][00497] Updated weights for policy 0, policy_version 22319 (0.0017) [2024-03-29 14:36:23,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42598.2, 300 sec: 41931.9). Total num frames: 365723648. Throughput: 0: 41666.1. Samples: 247888280. Policy #0 lag: (min: 0.0, avg: 23.6, max: 42.0) [2024-03-29 14:36:23,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 14:36:27,299][00497] Updated weights for policy 0, policy_version 22329 (0.0027) [2024-03-29 14:36:28,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 365903872. Throughput: 0: 42131.7. Samples: 248161020. Policy #0 lag: (min: 0.0, avg: 23.6, max: 42.0) [2024-03-29 14:36:28,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:36:30,438][00497] Updated weights for policy 0, policy_version 22339 (0.0020) [2024-03-29 14:36:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.0, 300 sec: 41765.3). Total num frames: 366133248. Throughput: 0: 41781.6. Samples: 248267500. Policy #0 lag: (min: 0.0, avg: 23.6, max: 42.0) [2024-03-29 14:36:33,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:36:34,343][00497] Updated weights for policy 0, policy_version 22349 (0.0020) [2024-03-29 14:36:38,335][00497] Updated weights for policy 0, policy_version 22359 (0.0033) [2024-03-29 14:36:38,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 366346240. Throughput: 0: 41311.0. Samples: 248509520. Policy #0 lag: (min: 0.0, avg: 23.5, max: 43.0) [2024-03-29 14:36:38,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 14:36:42,725][00497] Updated weights for policy 0, policy_version 22369 (0.0029) [2024-03-29 14:36:43,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 366526464. Throughput: 0: 42002.6. Samples: 248793340. Policy #0 lag: (min: 0.0, avg: 23.5, max: 43.0) [2024-03-29 14:36:43,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 14:36:45,993][00497] Updated weights for policy 0, policy_version 22379 (0.0029) [2024-03-29 14:36:46,325][00476] Signal inference workers to stop experience collection... (8900 times) [2024-03-29 14:36:46,363][00497] InferenceWorker_p0-w0: stopping experience collection (8900 times) [2024-03-29 14:36:46,545][00476] Signal inference workers to resume experience collection... 
(8900 times) [2024-03-29 14:36:46,546][00497] InferenceWorker_p0-w0: resuming experience collection (8900 times) [2024-03-29 14:36:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 366755840. Throughput: 0: 41689.3. Samples: 248905720. Policy #0 lag: (min: 0.0, avg: 23.5, max: 43.0) [2024-03-29 14:36:48,841][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:36:49,943][00497] Updated weights for policy 0, policy_version 22389 (0.0029) [2024-03-29 14:36:53,809][00497] Updated weights for policy 0, policy_version 22399 (0.0028) [2024-03-29 14:36:53,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 366985216. Throughput: 0: 41756.4. Samples: 249155480. Policy #0 lag: (min: 0.0, avg: 23.2, max: 42.0) [2024-03-29 14:36:53,840][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 14:36:57,992][00497] Updated weights for policy 0, policy_version 22409 (0.0025) [2024-03-29 14:36:58,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 367181824. Throughput: 0: 42223.7. Samples: 249430880. Policy #0 lag: (min: 0.0, avg: 23.2, max: 42.0) [2024-03-29 14:36:58,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 14:37:01,504][00497] Updated weights for policy 0, policy_version 22419 (0.0028) [2024-03-29 14:37:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 367394816. Throughput: 0: 42269.8. Samples: 249546040. Policy #0 lag: (min: 0.0, avg: 23.2, max: 42.0) [2024-03-29 14:37:03,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:37:04,021][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000022425_367411200.pth... [2024-03-29 14:37:04,357][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000021811_357351424.pth [2024-03-29 14:37:05,595][00497] Updated weights for policy 0, policy_version 22429 (0.0020) [2024-03-29 14:37:08,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42326.3, 300 sec: 41987.5). Total num frames: 367624192. Throughput: 0: 42254.8. Samples: 249789740. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 14:37:08,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 14:37:09,252][00497] Updated weights for policy 0, policy_version 22439 (0.0032) [2024-03-29 14:37:13,648][00497] Updated weights for policy 0, policy_version 22449 (0.0020) [2024-03-29 14:37:13,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 367804416. Throughput: 0: 42241.6. Samples: 250061900. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 14:37:13,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 14:37:16,939][00497] Updated weights for policy 0, policy_version 22459 (0.0027) [2024-03-29 14:37:18,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 368033792. Throughput: 0: 42429.0. Samples: 250176800. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 14:37:18,841][00126] Avg episode reward: [(0, '0.285')] [2024-03-29 14:37:20,456][00476] Signal inference workers to stop experience collection... (8950 times) [2024-03-29 14:37:20,478][00497] InferenceWorker_p0-w0: stopping experience collection (8950 times) [2024-03-29 14:37:20,678][00476] Signal inference workers to resume experience collection... 
(8950 times) [2024-03-29 14:37:20,679][00497] InferenceWorker_p0-w0: resuming experience collection (8950 times) [2024-03-29 14:37:20,971][00497] Updated weights for policy 0, policy_version 22469 (0.0022) [2024-03-29 14:37:23,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 368263168. Throughput: 0: 42719.1. Samples: 250431880. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 14:37:23,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 14:37:24,631][00497] Updated weights for policy 0, policy_version 22479 (0.0027) [2024-03-29 14:37:28,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 41820.9). Total num frames: 368427008. Throughput: 0: 42269.7. Samples: 250695480. Policy #0 lag: (min: 0.0, avg: 22.7, max: 41.0) [2024-03-29 14:37:28,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 14:37:29,240][00497] Updated weights for policy 0, policy_version 22489 (0.0026) [2024-03-29 14:37:32,529][00497] Updated weights for policy 0, policy_version 22499 (0.0026) [2024-03-29 14:37:33,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 368672768. Throughput: 0: 42412.5. Samples: 250814280. Policy #0 lag: (min: 0.0, avg: 22.7, max: 41.0) [2024-03-29 14:37:33,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:37:36,706][00497] Updated weights for policy 0, policy_version 22509 (0.0024) [2024-03-29 14:37:38,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 368885760. Throughput: 0: 42281.8. Samples: 251058160. Policy #0 lag: (min: 0.0, avg: 22.7, max: 41.0) [2024-03-29 14:37:38,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 14:37:40,005][00497] Updated weights for policy 0, policy_version 22519 (0.0022) [2024-03-29 14:37:43,839][00126] Fps is (10 sec: 37683.1, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 369049600. Throughput: 0: 42036.5. Samples: 251322520. Policy #0 lag: (min: 0.0, avg: 22.8, max: 43.0) [2024-03-29 14:37:43,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 14:37:44,928][00497] Updated weights for policy 0, policy_version 22529 (0.0028) [2024-03-29 14:37:48,348][00497] Updated weights for policy 0, policy_version 22539 (0.0029) [2024-03-29 14:37:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 369311744. Throughput: 0: 42182.2. Samples: 251444240. Policy #0 lag: (min: 0.0, avg: 22.8, max: 43.0) [2024-03-29 14:37:48,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 14:37:52,333][00497] Updated weights for policy 0, policy_version 22549 (0.0027) [2024-03-29 14:37:53,816][00476] Signal inference workers to stop experience collection... (9000 times) [2024-03-29 14:37:53,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 369508352. Throughput: 0: 42396.9. Samples: 251697600. Policy #0 lag: (min: 0.0, avg: 22.8, max: 43.0) [2024-03-29 14:37:53,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 14:37:53,861][00497] InferenceWorker_p0-w0: stopping experience collection (9000 times) [2024-03-29 14:37:54,014][00476] Signal inference workers to resume experience collection... (9000 times) [2024-03-29 14:37:54,015][00497] InferenceWorker_p0-w0: resuming experience collection (9000 times) [2024-03-29 14:37:55,739][00497] Updated weights for policy 0, policy_version 22559 (0.0024) [2024-03-29 14:37:58,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.2, 300 sec: 41932.1). 
Total num frames: 369688576. Throughput: 0: 41761.4. Samples: 251941160. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 14:37:58,840][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 14:38:00,447][00497] Updated weights for policy 0, policy_version 22569 (0.0029) [2024-03-29 14:38:03,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 369917952. Throughput: 0: 42160.8. Samples: 252074040. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 14:38:03,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 14:38:03,933][00497] Updated weights for policy 0, policy_version 22579 (0.0033) [2024-03-29 14:38:08,021][00497] Updated weights for policy 0, policy_version 22589 (0.0028) [2024-03-29 14:38:08,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 370130944. Throughput: 0: 42178.3. Samples: 252329900. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 14:38:08,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 14:38:11,238][00497] Updated weights for policy 0, policy_version 22599 (0.0024) [2024-03-29 14:38:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 370327552. Throughput: 0: 41653.0. Samples: 252569860. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 14:38:13,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:38:16,145][00497] Updated weights for policy 0, policy_version 22609 (0.0032) [2024-03-29 14:38:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 370556928. Throughput: 0: 41997.7. Samples: 252704180. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 14:38:18,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 14:38:19,673][00497] Updated weights for policy 0, policy_version 22619 (0.0024) [2024-03-29 14:38:23,791][00497] Updated weights for policy 0, policy_version 22629 (0.0025) [2024-03-29 14:38:23,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 370753536. Throughput: 0: 41905.7. Samples: 252943920. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 14:38:23,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 14:38:23,889][00476] Signal inference workers to stop experience collection... (9050 times) [2024-03-29 14:38:23,955][00497] InferenceWorker_p0-w0: stopping experience collection (9050 times) [2024-03-29 14:38:24,057][00476] Signal inference workers to resume experience collection... (9050 times) [2024-03-29 14:38:24,058][00497] InferenceWorker_p0-w0: resuming experience collection (9050 times) [2024-03-29 14:38:26,897][00497] Updated weights for policy 0, policy_version 22639 (0.0023) [2024-03-29 14:38:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 370966528. Throughput: 0: 41644.8. Samples: 253196540. Policy #0 lag: (min: 0.0, avg: 23.0, max: 44.0) [2024-03-29 14:38:28,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 14:38:31,728][00497] Updated weights for policy 0, policy_version 22649 (0.0019) [2024-03-29 14:38:33,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 371179520. Throughput: 0: 42010.1. Samples: 253334700. 
Policy #0 lag: (min: 0.0, avg: 23.0, max: 44.0) [2024-03-29 14:38:33,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:38:35,177][00497] Updated weights for policy 0, policy_version 22659 (0.0035) [2024-03-29 14:38:38,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 371359744. Throughput: 0: 41756.9. Samples: 253576660. Policy #0 lag: (min: 0.0, avg: 23.0, max: 44.0) [2024-03-29 14:38:38,841][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 14:38:39,389][00497] Updated weights for policy 0, policy_version 22669 (0.0021) [2024-03-29 14:38:42,707][00497] Updated weights for policy 0, policy_version 22679 (0.0021) [2024-03-29 14:38:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 371605504. Throughput: 0: 41535.9. Samples: 253810280. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 14:38:43,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:38:47,508][00497] Updated weights for policy 0, policy_version 22689 (0.0025) [2024-03-29 14:38:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 371785728. Throughput: 0: 41894.7. Samples: 253959300. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 14:38:48,840][00126] Avg episode reward: [(0, '0.329')] [2024-03-29 14:38:50,941][00497] Updated weights for policy 0, policy_version 22699 (0.0020) [2024-03-29 14:38:53,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 371998720. Throughput: 0: 41437.6. Samples: 254194600. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 14:38:53,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 14:38:55,140][00497] Updated weights for policy 0, policy_version 22709 (0.0022) [2024-03-29 14:38:57,088][00476] Signal inference workers to stop experience collection... (9100 times) [2024-03-29 14:38:57,117][00497] InferenceWorker_p0-w0: stopping experience collection (9100 times) [2024-03-29 14:38:57,265][00476] Signal inference workers to resume experience collection... (9100 times) [2024-03-29 14:38:57,266][00497] InferenceWorker_p0-w0: resuming experience collection (9100 times) [2024-03-29 14:38:58,213][00497] Updated weights for policy 0, policy_version 22719 (0.0026) [2024-03-29 14:38:58,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 372244480. Throughput: 0: 41572.4. Samples: 254440620. Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 14:38:58,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 14:39:02,953][00497] Updated weights for policy 0, policy_version 22729 (0.0022) [2024-03-29 14:39:03,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 372424704. Throughput: 0: 41750.2. Samples: 254582940. Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 14:39:03,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 14:39:03,881][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000022732_372441088.pth... [2024-03-29 14:39:04,197][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000022118_362381312.pth [2024-03-29 14:39:06,585][00497] Updated weights for policy 0, policy_version 22739 (0.0026) [2024-03-29 14:39:08,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 372637696. Throughput: 0: 41855.2. Samples: 254827400. 
Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 14:39:08,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 14:39:10,657][00497] Updated weights for policy 0, policy_version 22749 (0.0023) [2024-03-29 14:39:13,756][00497] Updated weights for policy 0, policy_version 22759 (0.0022) [2024-03-29 14:39:13,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 372883456. Throughput: 0: 41732.4. Samples: 255074500. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:39:13,840][00126] Avg episode reward: [(0, '0.302')] [2024-03-29 14:39:18,701][00497] Updated weights for policy 0, policy_version 22769 (0.0031) [2024-03-29 14:39:18,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 373047296. Throughput: 0: 41728.0. Samples: 255212460. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:39:18,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:39:21,985][00497] Updated weights for policy 0, policy_version 22779 (0.0023) [2024-03-29 14:39:23,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 373276672. Throughput: 0: 41886.3. Samples: 255461540. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:39:23,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 14:39:26,182][00497] Updated weights for policy 0, policy_version 22789 (0.0021) [2024-03-29 14:39:28,776][00476] Signal inference workers to stop experience collection... (9150 times) [2024-03-29 14:39:28,783][00476] Signal inference workers to resume experience collection... (9150 times) [2024-03-29 14:39:28,806][00497] InferenceWorker_p0-w0: stopping experience collection (9150 times) [2024-03-29 14:39:28,826][00497] InferenceWorker_p0-w0: resuming experience collection (9150 times) [2024-03-29 14:39:28,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 373506048. Throughput: 0: 42159.2. Samples: 255707440. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:39:28,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 14:39:29,431][00497] Updated weights for policy 0, policy_version 22799 (0.0017) [2024-03-29 14:39:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 373686272. Throughput: 0: 41887.5. Samples: 255844240. Policy #0 lag: (min: 1.0, avg: 22.4, max: 41.0) [2024-03-29 14:39:33,841][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 14:39:34,352][00497] Updated weights for policy 0, policy_version 22809 (0.0018) [2024-03-29 14:39:37,497][00497] Updated weights for policy 0, policy_version 22819 (0.0026) [2024-03-29 14:39:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 373915648. Throughput: 0: 42223.2. Samples: 256094640. Policy #0 lag: (min: 1.0, avg: 22.4, max: 41.0) [2024-03-29 14:39:38,840][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 14:39:41,843][00497] Updated weights for policy 0, policy_version 22829 (0.0017) [2024-03-29 14:39:43,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 374128640. Throughput: 0: 42222.7. Samples: 256340640. Policy #0 lag: (min: 1.0, avg: 22.4, max: 41.0) [2024-03-29 14:39:43,840][00126] Avg episode reward: [(0, '0.291')] [2024-03-29 14:39:45,119][00497] Updated weights for policy 0, policy_version 22839 (0.0029) [2024-03-29 14:39:48,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 42043.0). 
Total num frames: 374308864. Throughput: 0: 41753.4. Samples: 256461840. Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 14:39:48,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 14:39:49,930][00497] Updated weights for policy 0, policy_version 22849 (0.0023) [2024-03-29 14:39:53,086][00497] Updated weights for policy 0, policy_version 22859 (0.0023) [2024-03-29 14:39:53,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.5, 300 sec: 42043.0). Total num frames: 374538240. Throughput: 0: 42159.2. Samples: 256724560. Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 14:39:53,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 14:39:57,467][00497] Updated weights for policy 0, policy_version 22869 (0.0022) [2024-03-29 14:39:58,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 374767616. Throughput: 0: 42145.1. Samples: 256971020. Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 14:39:58,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 14:40:00,639][00497] Updated weights for policy 0, policy_version 22879 (0.0029) [2024-03-29 14:40:02,626][00476] Signal inference workers to stop experience collection... (9200 times) [2024-03-29 14:40:02,659][00497] InferenceWorker_p0-w0: stopping experience collection (9200 times) [2024-03-29 14:40:02,809][00476] Signal inference workers to resume experience collection... (9200 times) [2024-03-29 14:40:02,810][00497] InferenceWorker_p0-w0: resuming experience collection (9200 times) [2024-03-29 14:40:03,839][00126] Fps is (10 sec: 40959.4, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 374947840. Throughput: 0: 41763.6. Samples: 257091820. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 14:40:03,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 14:40:05,455][00497] Updated weights for policy 0, policy_version 22889 (0.0026) [2024-03-29 14:40:08,760][00497] Updated weights for policy 0, policy_version 22899 (0.0026) [2024-03-29 14:40:08,839][00126] Fps is (10 sec: 40959.1, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 375177216. Throughput: 0: 42144.0. Samples: 257358020. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 14:40:08,840][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 14:40:13,396][00497] Updated weights for policy 0, policy_version 22909 (0.0029) [2024-03-29 14:40:13,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41233.2, 300 sec: 41820.9). Total num frames: 375357440. Throughput: 0: 42104.9. Samples: 257602160. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 14:40:13,840][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 14:40:16,548][00497] Updated weights for policy 0, policy_version 22919 (0.0022) [2024-03-29 14:40:18,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 375570432. Throughput: 0: 41460.5. Samples: 257709960. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 14:40:18,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 14:40:21,327][00497] Updated weights for policy 0, policy_version 22929 (0.0023) [2024-03-29 14:40:23,839][00126] Fps is (10 sec: 42597.5, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 375783424. Throughput: 0: 41913.7. Samples: 257980760. 
Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 14:40:23,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 14:40:24,672][00497] Updated weights for policy 0, policy_version 22939 (0.0029) [2024-03-29 14:40:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41820.8). Total num frames: 375980032. Throughput: 0: 41937.4. Samples: 258227820. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 14:40:28,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 14:40:29,054][00497] Updated weights for policy 0, policy_version 22949 (0.0031) [2024-03-29 14:40:32,257][00497] Updated weights for policy 0, policy_version 22959 (0.0022) [2024-03-29 14:40:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 376193024. Throughput: 0: 41696.8. Samples: 258338200. Policy #0 lag: (min: 1.0, avg: 22.1, max: 41.0) [2024-03-29 14:40:33,840][00126] Avg episode reward: [(0, '0.290')] [2024-03-29 14:40:37,065][00497] Updated weights for policy 0, policy_version 22969 (0.0024) [2024-03-29 14:40:38,588][00476] Signal inference workers to stop experience collection... (9250 times) [2024-03-29 14:40:38,617][00497] InferenceWorker_p0-w0: stopping experience collection (9250 times) [2024-03-29 14:40:38,805][00476] Signal inference workers to resume experience collection... (9250 times) [2024-03-29 14:40:38,806][00497] InferenceWorker_p0-w0: resuming experience collection (9250 times) [2024-03-29 14:40:38,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41506.2, 300 sec: 41932.0). Total num frames: 376406016. Throughput: 0: 42085.8. Samples: 258618420. Policy #0 lag: (min: 1.0, avg: 22.1, max: 41.0) [2024-03-29 14:40:38,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 14:40:40,295][00497] Updated weights for policy 0, policy_version 22979 (0.0030) [2024-03-29 14:40:43,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 376619008. Throughput: 0: 41871.4. Samples: 258855240. Policy #0 lag: (min: 1.0, avg: 22.1, max: 41.0) [2024-03-29 14:40:43,840][00126] Avg episode reward: [(0, '0.265')] [2024-03-29 14:40:44,658][00497] Updated weights for policy 0, policy_version 22989 (0.0031) [2024-03-29 14:40:47,827][00497] Updated weights for policy 0, policy_version 22999 (0.0020) [2024-03-29 14:40:48,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 376832000. Throughput: 0: 41753.9. Samples: 258970740. Policy #0 lag: (min: 0.0, avg: 23.1, max: 42.0) [2024-03-29 14:40:48,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 14:40:52,640][00497] Updated weights for policy 0, policy_version 23009 (0.0032) [2024-03-29 14:40:53,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 377028608. Throughput: 0: 41848.9. Samples: 259241220. Policy #0 lag: (min: 0.0, avg: 23.1, max: 42.0) [2024-03-29 14:40:53,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 14:40:55,951][00497] Updated weights for policy 0, policy_version 23019 (0.0021) [2024-03-29 14:40:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 377241600. Throughput: 0: 41792.0. Samples: 259482800. 
Policy #0 lag: (min: 0.0, avg: 23.1, max: 42.0) [2024-03-29 14:40:58,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:41:00,163][00497] Updated weights for policy 0, policy_version 23029 (0.0020) [2024-03-29 14:41:03,306][00497] Updated weights for policy 0, policy_version 23039 (0.0024) [2024-03-29 14:41:03,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42325.3, 300 sec: 42043.2). Total num frames: 377487360. Throughput: 0: 42117.7. Samples: 259605260. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 14:41:03,840][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 14:41:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023040_377487360.pth... [2024-03-29 14:41:04,175][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000022425_367411200.pth [2024-03-29 14:41:08,220][00497] Updated weights for policy 0, policy_version 23049 (0.0032) [2024-03-29 14:41:08,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 377651200. Throughput: 0: 42138.8. Samples: 259877000. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 14:41:08,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 14:41:09,952][00476] Signal inference workers to stop experience collection... (9300 times) [2024-03-29 14:41:09,959][00476] Signal inference workers to resume experience collection... (9300 times) [2024-03-29 14:41:09,978][00497] InferenceWorker_p0-w0: stopping experience collection (9300 times) [2024-03-29 14:41:10,001][00497] InferenceWorker_p0-w0: resuming experience collection (9300 times) [2024-03-29 14:41:11,455][00497] Updated weights for policy 0, policy_version 23059 (0.0025) [2024-03-29 14:41:13,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 377864192. Throughput: 0: 41819.5. Samples: 260109700. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 14:41:13,841][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 14:41:15,651][00497] Updated weights for policy 0, policy_version 23069 (0.0018) [2024-03-29 14:41:18,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 378109952. Throughput: 0: 42375.6. Samples: 260245100. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 14:41:18,840][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 14:41:18,865][00497] Updated weights for policy 0, policy_version 23079 (0.0026) [2024-03-29 14:41:23,774][00497] Updated weights for policy 0, policy_version 23089 (0.0026) [2024-03-29 14:41:23,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.2, 300 sec: 41987.4). Total num frames: 378290176. Throughput: 0: 41967.3. Samples: 260506960. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 14:41:23,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 14:41:26,696][00497] Updated weights for policy 0, policy_version 23099 (0.0032) [2024-03-29 14:41:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 378519552. Throughput: 0: 41947.6. Samples: 260742880. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 14:41:28,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 14:41:31,056][00497] Updated weights for policy 0, policy_version 23109 (0.0019) [2024-03-29 14:41:33,839][00126] Fps is (10 sec: 45875.9, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 378748928. Throughput: 0: 42553.8. Samples: 260885660. 
Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 14:41:33,840][00126] Avg episode reward: [(0, '0.479')] [2024-03-29 14:41:34,373][00497] Updated weights for policy 0, policy_version 23119 (0.0029) [2024-03-29 14:41:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 378912768. Throughput: 0: 42128.1. Samples: 261136980. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 14:41:38,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:41:39,296][00497] Updated weights for policy 0, policy_version 23129 (0.0022) [2024-03-29 14:41:42,135][00476] Signal inference workers to stop experience collection... (9350 times) [2024-03-29 14:41:42,200][00497] InferenceWorker_p0-w0: stopping experience collection (9350 times) [2024-03-29 14:41:42,232][00476] Signal inference workers to resume experience collection... (9350 times) [2024-03-29 14:41:42,234][00497] InferenceWorker_p0-w0: resuming experience collection (9350 times) [2024-03-29 14:41:42,493][00497] Updated weights for policy 0, policy_version 23139 (0.0020) [2024-03-29 14:41:43,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 379174912. Throughput: 0: 42068.4. Samples: 261375880. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 14:41:43,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 14:41:46,643][00497] Updated weights for policy 0, policy_version 23149 (0.0026) [2024-03-29 14:41:48,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 379371520. Throughput: 0: 42428.9. Samples: 261514560. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 14:41:48,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 14:41:49,914][00497] Updated weights for policy 0, policy_version 23159 (0.0020) [2024-03-29 14:41:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 379568128. Throughput: 0: 42204.5. Samples: 261776200. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 14:41:53,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 14:41:54,601][00497] Updated weights for policy 0, policy_version 23169 (0.0018) [2024-03-29 14:41:57,802][00497] Updated weights for policy 0, policy_version 23179 (0.0019) [2024-03-29 14:41:58,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42871.4, 300 sec: 42098.5). Total num frames: 379813888. Throughput: 0: 42479.2. Samples: 262021260. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 14:41:58,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:42:02,133][00497] Updated weights for policy 0, policy_version 23189 (0.0024) [2024-03-29 14:42:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 380010496. Throughput: 0: 42480.5. Samples: 262156720. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 14:42:03,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:42:05,341][00497] Updated weights for policy 0, policy_version 23199 (0.0035) [2024-03-29 14:42:08,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 380207104. Throughput: 0: 42217.9. Samples: 262406760. 
Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:42:08,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 14:42:10,271][00497] Updated weights for policy 0, policy_version 23209 (0.0023) [2024-03-29 14:42:13,310][00497] Updated weights for policy 0, policy_version 23219 (0.0017) [2024-03-29 14:42:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42871.5, 300 sec: 42043.0). Total num frames: 380436480. Throughput: 0: 42510.2. Samples: 262655840. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:42:13,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 14:42:17,784][00497] Updated weights for policy 0, policy_version 23229 (0.0023) [2024-03-29 14:42:18,058][00476] Signal inference workers to stop experience collection... (9400 times) [2024-03-29 14:42:18,095][00497] InferenceWorker_p0-w0: stopping experience collection (9400 times) [2024-03-29 14:42:18,284][00476] Signal inference workers to resume experience collection... (9400 times) [2024-03-29 14:42:18,285][00497] InferenceWorker_p0-w0: resuming experience collection (9400 times) [2024-03-29 14:42:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 380633088. Throughput: 0: 42164.5. Samples: 262783060. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:42:18,840][00126] Avg episode reward: [(0, '0.304')] [2024-03-29 14:42:21,285][00497] Updated weights for policy 0, policy_version 23239 (0.0023) [2024-03-29 14:42:23,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 380829696. Throughput: 0: 41893.3. Samples: 263022180. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 14:42:23,840][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 14:42:26,270][00497] Updated weights for policy 0, policy_version 23249 (0.0031) [2024-03-29 14:42:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 381042688. Throughput: 0: 42206.2. Samples: 263275160. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 14:42:28,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:42:29,103][00497] Updated weights for policy 0, policy_version 23259 (0.0030) [2024-03-29 14:42:33,671][00497] Updated weights for policy 0, policy_version 23269 (0.0023) [2024-03-29 14:42:33,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 381239296. Throughput: 0: 41760.5. Samples: 263393780. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 14:42:33,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 14:42:36,961][00497] Updated weights for policy 0, policy_version 23279 (0.0026) [2024-03-29 14:42:38,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 381468672. Throughput: 0: 41561.2. Samples: 263646460. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 14:42:38,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:42:41,979][00497] Updated weights for policy 0, policy_version 23289 (0.0021) [2024-03-29 14:42:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 381665280. Throughput: 0: 41911.6. Samples: 263907280. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:42:43,840][00126] Avg episode reward: [(0, '0.276')] [2024-03-29 14:42:44,926][00497] Updated weights for policy 0, policy_version 23299 (0.0027) [2024-03-29 14:42:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41931.9). 
Total num frames: 381878272. Throughput: 0: 41251.5. Samples: 264013040. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:42:48,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:42:49,175][00476] Signal inference workers to stop experience collection... (9450 times) [2024-03-29 14:42:49,207][00497] InferenceWorker_p0-w0: stopping experience collection (9450 times) [2024-03-29 14:42:49,379][00476] Signal inference workers to resume experience collection... (9450 times) [2024-03-29 14:42:49,379][00497] InferenceWorker_p0-w0: resuming experience collection (9450 times) [2024-03-29 14:42:49,382][00497] Updated weights for policy 0, policy_version 23309 (0.0019) [2024-03-29 14:42:52,764][00497] Updated weights for policy 0, policy_version 23319 (0.0019) [2024-03-29 14:42:53,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 382091264. Throughput: 0: 41334.7. Samples: 264266820. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:42:53,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:42:57,545][00497] Updated weights for policy 0, policy_version 23329 (0.0023) [2024-03-29 14:42:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.0, 300 sec: 41931.9). Total num frames: 382287872. Throughput: 0: 41913.3. Samples: 264541940. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 14:42:58,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 14:43:00,619][00497] Updated weights for policy 0, policy_version 23339 (0.0027) [2024-03-29 14:43:03,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 382484480. Throughput: 0: 41317.3. Samples: 264642340. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 14:43:03,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 14:43:04,009][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023346_382500864.pth... [2024-03-29 14:43:04,331][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000022732_372441088.pth [2024-03-29 14:43:05,195][00497] Updated weights for policy 0, policy_version 23349 (0.0020) [2024-03-29 14:43:08,774][00497] Updated weights for policy 0, policy_version 23359 (0.0021) [2024-03-29 14:43:08,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 382713856. Throughput: 0: 41538.6. Samples: 264891420. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 14:43:08,840][00126] Avg episode reward: [(0, '0.500')] [2024-03-29 14:43:13,432][00497] Updated weights for policy 0, policy_version 23369 (0.0021) [2024-03-29 14:43:13,839][00126] Fps is (10 sec: 40959.5, 60 sec: 40959.9, 300 sec: 41820.8). Total num frames: 382894080. Throughput: 0: 41785.2. Samples: 265155500. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 14:43:13,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 14:43:16,511][00497] Updated weights for policy 0, policy_version 23379 (0.0025) [2024-03-29 14:43:18,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 383107072. Throughput: 0: 41452.3. Samples: 265259140. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 14:43:18,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:43:21,346][00497] Updated weights for policy 0, policy_version 23389 (0.0020) [2024-03-29 14:43:21,906][00476] Signal inference workers to stop experience collection... 
(9500 times) [2024-03-29 14:43:21,911][00476] Signal inference workers to resume experience collection... (9500 times) [2024-03-29 14:43:21,958][00497] InferenceWorker_p0-w0: stopping experience collection (9500 times) [2024-03-29 14:43:21,958][00497] InferenceWorker_p0-w0: resuming experience collection (9500 times) [2024-03-29 14:43:23,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 383336448. Throughput: 0: 41535.7. Samples: 265515560. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 14:43:23,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:43:24,878][00497] Updated weights for policy 0, policy_version 23399 (0.0021) [2024-03-29 14:43:28,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 383516672. Throughput: 0: 41463.1. Samples: 265773120. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 14:43:28,841][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 14:43:29,084][00497] Updated weights for policy 0, policy_version 23409 (0.0016) [2024-03-29 14:43:31,996][00497] Updated weights for policy 0, policy_version 23419 (0.0038) [2024-03-29 14:43:33,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 383762432. Throughput: 0: 41803.1. Samples: 265894180. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 14:43:33,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:43:36,850][00497] Updated weights for policy 0, policy_version 23429 (0.0020) [2024-03-29 14:43:38,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 383959040. Throughput: 0: 42038.7. Samples: 266158560. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 14:43:38,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 14:43:40,238][00497] Updated weights for policy 0, policy_version 23439 (0.0018) [2024-03-29 14:43:43,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 384139264. Throughput: 0: 41261.0. Samples: 266398680. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 14:43:43,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 14:43:44,552][00497] Updated weights for policy 0, policy_version 23449 (0.0018) [2024-03-29 14:43:47,658][00497] Updated weights for policy 0, policy_version 23459 (0.0025) [2024-03-29 14:43:48,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 384385024. Throughput: 0: 41892.0. Samples: 266527480. Policy #0 lag: (min: 1.0, avg: 18.6, max: 40.0) [2024-03-29 14:43:48,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 14:43:52,422][00497] Updated weights for policy 0, policy_version 23469 (0.0023) [2024-03-29 14:43:53,731][00476] Signal inference workers to stop experience collection... (9550 times) [2024-03-29 14:43:53,806][00476] Signal inference workers to resume experience collection... (9550 times) [2024-03-29 14:43:53,809][00497] InferenceWorker_p0-w0: stopping experience collection (9550 times) [2024-03-29 14:43:53,834][00497] InferenceWorker_p0-w0: resuming experience collection (9550 times) [2024-03-29 14:43:53,839][00126] Fps is (10 sec: 45874.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 384598016. Throughput: 0: 42065.8. Samples: 266784380. 
Policy #0 lag: (min: 1.0, avg: 18.6, max: 40.0) [2024-03-29 14:43:53,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 14:43:55,727][00497] Updated weights for policy 0, policy_version 23479 (0.0030) [2024-03-29 14:43:58,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 384794624. Throughput: 0: 41714.6. Samples: 267032660. Policy #0 lag: (min: 1.0, avg: 18.6, max: 40.0) [2024-03-29 14:43:58,842][00126] Avg episode reward: [(0, '0.454')] [2024-03-29 14:43:59,978][00497] Updated weights for policy 0, policy_version 23489 (0.0021) [2024-03-29 14:44:03,211][00497] Updated weights for policy 0, policy_version 23499 (0.0019) [2024-03-29 14:44:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 385024000. Throughput: 0: 42329.3. Samples: 267163960. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 14:44:03,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:44:07,957][00497] Updated weights for policy 0, policy_version 23509 (0.0023) [2024-03-29 14:44:08,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 385220608. Throughput: 0: 42200.4. Samples: 267414580. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 14:44:08,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 14:44:11,507][00497] Updated weights for policy 0, policy_version 23519 (0.0028) [2024-03-29 14:44:13,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 385433600. Throughput: 0: 42066.7. Samples: 267666120. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 14:44:13,840][00126] Avg episode reward: [(0, '0.416')] [2024-03-29 14:44:15,782][00497] Updated weights for policy 0, policy_version 23529 (0.0018) [2024-03-29 14:44:18,781][00497] Updated weights for policy 0, policy_version 23539 (0.0024) [2024-03-29 14:44:18,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 385662976. Throughput: 0: 42397.9. Samples: 267802080. Policy #0 lag: (min: 2.0, avg: 20.2, max: 42.0) [2024-03-29 14:44:18,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 14:44:23,472][00497] Updated weights for policy 0, policy_version 23549 (0.0034) [2024-03-29 14:44:23,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 385843200. Throughput: 0: 41880.0. Samples: 268043160. Policy #0 lag: (min: 2.0, avg: 20.2, max: 42.0) [2024-03-29 14:44:23,840][00126] Avg episode reward: [(0, '0.272')] [2024-03-29 14:44:27,327][00497] Updated weights for policy 0, policy_version 23559 (0.0020) [2024-03-29 14:44:28,839][00126] Fps is (10 sec: 37683.2, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 386039808. Throughput: 0: 41825.8. Samples: 268280840. Policy #0 lag: (min: 2.0, avg: 20.2, max: 42.0) [2024-03-29 14:44:28,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 14:44:29,095][00476] Signal inference workers to stop experience collection... (9600 times) [2024-03-29 14:44:29,147][00497] InferenceWorker_p0-w0: stopping experience collection (9600 times) [2024-03-29 14:44:29,182][00476] Signal inference workers to resume experience collection... (9600 times) [2024-03-29 14:44:29,183][00497] InferenceWorker_p0-w0: resuming experience collection (9600 times) [2024-03-29 14:44:31,446][00497] Updated weights for policy 0, policy_version 23569 (0.0022) [2024-03-29 14:44:33,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41876.4). 
Total num frames: 386269184. Throughput: 0: 42176.8. Samples: 268425440. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 14:44:33,841][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:44:34,515][00497] Updated weights for policy 0, policy_version 23579 (0.0025) [2024-03-29 14:44:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 386465792. Throughput: 0: 41873.0. Samples: 268668660. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 14:44:38,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 14:44:39,082][00497] Updated weights for policy 0, policy_version 23589 (0.0027) [2024-03-29 14:44:42,665][00497] Updated weights for policy 0, policy_version 23599 (0.0019) [2024-03-29 14:44:43,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 386695168. Throughput: 0: 41981.8. Samples: 268921840. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 14:44:43,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:44:46,706][00497] Updated weights for policy 0, policy_version 23609 (0.0023) [2024-03-29 14:44:48,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 386908160. Throughput: 0: 42097.8. Samples: 269058360. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:44:48,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 14:44:49,964][00497] Updated weights for policy 0, policy_version 23619 (0.0023) [2024-03-29 14:44:53,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.3, 300 sec: 41820.8). Total num frames: 387104768. Throughput: 0: 41981.4. Samples: 269303740. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:44:53,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:44:54,564][00497] Updated weights for policy 0, policy_version 23629 (0.0023) [2024-03-29 14:44:58,085][00497] Updated weights for policy 0, policy_version 23639 (0.0026) [2024-03-29 14:44:58,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 387317760. Throughput: 0: 41952.4. Samples: 269553980. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:44:58,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:45:02,417][00497] Updated weights for policy 0, policy_version 23649 (0.0032) [2024-03-29 14:45:03,840][00126] Fps is (10 sec: 42596.3, 60 sec: 41778.9, 300 sec: 41876.3). Total num frames: 387530752. Throughput: 0: 41886.6. Samples: 269687000. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:45:03,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 14:45:04,367][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023655_387563520.pth... [2024-03-29 14:45:04,736][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023040_377487360.pth [2024-03-29 14:45:05,032][00476] Signal inference workers to stop experience collection... (9650 times) [2024-03-29 14:45:05,118][00497] InferenceWorker_p0-w0: stopping experience collection (9650 times) [2024-03-29 14:45:05,265][00476] Signal inference workers to resume experience collection... (9650 times) [2024-03-29 14:45:05,266][00497] InferenceWorker_p0-w0: resuming experience collection (9650 times) [2024-03-29 14:45:05,855][00497] Updated weights for policy 0, policy_version 23659 (0.0030) [2024-03-29 14:45:08,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 387727360. 
Throughput: 0: 41775.0. Samples: 269923040. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 14:45:08,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 14:45:10,191][00497] Updated weights for policy 0, policy_version 23669 (0.0019) [2024-03-29 14:45:13,802][00497] Updated weights for policy 0, policy_version 23679 (0.0023) [2024-03-29 14:45:13,839][00126] Fps is (10 sec: 42600.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 387956736. Throughput: 0: 41934.7. Samples: 270167900. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 14:45:13,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 14:45:18,270][00497] Updated weights for policy 0, policy_version 23689 (0.0027) [2024-03-29 14:45:18,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 388136960. Throughput: 0: 41816.1. Samples: 270307160. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 14:45:18,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 14:45:21,234][00497] Updated weights for policy 0, policy_version 23699 (0.0021) [2024-03-29 14:45:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 388366336. Throughput: 0: 41804.4. Samples: 270549860. Policy #0 lag: (min: 3.0, avg: 23.2, max: 44.0) [2024-03-29 14:45:23,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 14:45:25,806][00497] Updated weights for policy 0, policy_version 23709 (0.0019) [2024-03-29 14:45:28,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 388595712. Throughput: 0: 41950.3. Samples: 270809600. Policy #0 lag: (min: 3.0, avg: 23.2, max: 44.0) [2024-03-29 14:45:28,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 14:45:29,327][00497] Updated weights for policy 0, policy_version 23719 (0.0019) [2024-03-29 14:45:33,733][00497] Updated weights for policy 0, policy_version 23729 (0.0026) [2024-03-29 14:45:33,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 388775936. Throughput: 0: 41700.6. Samples: 270934880. Policy #0 lag: (min: 3.0, avg: 23.2, max: 44.0) [2024-03-29 14:45:33,840][00126] Avg episode reward: [(0, '0.277')] [2024-03-29 14:45:36,809][00497] Updated weights for policy 0, policy_version 23739 (0.0023) [2024-03-29 14:45:38,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 389005312. Throughput: 0: 41537.7. Samples: 271172940. Policy #0 lag: (min: 0.0, avg: 23.0, max: 45.0) [2024-03-29 14:45:38,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 14:45:41,459][00497] Updated weights for policy 0, policy_version 23749 (0.0017) [2024-03-29 14:45:41,811][00476] Signal inference workers to stop experience collection... (9700 times) [2024-03-29 14:45:41,850][00497] InferenceWorker_p0-w0: stopping experience collection (9700 times) [2024-03-29 14:45:42,011][00476] Signal inference workers to resume experience collection... (9700 times) [2024-03-29 14:45:42,012][00497] InferenceWorker_p0-w0: resuming experience collection (9700 times) [2024-03-29 14:45:43,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.4, 300 sec: 41987.5). Total num frames: 389218304. Throughput: 0: 41992.9. Samples: 271443660. 
Policy #0 lag: (min: 0.0, avg: 23.0, max: 45.0) [2024-03-29 14:45:43,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:45:44,858][00497] Updated weights for policy 0, policy_version 23759 (0.0031) [2024-03-29 14:45:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 389414912. Throughput: 0: 41827.5. Samples: 271569220. Policy #0 lag: (min: 0.0, avg: 23.0, max: 45.0) [2024-03-29 14:45:48,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 14:45:49,019][00497] Updated weights for policy 0, policy_version 23769 (0.0022) [2024-03-29 14:45:52,275][00497] Updated weights for policy 0, policy_version 23779 (0.0030) [2024-03-29 14:45:53,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 389644288. Throughput: 0: 41963.6. Samples: 271811400. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 14:45:53,840][00126] Avg episode reward: [(0, '0.458')] [2024-03-29 14:45:56,807][00497] Updated weights for policy 0, policy_version 23789 (0.0023) [2024-03-29 14:45:58,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 389840896. Throughput: 0: 42446.3. Samples: 272077980. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 14:45:58,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:46:00,334][00497] Updated weights for policy 0, policy_version 23799 (0.0017) [2024-03-29 14:46:03,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.6, 300 sec: 42043.0). Total num frames: 390053888. Throughput: 0: 42147.5. Samples: 272203800. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 14:46:03,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:46:04,550][00497] Updated weights for policy 0, policy_version 23809 (0.0025) [2024-03-29 14:46:07,780][00497] Updated weights for policy 0, policy_version 23819 (0.0029) [2024-03-29 14:46:08,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.5, 300 sec: 42098.6). Total num frames: 390283264. Throughput: 0: 42133.3. Samples: 272445860. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 14:46:08,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:46:12,416][00497] Updated weights for policy 0, policy_version 23829 (0.0024) [2024-03-29 14:46:13,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 390479872. Throughput: 0: 42164.8. Samples: 272707020. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:46:13,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:46:15,971][00497] Updated weights for policy 0, policy_version 23839 (0.0018) [2024-03-29 14:46:18,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 390692864. Throughput: 0: 42147.8. Samples: 272831540. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:46:18,840][00126] Avg episode reward: [(0, '0.287')] [2024-03-29 14:46:20,217][00497] Updated weights for policy 0, policy_version 23849 (0.0023) [2024-03-29 14:46:21,454][00476] Signal inference workers to stop experience collection... (9750 times) [2024-03-29 14:46:21,569][00497] InferenceWorker_p0-w0: stopping experience collection (9750 times) [2024-03-29 14:46:21,646][00476] Signal inference workers to resume experience collection... 
(9750 times) [2024-03-29 14:46:21,646][00497] InferenceWorker_p0-w0: resuming experience collection (9750 times) [2024-03-29 14:46:23,362][00497] Updated weights for policy 0, policy_version 23859 (0.0023) [2024-03-29 14:46:23,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 390922240. Throughput: 0: 42606.7. Samples: 273090240. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:46:23,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 14:46:27,870][00497] Updated weights for policy 0, policy_version 23869 (0.0026) [2024-03-29 14:46:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 391118848. Throughput: 0: 42336.8. Samples: 273348820. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 14:46:28,840][00126] Avg episode reward: [(0, '0.268')] [2024-03-29 14:46:31,147][00497] Updated weights for policy 0, policy_version 23879 (0.0028) [2024-03-29 14:46:33,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 391331840. Throughput: 0: 42246.8. Samples: 273470320. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 14:46:33,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 14:46:35,627][00497] Updated weights for policy 0, policy_version 23889 (0.0024) [2024-03-29 14:46:38,838][00497] Updated weights for policy 0, policy_version 23899 (0.0029) [2024-03-29 14:46:38,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 391561216. Throughput: 0: 42723.7. Samples: 273733960. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 14:46:38,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 14:46:43,399][00497] Updated weights for policy 0, policy_version 23909 (0.0022) [2024-03-29 14:46:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 391741440. Throughput: 0: 42172.0. Samples: 273975720. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 14:46:43,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 14:46:46,849][00497] Updated weights for policy 0, policy_version 23919 (0.0025) [2024-03-29 14:46:48,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 391954432. Throughput: 0: 41922.2. Samples: 274090300. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 14:46:48,841][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 14:46:51,341][00497] Updated weights for policy 0, policy_version 23929 (0.0029) [2024-03-29 14:46:53,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 392167424. Throughput: 0: 42522.6. Samples: 274359380. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 14:46:53,840][00126] Avg episode reward: [(0, '0.315')] [2024-03-29 14:46:54,570][00497] Updated weights for policy 0, policy_version 23939 (0.0024) [2024-03-29 14:46:57,195][00476] Signal inference workers to stop experience collection... (9800 times) [2024-03-29 14:46:57,196][00476] Signal inference workers to resume experience collection... (9800 times) [2024-03-29 14:46:57,233][00497] InferenceWorker_p0-w0: stopping experience collection (9800 times) [2024-03-29 14:46:57,233][00497] InferenceWorker_p0-w0: resuming experience collection (9800 times) [2024-03-29 14:46:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 392364032. Throughput: 0: 42117.9. Samples: 274602320. 
Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 14:46:58,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 14:46:59,034][00497] Updated weights for policy 0, policy_version 23949 (0.0019) [2024-03-29 14:47:02,468][00497] Updated weights for policy 0, policy_version 23959 (0.0023) [2024-03-29 14:47:03,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 392577024. Throughput: 0: 42171.2. Samples: 274729240. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:47:03,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:47:03,952][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023962_392593408.pth... [2024-03-29 14:47:04,291][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023346_382500864.pth [2024-03-29 14:47:06,897][00497] Updated weights for policy 0, policy_version 23969 (0.0017) [2024-03-29 14:47:08,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 392773632. Throughput: 0: 42101.9. Samples: 274984820. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:47:08,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:47:10,120][00497] Updated weights for policy 0, policy_version 23979 (0.0025) [2024-03-29 14:47:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 392986624. Throughput: 0: 41717.4. Samples: 275226100. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:47:13,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 14:47:14,721][00497] Updated weights for policy 0, policy_version 23989 (0.0024) [2024-03-29 14:47:18,152][00497] Updated weights for policy 0, policy_version 23999 (0.0026) [2024-03-29 14:47:18,839][00126] Fps is (10 sec: 44236.1, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 393216000. Throughput: 0: 41865.7. Samples: 275354280. Policy #0 lag: (min: 1.0, avg: 20.5, max: 43.0) [2024-03-29 14:47:18,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:47:22,626][00497] Updated weights for policy 0, policy_version 24009 (0.0030) [2024-03-29 14:47:23,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.2, 300 sec: 41876.4). Total num frames: 393396224. Throughput: 0: 41845.3. Samples: 275617000. Policy #0 lag: (min: 1.0, avg: 20.5, max: 43.0) [2024-03-29 14:47:23,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 14:47:25,768][00497] Updated weights for policy 0, policy_version 24019 (0.0023) [2024-03-29 14:47:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 393625600. Throughput: 0: 41853.2. Samples: 275859120. Policy #0 lag: (min: 1.0, avg: 20.5, max: 43.0) [2024-03-29 14:47:28,841][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 14:47:30,272][00497] Updated weights for policy 0, policy_version 24029 (0.0026) [2024-03-29 14:47:30,781][00476] Signal inference workers to stop experience collection... (9850 times) [2024-03-29 14:47:30,858][00476] Signal inference workers to resume experience collection... (9850 times) [2024-03-29 14:47:30,859][00497] InferenceWorker_p0-w0: stopping experience collection (9850 times) [2024-03-29 14:47:30,883][00497] InferenceWorker_p0-w0: resuming experience collection (9850 times) [2024-03-29 14:47:33,666][00497] Updated weights for policy 0, policy_version 24039 (0.0026) [2024-03-29 14:47:33,839][00126] Fps is (10 sec: 45874.2, 60 sec: 42052.1, 300 sec: 41987.5). 
Total num frames: 393854976. Throughput: 0: 42190.5. Samples: 275988880. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 14:47:33,840][00126] Avg episode reward: [(0, '0.483')] [2024-03-29 14:47:38,133][00497] Updated weights for policy 0, policy_version 24049 (0.0019) [2024-03-29 14:47:38,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41233.0, 300 sec: 41931.9). Total num frames: 394035200. Throughput: 0: 42047.3. Samples: 276251500. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 14:47:38,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 14:47:41,369][00497] Updated weights for policy 0, policy_version 24059 (0.0032) [2024-03-29 14:47:43,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 394248192. Throughput: 0: 41575.0. Samples: 276473200. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 14:47:43,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:47:45,927][00497] Updated weights for policy 0, policy_version 24069 (0.0029) [2024-03-29 14:47:48,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 394477568. Throughput: 0: 41796.4. Samples: 276610080. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:47:48,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 14:47:49,369][00497] Updated weights for policy 0, policy_version 24079 (0.0019) [2024-03-29 14:47:53,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 394657792. Throughput: 0: 41754.6. Samples: 276863780. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:47:53,841][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 14:47:54,048][00497] Updated weights for policy 0, policy_version 24089 (0.0023) [2024-03-29 14:47:57,166][00497] Updated weights for policy 0, policy_version 24099 (0.0019) [2024-03-29 14:47:58,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 394887168. Throughput: 0: 41495.6. Samples: 277093400. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:47:58,840][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 14:48:01,650][00497] Updated weights for policy 0, policy_version 24109 (0.0028) [2024-03-29 14:48:03,840][00126] Fps is (10 sec: 42597.4, 60 sec: 41779.0, 300 sec: 41931.9). Total num frames: 395083776. Throughput: 0: 41893.2. Samples: 277239480. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:48:03,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 14:48:04,570][00476] Signal inference workers to stop experience collection... (9900 times) [2024-03-29 14:48:04,607][00497] InferenceWorker_p0-w0: stopping experience collection (9900 times) [2024-03-29 14:48:04,800][00476] Signal inference workers to resume experience collection... (9900 times) [2024-03-29 14:48:04,801][00497] InferenceWorker_p0-w0: resuming experience collection (9900 times) [2024-03-29 14:48:05,116][00497] Updated weights for policy 0, policy_version 24119 (0.0024) [2024-03-29 14:48:08,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 395280384. Throughput: 0: 41490.1. Samples: 277484060. 
Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 14:48:08,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 14:48:09,708][00497] Updated weights for policy 0, policy_version 24129 (0.0023) [2024-03-29 14:48:12,932][00497] Updated weights for policy 0, policy_version 24139 (0.0020) [2024-03-29 14:48:13,839][00126] Fps is (10 sec: 44237.9, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 395526144. Throughput: 0: 41469.9. Samples: 277725260. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 14:48:13,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 14:48:17,326][00497] Updated weights for policy 0, policy_version 24149 (0.0028) [2024-03-29 14:48:18,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 395722752. Throughput: 0: 41673.8. Samples: 277864200. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 14:48:18,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:48:20,700][00497] Updated weights for policy 0, policy_version 24159 (0.0019) [2024-03-29 14:48:23,839][00126] Fps is (10 sec: 39321.1, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 395919360. Throughput: 0: 41452.3. Samples: 278116860. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0) [2024-03-29 14:48:23,842][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 14:48:25,387][00497] Updated weights for policy 0, policy_version 24169 (0.0019) [2024-03-29 14:48:28,429][00497] Updated weights for policy 0, policy_version 24179 (0.0019) [2024-03-29 14:48:28,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 396165120. Throughput: 0: 42024.1. Samples: 278364280. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0) [2024-03-29 14:48:28,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 14:48:32,814][00497] Updated weights for policy 0, policy_version 24189 (0.0018) [2024-03-29 14:48:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 396345344. Throughput: 0: 41954.2. Samples: 278498020. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0) [2024-03-29 14:48:33,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:48:36,440][00497] Updated weights for policy 0, policy_version 24199 (0.0023) [2024-03-29 14:48:38,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 396558336. Throughput: 0: 41558.1. Samples: 278733900. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 14:48:38,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:48:41,043][00497] Updated weights for policy 0, policy_version 24209 (0.0018) [2024-03-29 14:48:42,109][00476] Signal inference workers to stop experience collection... (9950 times) [2024-03-29 14:48:42,109][00476] Signal inference workers to resume experience collection... (9950 times) [2024-03-29 14:48:42,145][00497] InferenceWorker_p0-w0: stopping experience collection (9950 times) [2024-03-29 14:48:42,145][00497] InferenceWorker_p0-w0: resuming experience collection (9950 times) [2024-03-29 14:48:43,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 396771328. Throughput: 0: 42424.8. Samples: 279002520. 
Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 14:48:43,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:48:44,179][00497] Updated weights for policy 0, policy_version 24219 (0.0024) [2024-03-29 14:48:48,415][00497] Updated weights for policy 0, policy_version 24229 (0.0022) [2024-03-29 14:48:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 396984320. Throughput: 0: 41671.3. Samples: 279114680. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 14:48:48,840][00126] Avg episode reward: [(0, '0.254')] [2024-03-29 14:48:52,153][00497] Updated weights for policy 0, policy_version 24239 (0.0027) [2024-03-29 14:48:53,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.1, 300 sec: 41987.5). Total num frames: 397180928. Throughput: 0: 41750.1. Samples: 279362820. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 14:48:53,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:48:56,656][00497] Updated weights for policy 0, policy_version 24249 (0.0024) [2024-03-29 14:48:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41932.0). Total num frames: 397393920. Throughput: 0: 42388.9. Samples: 279632760. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 14:48:58,841][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 14:49:00,184][00497] Updated weights for policy 0, policy_version 24260 (0.0023) [2024-03-29 14:49:03,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 397606912. Throughput: 0: 41592.4. Samples: 279735860. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 14:49:03,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 14:49:04,097][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000024269_397623296.pth... [2024-03-29 14:49:04,428][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023655_387563520.pth [2024-03-29 14:49:04,763][00497] Updated weights for policy 0, policy_version 24270 (0.0020) [2024-03-29 14:49:08,024][00497] Updated weights for policy 0, policy_version 24280 (0.0028) [2024-03-29 14:49:08,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 397819904. Throughput: 0: 41659.6. Samples: 279991540. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 14:49:08,840][00126] Avg episode reward: [(0, '0.300')] [2024-03-29 14:49:12,834][00497] Updated weights for policy 0, policy_version 24290 (0.0022) [2024-03-29 14:49:13,604][00476] Signal inference workers to stop experience collection... (10000 times) [2024-03-29 14:49:13,685][00476] Signal inference workers to resume experience collection... (10000 times) [2024-03-29 14:49:13,687][00497] InferenceWorker_p0-w0: stopping experience collection (10000 times) [2024-03-29 14:49:13,718][00497] InferenceWorker_p0-w0: resuming experience collection (10000 times) [2024-03-29 14:49:13,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 398000128. Throughput: 0: 41880.9. Samples: 280248920. Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 14:49:13,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 14:49:15,970][00497] Updated weights for policy 0, policy_version 24300 (0.0030) [2024-03-29 14:49:18,839][00126] Fps is (10 sec: 40960.9, 60 sec: 41779.4, 300 sec: 41987.5). Total num frames: 398229504. Throughput: 0: 41291.8. Samples: 280356140. 
Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 14:49:18,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:49:20,256][00497] Updated weights for policy 0, policy_version 24310 (0.0024) [2024-03-29 14:49:23,753][00497] Updated weights for policy 0, policy_version 24320 (0.0021) [2024-03-29 14:49:23,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42325.4, 300 sec: 42098.5). Total num frames: 398458880. Throughput: 0: 42013.8. Samples: 280624520. Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 14:49:23,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:49:28,416][00497] Updated weights for policy 0, policy_version 24330 (0.0019) [2024-03-29 14:49:28,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 398639104. Throughput: 0: 42006.3. Samples: 280892800. Policy #0 lag: (min: 1.0, avg: 19.4, max: 43.0) [2024-03-29 14:49:28,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:49:31,380][00497] Updated weights for policy 0, policy_version 24340 (0.0020) [2024-03-29 14:49:33,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 398852096. Throughput: 0: 41840.4. Samples: 280997500. Policy #0 lag: (min: 1.0, avg: 19.4, max: 43.0) [2024-03-29 14:49:33,841][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 14:49:35,658][00497] Updated weights for policy 0, policy_version 24350 (0.0026) [2024-03-29 14:49:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 399081472. Throughput: 0: 42123.2. Samples: 281258360. Policy #0 lag: (min: 1.0, avg: 19.4, max: 43.0) [2024-03-29 14:49:38,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 14:49:39,299][00497] Updated weights for policy 0, policy_version 24360 (0.0026) [2024-03-29 14:49:43,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 399245312. Throughput: 0: 41753.8. Samples: 281511680. Policy #0 lag: (min: 1.0, avg: 19.4, max: 43.0) [2024-03-29 14:49:43,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 14:49:44,196][00497] Updated weights for policy 0, policy_version 24370 (0.0028) [2024-03-29 14:49:46,441][00476] Signal inference workers to stop experience collection... (10050 times) [2024-03-29 14:49:46,442][00476] Signal inference workers to resume experience collection... (10050 times) [2024-03-29 14:49:46,477][00497] InferenceWorker_p0-w0: stopping experience collection (10050 times) [2024-03-29 14:49:46,477][00497] InferenceWorker_p0-w0: resuming experience collection (10050 times) [2024-03-29 14:49:47,363][00497] Updated weights for policy 0, policy_version 24380 (0.0023) [2024-03-29 14:49:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 399491072. Throughput: 0: 42004.5. Samples: 281626060. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 14:49:48,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 14:49:51,327][00497] Updated weights for policy 0, policy_version 24390 (0.0023) [2024-03-29 14:49:53,839][00126] Fps is (10 sec: 44236.1, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 399687680. Throughput: 0: 42116.8. Samples: 281886800. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 14:49:53,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:49:55,102][00497] Updated weights for policy 0, policy_version 24400 (0.0029) [2024-03-29 14:49:58,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 41876.5). 
Total num frames: 399884288. Throughput: 0: 41996.4. Samples: 282138760. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 14:49:58,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 14:49:59,593][00497] Updated weights for policy 0, policy_version 24410 (0.0019) [2024-03-29 14:50:02,986][00497] Updated weights for policy 0, policy_version 24420 (0.0032) [2024-03-29 14:50:03,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 400130048. Throughput: 0: 42375.4. Samples: 282263040. Policy #0 lag: (min: 1.0, avg: 19.9, max: 42.0) [2024-03-29 14:50:03,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:50:06,932][00497] Updated weights for policy 0, policy_version 24430 (0.0023) [2024-03-29 14:50:08,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 400326656. Throughput: 0: 41898.6. Samples: 282509960. Policy #0 lag: (min: 1.0, avg: 19.9, max: 42.0) [2024-03-29 14:50:08,840][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 14:50:10,636][00497] Updated weights for policy 0, policy_version 24440 (0.0023) [2024-03-29 14:50:13,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 400506880. Throughput: 0: 41540.4. Samples: 282762120. Policy #0 lag: (min: 1.0, avg: 19.9, max: 42.0) [2024-03-29 14:50:13,841][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:50:15,069][00497] Updated weights for policy 0, policy_version 24450 (0.0029) [2024-03-29 14:50:17,231][00476] Signal inference workers to stop experience collection... (10100 times) [2024-03-29 14:50:17,264][00497] InferenceWorker_p0-w0: stopping experience collection (10100 times) [2024-03-29 14:50:17,418][00476] Signal inference workers to resume experience collection... (10100 times) [2024-03-29 14:50:17,418][00497] InferenceWorker_p0-w0: resuming experience collection (10100 times) [2024-03-29 14:50:18,619][00497] Updated weights for policy 0, policy_version 24460 (0.0031) [2024-03-29 14:50:18,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 400752640. Throughput: 0: 42187.2. Samples: 282895920. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:18,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 14:50:22,845][00497] Updated weights for policy 0, policy_version 24470 (0.0029) [2024-03-29 14:50:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 400949248. Throughput: 0: 41663.6. Samples: 283133220. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:23,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 14:50:26,447][00497] Updated weights for policy 0, policy_version 24480 (0.0033) [2024-03-29 14:50:28,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 401162240. Throughput: 0: 41744.4. Samples: 283390180. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:28,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 14:50:30,638][00497] Updated weights for policy 0, policy_version 24490 (0.0022) [2024-03-29 14:50:33,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 401391616. Throughput: 0: 42270.1. Samples: 283528220. 
Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:33,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:50:34,078][00497] Updated weights for policy 0, policy_version 24500 (0.0028) [2024-03-29 14:50:38,230][00497] Updated weights for policy 0, policy_version 24510 (0.0019) [2024-03-29 14:50:38,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 401571840. Throughput: 0: 41961.1. Samples: 283775040. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:38,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 14:50:41,947][00497] Updated weights for policy 0, policy_version 24520 (0.0025) [2024-03-29 14:50:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 401801216. Throughput: 0: 41835.0. Samples: 284021340. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:43,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 14:50:46,219][00497] Updated weights for policy 0, policy_version 24530 (0.0022) [2024-03-29 14:50:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 401997824. Throughput: 0: 42117.8. Samples: 284158340. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:48,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 14:50:49,720][00497] Updated weights for policy 0, policy_version 24540 (0.0035) [2024-03-29 14:50:51,909][00476] Signal inference workers to stop experience collection... (10150 times) [2024-03-29 14:50:51,978][00497] InferenceWorker_p0-w0: stopping experience collection (10150 times) [2024-03-29 14:50:51,997][00476] Signal inference workers to resume experience collection... (10150 times) [2024-03-29 14:50:51,998][00497] InferenceWorker_p0-w0: resuming experience collection (10150 times) [2024-03-29 14:50:53,718][00497] Updated weights for policy 0, policy_version 24550 (0.0031) [2024-03-29 14:50:53,842][00126] Fps is (10 sec: 42587.3, 60 sec: 42323.5, 300 sec: 41987.1). Total num frames: 402227200. Throughput: 0: 42214.9. Samples: 284409740. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 14:50:53,843][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 14:50:57,588][00497] Updated weights for policy 0, policy_version 24560 (0.0024) [2024-03-29 14:50:58,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 402440192. Throughput: 0: 41955.5. Samples: 284650120. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 14:50:58,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 14:51:01,716][00497] Updated weights for policy 0, policy_version 24570 (0.0027) [2024-03-29 14:51:03,839][00126] Fps is (10 sec: 40970.7, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 402636800. Throughput: 0: 42148.3. Samples: 284792600. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 14:51:03,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 14:51:04,026][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000024576_402653184.pth... [2024-03-29 14:51:04,333][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023962_392593408.pth [2024-03-29 14:51:05,376][00497] Updated weights for policy 0, policy_version 24580 (0.0028) [2024-03-29 14:51:08,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 402849792. Throughput: 0: 42296.3. Samples: 285036560. 
Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:51:08,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 14:51:09,521][00497] Updated weights for policy 0, policy_version 24590 (0.0023) [2024-03-29 14:51:13,176][00497] Updated weights for policy 0, policy_version 24600 (0.0023) [2024-03-29 14:51:13,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42871.4, 300 sec: 41987.5). Total num frames: 403079168. Throughput: 0: 42061.3. Samples: 285282940. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:51:13,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:51:17,237][00497] Updated weights for policy 0, policy_version 24610 (0.0019) [2024-03-29 14:51:18,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.1, 300 sec: 41820.9). Total num frames: 403259392. Throughput: 0: 42199.2. Samples: 285427180. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:51:18,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 14:51:20,953][00476] Signal inference workers to stop experience collection... (10200 times) [2024-03-29 14:51:20,995][00497] InferenceWorker_p0-w0: stopping experience collection (10200 times) [2024-03-29 14:51:21,033][00476] Signal inference workers to resume experience collection... (10200 times) [2024-03-29 14:51:21,035][00497] InferenceWorker_p0-w0: resuming experience collection (10200 times) [2024-03-29 14:51:21,039][00497] Updated weights for policy 0, policy_version 24620 (0.0024) [2024-03-29 14:51:23,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 403472384. Throughput: 0: 41815.5. Samples: 285656740. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:51:23,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:51:25,283][00497] Updated weights for policy 0, policy_version 24630 (0.0019) [2024-03-29 14:51:28,740][00497] Updated weights for policy 0, policy_version 24640 (0.0023) [2024-03-29 14:51:28,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 403701760. Throughput: 0: 42011.7. Samples: 285911860. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:51:28,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:51:32,998][00497] Updated weights for policy 0, policy_version 24650 (0.0029) [2024-03-29 14:51:33,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 403881984. Throughput: 0: 41938.1. Samples: 286045560. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:51:33,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 14:51:36,462][00497] Updated weights for policy 0, policy_version 24660 (0.0033) [2024-03-29 14:51:38,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 404094976. Throughput: 0: 41662.9. Samples: 286284460. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:51:38,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:51:41,004][00497] Updated weights for policy 0, policy_version 24670 (0.0025) [2024-03-29 14:51:43,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 404324352. Throughput: 0: 42216.1. Samples: 286549840. 
Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:51:43,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:51:44,242][00497] Updated weights for policy 0, policy_version 24680 (0.0024) [2024-03-29 14:51:48,762][00497] Updated weights for policy 0, policy_version 24690 (0.0019) [2024-03-29 14:51:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 404520960. Throughput: 0: 41864.6. Samples: 286676500. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:51:48,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 14:51:52,138][00497] Updated weights for policy 0, policy_version 24700 (0.0025) [2024-03-29 14:51:53,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41781.1, 300 sec: 41931.9). Total num frames: 404733952. Throughput: 0: 41859.7. Samples: 286920240. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:51:53,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 14:51:54,280][00476] Signal inference workers to stop experience collection... (10250 times) [2024-03-29 14:51:54,355][00476] Signal inference workers to resume experience collection... (10250 times) [2024-03-29 14:51:54,357][00497] InferenceWorker_p0-w0: stopping experience collection (10250 times) [2024-03-29 14:51:54,389][00497] InferenceWorker_p0-w0: resuming experience collection (10250 times) [2024-03-29 14:51:56,500][00497] Updated weights for policy 0, policy_version 24710 (0.0028) [2024-03-29 14:51:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 404946944. Throughput: 0: 42279.6. Samples: 287185520. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:51:58,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 14:51:59,881][00497] Updated weights for policy 0, policy_version 24720 (0.0020) [2024-03-29 14:52:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 405159936. Throughput: 0: 41608.0. Samples: 287299540. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:52:03,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:52:04,253][00497] Updated weights for policy 0, policy_version 24730 (0.0019) [2024-03-29 14:52:07,653][00497] Updated weights for policy 0, policy_version 24740 (0.0028) [2024-03-29 14:52:08,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 405389312. Throughput: 0: 42307.9. Samples: 287560600. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:52:08,840][00126] Avg episode reward: [(0, '0.443')] [2024-03-29 14:52:12,043][00497] Updated weights for policy 0, policy_version 24750 (0.0019) [2024-03-29 14:52:13,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 405569536. Throughput: 0: 42223.0. Samples: 287811900. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:52:13,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:52:15,579][00497] Updated weights for policy 0, policy_version 24760 (0.0019) [2024-03-29 14:52:18,839][00126] Fps is (10 sec: 39322.2, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 405782528. Throughput: 0: 41813.9. Samples: 287927180. 
Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 14:52:18,841][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:52:19,993][00497] Updated weights for policy 0, policy_version 24770 (0.0020) [2024-03-29 14:52:23,426][00497] Updated weights for policy 0, policy_version 24780 (0.0021) [2024-03-29 14:52:23,719][00476] Signal inference workers to stop experience collection... (10300 times) [2024-03-29 14:52:23,719][00476] Signal inference workers to resume experience collection... (10300 times) [2024-03-29 14:52:23,759][00497] InferenceWorker_p0-w0: stopping experience collection (10300 times) [2024-03-29 14:52:23,759][00497] InferenceWorker_p0-w0: resuming experience collection (10300 times) [2024-03-29 14:52:23,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 406011904. Throughput: 0: 42271.6. Samples: 288186680. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 14:52:23,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 14:52:27,494][00497] Updated weights for policy 0, policy_version 24790 (0.0022) [2024-03-29 14:52:28,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 406208512. Throughput: 0: 41991.0. Samples: 288439440. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 14:52:28,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 14:52:31,047][00497] Updated weights for policy 0, policy_version 24800 (0.0030) [2024-03-29 14:52:33,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 406405120. Throughput: 0: 41839.9. Samples: 288559300. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 14:52:33,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:52:35,584][00497] Updated weights for policy 0, policy_version 24810 (0.0018) [2024-03-29 14:52:38,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 406634496. Throughput: 0: 42305.8. Samples: 288824000. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 14:52:38,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:52:38,892][00497] Updated weights for policy 0, policy_version 24820 (0.0021) [2024-03-29 14:52:43,030][00497] Updated weights for policy 0, policy_version 24830 (0.0024) [2024-03-29 14:52:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 406831104. Throughput: 0: 41988.8. Samples: 289075020. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 14:52:43,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:52:46,326][00497] Updated weights for policy 0, policy_version 24840 (0.0024) [2024-03-29 14:52:48,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 407044096. Throughput: 0: 42368.5. Samples: 289206120. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 14:52:48,840][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 14:52:50,788][00497] Updated weights for policy 0, policy_version 24850 (0.0026) [2024-03-29 14:52:53,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 407273472. Throughput: 0: 42379.5. Samples: 289467680. 
Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:52:53,842][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 14:52:54,272][00497] Updated weights for policy 0, policy_version 24860 (0.0022) [2024-03-29 14:52:58,398][00497] Updated weights for policy 0, policy_version 24870 (0.0023) [2024-03-29 14:52:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 407486464. Throughput: 0: 42484.1. Samples: 289723680. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:52:58,840][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 14:53:00,369][00476] Signal inference workers to stop experience collection... (10350 times) [2024-03-29 14:53:00,394][00497] InferenceWorker_p0-w0: stopping experience collection (10350 times) [2024-03-29 14:53:00,550][00476] Signal inference workers to resume experience collection... (10350 times) [2024-03-29 14:53:00,550][00497] InferenceWorker_p0-w0: resuming experience collection (10350 times) [2024-03-29 14:53:01,869][00497] Updated weights for policy 0, policy_version 24880 (0.0018) [2024-03-29 14:53:03,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 407699456. Throughput: 0: 42566.6. Samples: 289842680. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:53:03,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 14:53:04,045][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000024885_407715840.pth... [2024-03-29 14:53:04,388][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000024269_397623296.pth [2024-03-29 14:53:06,133][00497] Updated weights for policy 0, policy_version 24890 (0.0023) [2024-03-29 14:53:08,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 407912448. Throughput: 0: 42859.9. Samples: 290115380. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:53:08,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 14:53:09,694][00497] Updated weights for policy 0, policy_version 24900 (0.0025) [2024-03-29 14:53:13,712][00497] Updated weights for policy 0, policy_version 24910 (0.0030) [2024-03-29 14:53:13,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 408125440. Throughput: 0: 42746.2. Samples: 290363020. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:53:13,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 14:53:17,356][00497] Updated weights for policy 0, policy_version 24920 (0.0019) [2024-03-29 14:53:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 408338432. Throughput: 0: 42770.8. Samples: 290483980. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:53:18,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 14:53:21,699][00497] Updated weights for policy 0, policy_version 24930 (0.0028) [2024-03-29 14:53:23,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 408551424. Throughput: 0: 42815.4. Samples: 290750700. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:53:23,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 14:53:25,181][00497] Updated weights for policy 0, policy_version 24940 (0.0022) [2024-03-29 14:53:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 408748032. Throughput: 0: 42532.2. Samples: 290988960. 
Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:53:28,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:53:29,200][00497] Updated weights for policy 0, policy_version 24950 (0.0026) [2024-03-29 14:53:32,931][00497] Updated weights for policy 0, policy_version 24960 (0.0028) [2024-03-29 14:53:33,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42871.5, 300 sec: 42098.6). Total num frames: 408977408. Throughput: 0: 42680.0. Samples: 291126720. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:53:33,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 14:53:35,174][00476] Signal inference workers to stop experience collection... (10400 times) [2024-03-29 14:53:35,175][00476] Signal inference workers to resume experience collection... (10400 times) [2024-03-29 14:53:35,215][00497] InferenceWorker_p0-w0: stopping experience collection (10400 times) [2024-03-29 14:53:35,216][00497] InferenceWorker_p0-w0: resuming experience collection (10400 times) [2024-03-29 14:53:37,372][00497] Updated weights for policy 0, policy_version 24970 (0.0019) [2024-03-29 14:53:38,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 409190400. Throughput: 0: 42496.9. Samples: 291380040. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:53:38,840][00126] Avg episode reward: [(0, '0.448')] [2024-03-29 14:53:40,699][00497] Updated weights for policy 0, policy_version 24980 (0.0030) [2024-03-29 14:53:43,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 409387008. Throughput: 0: 42132.4. Samples: 291619640. Policy #0 lag: (min: 0.0, avg: 20.9, max: 40.0) [2024-03-29 14:53:43,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 14:53:44,819][00497] Updated weights for policy 0, policy_version 24990 (0.0029) [2024-03-29 14:53:48,306][00497] Updated weights for policy 0, policy_version 25000 (0.0023) [2024-03-29 14:53:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42871.5, 300 sec: 42154.1). Total num frames: 409616384. Throughput: 0: 42570.3. Samples: 291758340. Policy #0 lag: (min: 0.0, avg: 20.9, max: 40.0) [2024-03-29 14:53:48,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 14:53:52,634][00497] Updated weights for policy 0, policy_version 25010 (0.0024) [2024-03-29 14:53:53,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 409812992. Throughput: 0: 42153.7. Samples: 292012300. Policy #0 lag: (min: 0.0, avg: 20.9, max: 40.0) [2024-03-29 14:53:53,840][00126] Avg episode reward: [(0, '0.327')] [2024-03-29 14:53:56,083][00497] Updated weights for policy 0, policy_version 25020 (0.0024) [2024-03-29 14:53:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 410025984. Throughput: 0: 42140.1. Samples: 292259320. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:53:58,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 14:54:00,170][00497] Updated weights for policy 0, policy_version 25030 (0.0020) [2024-03-29 14:54:03,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 410238976. Throughput: 0: 42384.9. Samples: 292391300. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:54:03,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 14:54:03,914][00497] Updated weights for policy 0, policy_version 25040 (0.0027) [2024-03-29 14:54:06,492][00476] Signal inference workers to stop experience collection... 
(10450 times) [2024-03-29 14:54:06,566][00476] Signal inference workers to resume experience collection... (10450 times) [2024-03-29 14:54:06,569][00497] InferenceWorker_p0-w0: stopping experience collection (10450 times) [2024-03-29 14:54:06,593][00497] InferenceWorker_p0-w0: resuming experience collection (10450 times) [2024-03-29 14:54:08,181][00497] Updated weights for policy 0, policy_version 25050 (0.0018) [2024-03-29 14:54:08,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 410435584. Throughput: 0: 42157.3. Samples: 292647780. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:54:08,840][00126] Avg episode reward: [(0, '0.461')] [2024-03-29 14:54:11,631][00497] Updated weights for policy 0, policy_version 25060 (0.0027) [2024-03-29 14:54:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 410664960. Throughput: 0: 42179.1. Samples: 292887020. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:54:13,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:54:15,676][00497] Updated weights for policy 0, policy_version 25070 (0.0022) [2024-03-29 14:54:18,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 410861568. Throughput: 0: 42083.6. Samples: 293020480. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:54:18,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 14:54:19,467][00497] Updated weights for policy 0, policy_version 25080 (0.0025) [2024-03-29 14:54:23,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 411058176. Throughput: 0: 42239.1. Samples: 293280800. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:54:23,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 14:54:23,891][00497] Updated weights for policy 0, policy_version 25090 (0.0022) [2024-03-29 14:54:26,862][00497] Updated weights for policy 0, policy_version 25100 (0.0024) [2024-03-29 14:54:28,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 411303936. Throughput: 0: 42226.1. Samples: 293519820. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:54:28,840][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 14:54:31,069][00497] Updated weights for policy 0, policy_version 25110 (0.0019) [2024-03-29 14:54:33,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 411516928. Throughput: 0: 42312.8. Samples: 293662420. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:54:33,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:54:34,859][00497] Updated weights for policy 0, policy_version 25120 (0.0019) [2024-03-29 14:54:38,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 411697152. Throughput: 0: 42437.4. Samples: 293921980. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:54:38,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 14:54:39,188][00497] Updated weights for policy 0, policy_version 25130 (0.0018) [2024-03-29 14:54:39,818][00476] Signal inference workers to stop experience collection... (10500 times) [2024-03-29 14:54:39,855][00497] InferenceWorker_p0-w0: stopping experience collection (10500 times) [2024-03-29 14:54:39,983][00476] Signal inference workers to resume experience collection... 
(10500 times) [2024-03-29 14:54:39,984][00497] InferenceWorker_p0-w0: resuming experience collection (10500 times) [2024-03-29 14:54:42,421][00497] Updated weights for policy 0, policy_version 25140 (0.0025) [2024-03-29 14:54:43,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42871.4, 300 sec: 42265.2). Total num frames: 411959296. Throughput: 0: 42228.9. Samples: 294159620. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:54:43,840][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 14:54:46,451][00497] Updated weights for policy 0, policy_version 25150 (0.0020) [2024-03-29 14:54:48,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 412139520. Throughput: 0: 42439.1. Samples: 294301060. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 14:54:48,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 14:54:50,147][00497] Updated weights for policy 0, policy_version 25160 (0.0019) [2024-03-29 14:54:53,839][00126] Fps is (10 sec: 39321.0, 60 sec: 42325.3, 300 sec: 42265.1). Total num frames: 412352512. Throughput: 0: 42441.8. Samples: 294557660. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 14:54:53,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:54:54,448][00497] Updated weights for policy 0, policy_version 25170 (0.0023) [2024-03-29 14:54:58,086][00497] Updated weights for policy 0, policy_version 25181 (0.0024) [2024-03-29 14:54:58,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42871.5, 300 sec: 42265.2). Total num frames: 412598272. Throughput: 0: 42584.9. Samples: 294803340. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 14:54:58,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:55:02,138][00497] Updated weights for policy 0, policy_version 25191 (0.0019) [2024-03-29 14:55:03,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 412778496. Throughput: 0: 42636.8. Samples: 294939140. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 14:55:03,840][00126] Avg episode reward: [(0, '0.370')] [2024-03-29 14:55:04,266][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000025196_412811264.pth... [2024-03-29 14:55:04,604][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000024576_402653184.pth [2024-03-29 14:55:06,074][00497] Updated weights for policy 0, policy_version 25201 (0.0024) [2024-03-29 14:55:08,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 412991488. Throughput: 0: 42393.9. Samples: 295188520. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 14:55:08,840][00126] Avg episode reward: [(0, '0.472')] [2024-03-29 14:55:10,304][00497] Updated weights for policy 0, policy_version 25211 (0.0020) [2024-03-29 14:55:13,004][00476] Signal inference workers to stop experience collection... (10550 times) [2024-03-29 14:55:13,079][00476] Signal inference workers to resume experience collection... (10550 times) [2024-03-29 14:55:13,081][00497] InferenceWorker_p0-w0: stopping experience collection (10550 times) [2024-03-29 14:55:13,104][00497] InferenceWorker_p0-w0: resuming experience collection (10550 times) [2024-03-29 14:55:13,640][00497] Updated weights for policy 0, policy_version 25221 (0.0024) [2024-03-29 14:55:13,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 413220864. Throughput: 0: 42704.5. Samples: 295441520. 
Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 14:55:13,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 14:55:17,666][00497] Updated weights for policy 0, policy_version 25231 (0.0017) [2024-03-29 14:55:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 413417472. Throughput: 0: 42426.7. Samples: 295571620. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 14:55:18,840][00126] Avg episode reward: [(0, '0.307')] [2024-03-29 14:55:21,536][00497] Updated weights for policy 0, policy_version 25241 (0.0018) [2024-03-29 14:55:23,839][00126] Fps is (10 sec: 39321.1, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 413614080. Throughput: 0: 42091.5. Samples: 295816100. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 14:55:23,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:55:25,789][00497] Updated weights for policy 0, policy_version 25251 (0.0018) [2024-03-29 14:55:28,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 413859840. Throughput: 0: 42636.8. Samples: 296078280. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 14:55:28,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 14:55:29,018][00497] Updated weights for policy 0, policy_version 25261 (0.0017) [2024-03-29 14:55:33,205][00497] Updated weights for policy 0, policy_version 25271 (0.0026) [2024-03-29 14:55:33,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 414056448. Throughput: 0: 42324.0. Samples: 296205640. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 14:55:33,840][00126] Avg episode reward: [(0, '0.471')] [2024-03-29 14:55:36,762][00497] Updated weights for policy 0, policy_version 25281 (0.0024) [2024-03-29 14:55:38,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42871.5, 300 sec: 42265.2). Total num frames: 414269440. Throughput: 0: 42238.4. Samples: 296458380. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 14:55:38,841][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:55:41,201][00497] Updated weights for policy 0, policy_version 25291 (0.0026) [2024-03-29 14:55:43,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 414482432. Throughput: 0: 42401.6. Samples: 296711420. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 14:55:43,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 14:55:44,544][00476] Signal inference workers to stop experience collection... (10600 times) [2024-03-29 14:55:44,584][00497] InferenceWorker_p0-w0: stopping experience collection (10600 times) [2024-03-29 14:55:44,624][00476] Signal inference workers to resume experience collection... (10600 times) [2024-03-29 14:55:44,626][00497] InferenceWorker_p0-w0: resuming experience collection (10600 times) [2024-03-29 14:55:44,630][00497] Updated weights for policy 0, policy_version 25301 (0.0026) [2024-03-29 14:55:48,716][00497] Updated weights for policy 0, policy_version 25311 (0.0029) [2024-03-29 14:55:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42265.6). Total num frames: 414695424. Throughput: 0: 42049.4. Samples: 296831360. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 14:55:48,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:55:52,338][00497] Updated weights for policy 0, policy_version 25321 (0.0032) [2024-03-29 14:55:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42265.2). 
Total num frames: 414908416. Throughput: 0: 42385.6. Samples: 297095880. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 14:55:53,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 14:55:56,526][00497] Updated weights for policy 0, policy_version 25331 (0.0023) [2024-03-29 14:55:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 415121408. Throughput: 0: 42459.1. Samples: 297352180. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 14:55:58,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 14:55:59,880][00497] Updated weights for policy 0, policy_version 25341 (0.0019) [2024-03-29 14:56:03,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 415334400. Throughput: 0: 42328.9. Samples: 297476420. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 14:56:03,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 14:56:03,942][00497] Updated weights for policy 0, policy_version 25351 (0.0022) [2024-03-29 14:56:07,778][00497] Updated weights for policy 0, policy_version 25361 (0.0021) [2024-03-29 14:56:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42598.3, 300 sec: 42265.2). Total num frames: 415547392. Throughput: 0: 42753.4. Samples: 297740000. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 14:56:08,840][00126] Avg episode reward: [(0, '0.370')] [2024-03-29 14:56:11,933][00497] Updated weights for policy 0, policy_version 25371 (0.0023) [2024-03-29 14:56:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 415760384. Throughput: 0: 42386.8. Samples: 297985680. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:56:13,841][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 14:56:15,321][00497] Updated weights for policy 0, policy_version 25381 (0.0027) [2024-03-29 14:56:18,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 415956992. Throughput: 0: 42332.5. Samples: 298110600. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:56:18,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 14:56:19,642][00497] Updated weights for policy 0, policy_version 25391 (0.0026) [2024-03-29 14:56:19,926][00476] Signal inference workers to stop experience collection... (10650 times) [2024-03-29 14:56:19,967][00497] InferenceWorker_p0-w0: stopping experience collection (10650 times) [2024-03-29 14:56:20,004][00476] Signal inference workers to resume experience collection... (10650 times) [2024-03-29 14:56:20,007][00497] InferenceWorker_p0-w0: resuming experience collection (10650 times) [2024-03-29 14:56:23,249][00497] Updated weights for policy 0, policy_version 25401 (0.0024) [2024-03-29 14:56:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42871.6, 300 sec: 42320.7). Total num frames: 416186368. Throughput: 0: 42612.0. Samples: 298375920. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:56:23,840][00126] Avg episode reward: [(0, '0.478')] [2024-03-29 14:56:27,353][00497] Updated weights for policy 0, policy_version 25411 (0.0023) [2024-03-29 14:56:28,839][00126] Fps is (10 sec: 44236.0, 60 sec: 42325.3, 300 sec: 42431.8). Total num frames: 416399360. Throughput: 0: 42382.2. Samples: 298618620. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:56:28,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 14:56:30,752][00497] Updated weights for policy 0, policy_version 25421 (0.0026) [2024-03-29 14:56:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.3, 300 sec: 42376.3). Total num frames: 416595968. Throughput: 0: 42428.5. Samples: 298740640. Policy #0 lag: (min: 2.0, avg: 21.7, max: 42.0) [2024-03-29 14:56:33,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 14:56:34,934][00497] Updated weights for policy 0, policy_version 25431 (0.0027) [2024-03-29 14:56:38,714][00497] Updated weights for policy 0, policy_version 25441 (0.0025) [2024-03-29 14:56:38,839][00126] Fps is (10 sec: 42599.3, 60 sec: 42598.5, 300 sec: 42376.3). Total num frames: 416825344. Throughput: 0: 42502.9. Samples: 299008500. Policy #0 lag: (min: 2.0, avg: 21.7, max: 42.0) [2024-03-29 14:56:38,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 14:56:42,955][00497] Updated weights for policy 0, policy_version 25451 (0.0024) [2024-03-29 14:56:43,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.4, 300 sec: 42376.2). Total num frames: 417021952. Throughput: 0: 42152.0. Samples: 299249020. Policy #0 lag: (min: 2.0, avg: 21.7, max: 42.0) [2024-03-29 14:56:43,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 14:56:46,571][00497] Updated weights for policy 0, policy_version 25461 (0.0024) [2024-03-29 14:56:48,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 417218560. Throughput: 0: 42197.8. Samples: 299375320. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:56:48,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:56:50,529][00497] Updated weights for policy 0, policy_version 25471 (0.0018) [2024-03-29 14:56:53,412][00476] Signal inference workers to stop experience collection... (10700 times) [2024-03-29 14:56:53,414][00476] Signal inference workers to resume experience collection... (10700 times) [2024-03-29 14:56:53,457][00497] InferenceWorker_p0-w0: stopping experience collection (10700 times) [2024-03-29 14:56:53,457][00497] InferenceWorker_p0-w0: resuming experience collection (10700 times) [2024-03-29 14:56:53,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 42376.2). Total num frames: 417447936. Throughput: 0: 42176.5. Samples: 299637940. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:56:53,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:56:54,302][00497] Updated weights for policy 0, policy_version 25481 (0.0027) [2024-03-29 14:56:58,347][00497] Updated weights for policy 0, policy_version 25491 (0.0022) [2024-03-29 14:56:58,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 417660928. Throughput: 0: 42381.7. Samples: 299892860. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:56:58,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:57:01,847][00497] Updated weights for policy 0, policy_version 25501 (0.0020) [2024-03-29 14:57:03,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 417873920. Throughput: 0: 42405.3. Samples: 300018840. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:57:03,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:57:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000025506_417890304.pth... 
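The Saving entry above and the Removing entry that follows it illustrate the checkpoint rotation used throughout this run: each checkpoint is written as checkpoint_<policy_version>_<total_frames>.pth, and once the new file is on disk the oldest one is deleted, so only the most recent few remain. A minimal sketch of that keep-last-N pattern, assuming a hypothetical save_and_rotate helper and plain bytes in place of the real torch state dict:

from pathlib import Path

def save_and_rotate(ckpt_dir: Path, policy_version: int, total_frames: int,
                    state_bytes: bytes, keep_last: int = 2) -> Path:
    # Write the new checkpoint; zero-padding the version keeps lexicographic
    # order equal to chronological order, matching the file names in this log.
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{policy_version:09d}_{total_frames}.pth"
    path.write_bytes(state_bytes)
    # Delete everything except the newest `keep_last` checkpoints, which is
    # what the paired "Saving ..." / "Removing ..." entries correspond to.
    for old in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep_last]:
        old.unlink()
    return path

With keep_last=2 this reproduces the cadence seen here (saving checkpoint 25506 is followed by removal of 24885, the file written two saves earlier), though the actual retention count is a trainer configuration value that is not visible in the log.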
[2024-03-29 14:57:04,262][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000024885_407715840.pth [2024-03-29 14:57:05,824][00497] Updated weights for policy 0, policy_version 25511 (0.0023) [2024-03-29 14:57:08,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 42431.8). Total num frames: 418086912. Throughput: 0: 42327.1. Samples: 300280640. Policy #0 lag: (min: 2.0, avg: 21.0, max: 43.0) [2024-03-29 14:57:08,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 14:57:09,665][00497] Updated weights for policy 0, policy_version 25521 (0.0020) [2024-03-29 14:57:13,756][00497] Updated weights for policy 0, policy_version 25531 (0.0026) [2024-03-29 14:57:13,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 42431.8). Total num frames: 418299904. Throughput: 0: 42700.0. Samples: 300540120. Policy #0 lag: (min: 2.0, avg: 21.0, max: 43.0) [2024-03-29 14:57:13,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:57:17,402][00497] Updated weights for policy 0, policy_version 25541 (0.0025) [2024-03-29 14:57:18,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 418512896. Throughput: 0: 42363.9. Samples: 300647020. Policy #0 lag: (min: 2.0, avg: 21.0, max: 43.0) [2024-03-29 14:57:18,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 14:57:21,530][00497] Updated weights for policy 0, policy_version 25551 (0.0019) [2024-03-29 14:57:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.2, 300 sec: 42376.2). Total num frames: 418709504. Throughput: 0: 42054.9. Samples: 300900980. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 14:57:23,841][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 14:57:24,951][00476] Signal inference workers to stop experience collection... (10750 times) [2024-03-29 14:57:24,989][00497] InferenceWorker_p0-w0: stopping experience collection (10750 times) [2024-03-29 14:57:25,178][00476] Signal inference workers to resume experience collection... (10750 times) [2024-03-29 14:57:25,178][00497] InferenceWorker_p0-w0: resuming experience collection (10750 times) [2024-03-29 14:57:25,454][00497] Updated weights for policy 0, policy_version 25561 (0.0021) [2024-03-29 14:57:28,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41506.2, 300 sec: 42320.7). Total num frames: 418889728. Throughput: 0: 42471.1. Samples: 301160220. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 14:57:28,840][00126] Avg episode reward: [(0, '0.446')] [2024-03-29 14:57:29,628][00497] Updated weights for policy 0, policy_version 25571 (0.0029) [2024-03-29 14:57:33,110][00497] Updated weights for policy 0, policy_version 25581 (0.0020) [2024-03-29 14:57:33,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 419135488. Throughput: 0: 42157.8. Samples: 301272420. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 14:57:33,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 14:57:37,214][00497] Updated weights for policy 0, policy_version 25591 (0.0022) [2024-03-29 14:57:38,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.1, 300 sec: 42376.2). Total num frames: 419332096. Throughput: 0: 42007.5. Samples: 301528280. 
Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 14:57:38,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:57:41,156][00497] Updated weights for policy 0, policy_version 25601 (0.0025) [2024-03-29 14:57:43,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 42320.7). Total num frames: 419528704. Throughput: 0: 41946.6. Samples: 301780460. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 14:57:43,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:57:45,168][00497] Updated weights for policy 0, policy_version 25611 (0.0032) [2024-03-29 14:57:48,795][00497] Updated weights for policy 0, policy_version 25621 (0.0024) [2024-03-29 14:57:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 419774464. Throughput: 0: 41828.0. Samples: 301901100. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 14:57:48,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:57:52,850][00497] Updated weights for policy 0, policy_version 25631 (0.0022) [2024-03-29 14:57:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 419971072. Throughput: 0: 41741.2. Samples: 302159000. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 14:57:53,840][00126] Avg episode reward: [(0, '0.475')] [2024-03-29 14:57:56,912][00497] Updated weights for policy 0, policy_version 25641 (0.0020) [2024-03-29 14:57:58,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.2, 300 sec: 42265.2). Total num frames: 420167680. Throughput: 0: 41346.3. Samples: 302400700. Policy #0 lag: (min: 2.0, avg: 21.5, max: 42.0) [2024-03-29 14:57:58,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 14:58:00,463][00476] Signal inference workers to stop experience collection... (10800 times) [2024-03-29 14:58:00,505][00497] InferenceWorker_p0-w0: stopping experience collection (10800 times) [2024-03-29 14:58:00,628][00476] Signal inference workers to resume experience collection... (10800 times) [2024-03-29 14:58:00,629][00497] InferenceWorker_p0-w0: resuming experience collection (10800 times) [2024-03-29 14:58:00,885][00497] Updated weights for policy 0, policy_version 25651 (0.0025) [2024-03-29 14:58:03,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 42265.2). Total num frames: 420380672. Throughput: 0: 41566.3. Samples: 302517500. Policy #0 lag: (min: 2.0, avg: 21.5, max: 42.0) [2024-03-29 14:58:03,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 14:58:04,640][00497] Updated weights for policy 0, policy_version 25661 (0.0023) [2024-03-29 14:58:08,599][00497] Updated weights for policy 0, policy_version 25671 (0.0020) [2024-03-29 14:58:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 42265.2). Total num frames: 420593664. Throughput: 0: 41840.1. Samples: 302783780. Policy #0 lag: (min: 2.0, avg: 21.5, max: 42.0) [2024-03-29 14:58:08,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 14:58:12,650][00497] Updated weights for policy 0, policy_version 25681 (0.0024) [2024-03-29 14:58:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 42265.2). Total num frames: 420806656. Throughput: 0: 41504.5. Samples: 303027920. Policy #0 lag: (min: 2.0, avg: 21.5, max: 42.0) [2024-03-29 14:58:13,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 14:58:16,563][00497] Updated weights for policy 0, policy_version 25691 (0.0032) [2024-03-29 14:58:18,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41233.1, 300 sec: 42154.1). 
Total num frames: 420986880. Throughput: 0: 41845.7. Samples: 303155480. Policy #0 lag: (min: 0.0, avg: 18.3, max: 42.0) [2024-03-29 14:58:18,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 14:58:20,524][00497] Updated weights for policy 0, policy_version 25701 (0.0029) [2024-03-29 14:58:23,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 421199872. Throughput: 0: 41296.5. Samples: 303386620. Policy #0 lag: (min: 0.0, avg: 18.3, max: 42.0) [2024-03-29 14:58:23,841][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 14:58:24,610][00497] Updated weights for policy 0, policy_version 25711 (0.0017) [2024-03-29 14:58:28,656][00497] Updated weights for policy 0, policy_version 25721 (0.0028) [2024-03-29 14:58:28,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.4, 300 sec: 42154.1). Total num frames: 421412864. Throughput: 0: 41716.2. Samples: 303657680. Policy #0 lag: (min: 0.0, avg: 18.3, max: 42.0) [2024-03-29 14:58:28,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 14:58:32,160][00476] Signal inference workers to stop experience collection... (10850 times) [2024-03-29 14:58:32,203][00497] InferenceWorker_p0-w0: stopping experience collection (10850 times) [2024-03-29 14:58:32,368][00476] Signal inference workers to resume experience collection... (10850 times) [2024-03-29 14:58:32,368][00497] InferenceWorker_p0-w0: resuming experience collection (10850 times) [2024-03-29 14:58:32,641][00497] Updated weights for policy 0, policy_version 25731 (0.0024) [2024-03-29 14:58:33,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 421625856. Throughput: 0: 41875.2. Samples: 303785480. Policy #0 lag: (min: 2.0, avg: 21.5, max: 45.0) [2024-03-29 14:58:33,840][00126] Avg episode reward: [(0, '0.333')] [2024-03-29 14:58:36,295][00497] Updated weights for policy 0, policy_version 25741 (0.0021) [2024-03-29 14:58:38,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 421838848. Throughput: 0: 41117.4. Samples: 304009280. Policy #0 lag: (min: 2.0, avg: 21.5, max: 45.0) [2024-03-29 14:58:38,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:58:40,601][00497] Updated weights for policy 0, policy_version 25751 (0.0032) [2024-03-29 14:58:43,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 422019072. Throughput: 0: 41596.0. Samples: 304272520. Policy #0 lag: (min: 2.0, avg: 21.5, max: 45.0) [2024-03-29 14:58:43,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:58:44,399][00497] Updated weights for policy 0, policy_version 25761 (0.0019) [2024-03-29 14:58:48,173][00497] Updated weights for policy 0, policy_version 25771 (0.0032) [2024-03-29 14:58:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 42154.1). Total num frames: 422248448. Throughput: 0: 41859.6. Samples: 304401180. Policy #0 lag: (min: 2.0, avg: 21.5, max: 45.0) [2024-03-29 14:58:48,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:58:51,867][00497] Updated weights for policy 0, policy_version 25781 (0.0020) [2024-03-29 14:58:53,839][00126] Fps is (10 sec: 45874.5, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 422477824. Throughput: 0: 41129.2. Samples: 304634600. 
Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 14:58:53,842][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 14:58:56,235][00497] Updated weights for policy 0, policy_version 25791 (0.0023) [2024-03-29 14:58:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.1, 300 sec: 42098.6). Total num frames: 422658048. Throughput: 0: 41631.6. Samples: 304901340. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 14:58:58,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 14:58:59,940][00497] Updated weights for policy 0, policy_version 25801 (0.0021) [2024-03-29 14:59:03,767][00497] Updated weights for policy 0, policy_version 25811 (0.0017) [2024-03-29 14:59:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 42209.6). Total num frames: 422887424. Throughput: 0: 41611.5. Samples: 305028000. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 14:59:03,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 14:59:04,057][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000025812_422903808.pth... [2024-03-29 14:59:04,376][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000025196_412811264.pth [2024-03-29 14:59:04,852][00476] Signal inference workers to stop experience collection... (10900 times) [2024-03-29 14:59:04,875][00497] InferenceWorker_p0-w0: stopping experience collection (10900 times) [2024-03-29 14:59:05,074][00476] Signal inference workers to resume experience collection... (10900 times) [2024-03-29 14:59:05,074][00497] InferenceWorker_p0-w0: resuming experience collection (10900 times) [2024-03-29 14:59:07,577][00497] Updated weights for policy 0, policy_version 25821 (0.0027) [2024-03-29 14:59:08,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 423116800. Throughput: 0: 41841.8. Samples: 305269500. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 14:59:08,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 14:59:11,804][00497] Updated weights for policy 0, policy_version 25831 (0.0023) [2024-03-29 14:59:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.1, 300 sec: 42209.6). Total num frames: 423313408. Throughput: 0: 41546.5. Samples: 305527280. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 14:59:13,841][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 14:59:15,697][00497] Updated weights for policy 0, policy_version 25841 (0.0022) [2024-03-29 14:59:18,839][00126] Fps is (10 sec: 37683.0, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 423493632. Throughput: 0: 41516.4. Samples: 305653720. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 14:59:18,840][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 14:59:19,426][00497] Updated weights for policy 0, policy_version 25851 (0.0026) [2024-03-29 14:59:23,155][00497] Updated weights for policy 0, policy_version 25861 (0.0032) [2024-03-29 14:59:23,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 423739392. Throughput: 0: 42028.5. Samples: 305900560. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 14:59:23,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 14:59:27,183][00497] Updated weights for policy 0, policy_version 25871 (0.0018) [2024-03-29 14:59:28,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 423936000. Throughput: 0: 41890.6. Samples: 306157600. 
Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 14:59:28,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 14:59:31,300][00497] Updated weights for policy 0, policy_version 25881 (0.0024) [2024-03-29 14:59:33,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41506.2, 300 sec: 42098.6). Total num frames: 424116224. Throughput: 0: 41930.2. Samples: 306288040. Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 14:59:33,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:59:34,990][00497] Updated weights for policy 0, policy_version 25891 (0.0029) [2024-03-29 14:59:38,827][00497] Updated weights for policy 0, policy_version 25901 (0.0026) [2024-03-29 14:59:38,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 424361984. Throughput: 0: 42263.3. Samples: 306536440. Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 14:59:38,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 14:59:42,139][00476] Signal inference workers to stop experience collection... (10950 times) [2024-03-29 14:59:42,179][00497] InferenceWorker_p0-w0: stopping experience collection (10950 times) [2024-03-29 14:59:42,225][00476] Signal inference workers to resume experience collection... (10950 times) [2024-03-29 14:59:42,226][00497] InferenceWorker_p0-w0: resuming experience collection (10950 times) [2024-03-29 14:59:42,876][00497] Updated weights for policy 0, policy_version 25911 (0.0035) [2024-03-29 14:59:43,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 424558592. Throughput: 0: 41845.6. Samples: 306784400. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 14:59:43,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 14:59:46,827][00497] Updated weights for policy 0, policy_version 25921 (0.0028) [2024-03-29 14:59:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 424755200. Throughput: 0: 42111.2. Samples: 306923000. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 14:59:48,840][00126] Avg episode reward: [(0, '0.333')] [2024-03-29 14:59:50,658][00497] Updated weights for policy 0, policy_version 25931 (0.0027) [2024-03-29 14:59:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 424984576. Throughput: 0: 42152.4. Samples: 307166360. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 14:59:53,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:59:54,581][00497] Updated weights for policy 0, policy_version 25941 (0.0025) [2024-03-29 14:59:58,620][00497] Updated weights for policy 0, policy_version 25951 (0.0020) [2024-03-29 14:59:58,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42052.1, 300 sec: 42043.0). Total num frames: 425181184. Throughput: 0: 41863.5. Samples: 307411140. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 14:59:58,840][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 15:00:02,390][00497] Updated weights for policy 0, policy_version 25961 (0.0032) [2024-03-29 15:00:03,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 425410560. Throughput: 0: 42112.5. Samples: 307548780. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 15:00:03,840][00126] Avg episode reward: [(0, '0.456')] [2024-03-29 15:00:06,068][00497] Updated weights for policy 0, policy_version 25971 (0.0020) [2024-03-29 15:00:08,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.0, 300 sec: 41987.4). 
Total num frames: 425607168. Throughput: 0: 42162.9. Samples: 307797900. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 15:00:08,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 15:00:10,068][00497] Updated weights for policy 0, policy_version 25981 (0.0020) [2024-03-29 15:00:11,079][00476] Signal inference workers to stop experience collection... (11000 times) [2024-03-29 15:00:11,152][00497] InferenceWorker_p0-w0: stopping experience collection (11000 times) [2024-03-29 15:00:11,157][00476] Signal inference workers to resume experience collection... (11000 times) [2024-03-29 15:00:11,180][00497] InferenceWorker_p0-w0: resuming experience collection (11000 times) [2024-03-29 15:00:13,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 425803776. Throughput: 0: 41916.4. Samples: 308043840. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 15:00:13,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 15:00:14,181][00497] Updated weights for policy 0, policy_version 25991 (0.0024) [2024-03-29 15:00:18,390][00497] Updated weights for policy 0, policy_version 26001 (0.0019) [2024-03-29 15:00:18,839][00126] Fps is (10 sec: 40960.7, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 426016768. Throughput: 0: 41746.6. Samples: 308166640. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 15:00:18,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 15:00:22,041][00497] Updated weights for policy 0, policy_version 26011 (0.0029) [2024-03-29 15:00:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 426213376. Throughput: 0: 41559.9. Samples: 308406640. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 15:00:23,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 15:00:26,125][00497] Updated weights for policy 0, policy_version 26021 (0.0027) [2024-03-29 15:00:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 426426368. Throughput: 0: 41581.4. Samples: 308655560. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 15:00:28,841][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 15:00:29,950][00497] Updated weights for policy 0, policy_version 26031 (0.0028) [2024-03-29 15:00:33,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 426639360. Throughput: 0: 41475.0. Samples: 308789380. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 15:00:33,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 15:00:34,060][00497] Updated weights for policy 0, policy_version 26041 (0.0019) [2024-03-29 15:00:37,702][00497] Updated weights for policy 0, policy_version 26051 (0.0026) [2024-03-29 15:00:38,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 426835968. Throughput: 0: 41842.2. Samples: 309049260. Policy #0 lag: (min: 0.0, avg: 21.1, max: 43.0) [2024-03-29 15:00:38,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 15:00:41,511][00497] Updated weights for policy 0, policy_version 26061 (0.0023) [2024-03-29 15:00:41,512][00476] Signal inference workers to stop experience collection... (11050 times) [2024-03-29 15:00:41,513][00476] Signal inference workers to resume experience collection... 
(11050 times) [2024-03-29 15:00:41,558][00497] InferenceWorker_p0-w0: stopping experience collection (11050 times) [2024-03-29 15:00:41,558][00497] InferenceWorker_p0-w0: resuming experience collection (11050 times) [2024-03-29 15:00:43,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 427081728. Throughput: 0: 41786.4. Samples: 309291520. Policy #0 lag: (min: 0.0, avg: 21.1, max: 43.0) [2024-03-29 15:00:43,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 15:00:45,432][00497] Updated weights for policy 0, policy_version 26071 (0.0022) [2024-03-29 15:00:48,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 427278336. Throughput: 0: 41671.6. Samples: 309424000. Policy #0 lag: (min: 0.0, avg: 21.1, max: 43.0) [2024-03-29 15:00:48,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 15:00:49,338][00497] Updated weights for policy 0, policy_version 26081 (0.0019) [2024-03-29 15:00:53,018][00497] Updated weights for policy 0, policy_version 26091 (0.0025) [2024-03-29 15:00:53,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 427491328. Throughput: 0: 42119.2. Samples: 309693260. Policy #0 lag: (min: 2.0, avg: 19.8, max: 42.0) [2024-03-29 15:00:53,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 15:00:57,166][00497] Updated weights for policy 0, policy_version 26101 (0.0022) [2024-03-29 15:00:58,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 427704320. Throughput: 0: 41663.6. Samples: 309918700. Policy #0 lag: (min: 2.0, avg: 19.8, max: 42.0) [2024-03-29 15:00:58,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 15:01:01,279][00497] Updated weights for policy 0, policy_version 26111 (0.0020) [2024-03-29 15:01:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.0, 300 sec: 41876.4). Total num frames: 427900928. Throughput: 0: 41928.8. Samples: 310053440. Policy #0 lag: (min: 2.0, avg: 19.8, max: 42.0) [2024-03-29 15:01:03,842][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 15:01:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000026117_427900928.pth... [2024-03-29 15:01:04,170][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000025506_417890304.pth [2024-03-29 15:01:05,499][00497] Updated weights for policy 0, policy_version 26121 (0.0023) [2024-03-29 15:01:08,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 428113920. Throughput: 0: 42181.9. Samples: 310304820. Policy #0 lag: (min: 2.0, avg: 19.8, max: 42.0) [2024-03-29 15:01:08,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 15:01:09,024][00497] Updated weights for policy 0, policy_version 26131 (0.0026) [2024-03-29 15:01:12,994][00497] Updated weights for policy 0, policy_version 26141 (0.0018) [2024-03-29 15:01:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 428326912. Throughput: 0: 41847.4. Samples: 310538700. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 15:01:13,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 15:01:16,683][00476] Signal inference workers to stop experience collection... (11100 times) [2024-03-29 15:01:16,684][00476] Signal inference workers to resume experience collection... 
(11100 times) [2024-03-29 15:01:16,721][00497] InferenceWorker_p0-w0: stopping experience collection (11100 times) [2024-03-29 15:01:16,721][00497] InferenceWorker_p0-w0: resuming experience collection (11100 times) [2024-03-29 15:01:16,995][00497] Updated weights for policy 0, policy_version 26151 (0.0023) [2024-03-29 15:01:18,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 428539904. Throughput: 0: 41879.1. Samples: 310673940. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 15:01:18,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 15:01:21,166][00497] Updated weights for policy 0, policy_version 26161 (0.0020) [2024-03-29 15:01:23,839][00126] Fps is (10 sec: 40961.1, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 428736512. Throughput: 0: 41730.0. Samples: 310927100. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 15:01:23,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 15:01:24,632][00497] Updated weights for policy 0, policy_version 26171 (0.0024) [2024-03-29 15:01:28,563][00497] Updated weights for policy 0, policy_version 26181 (0.0024) [2024-03-29 15:01:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 428949504. Throughput: 0: 41711.0. Samples: 311168520. Policy #0 lag: (min: 0.0, avg: 20.9, max: 43.0) [2024-03-29 15:01:28,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 15:01:32,742][00497] Updated weights for policy 0, policy_version 26191 (0.0023) [2024-03-29 15:01:33,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 429146112. Throughput: 0: 41429.3. Samples: 311288320. Policy #0 lag: (min: 0.0, avg: 20.9, max: 43.0) [2024-03-29 15:01:33,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 15:01:36,841][00497] Updated weights for policy 0, policy_version 26201 (0.0026) [2024-03-29 15:01:38,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 429359104. Throughput: 0: 41458.3. Samples: 311558880. Policy #0 lag: (min: 0.0, avg: 20.9, max: 43.0) [2024-03-29 15:01:38,841][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 15:01:40,232][00497] Updated weights for policy 0, policy_version 26211 (0.0029) [2024-03-29 15:01:43,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 429588480. Throughput: 0: 41902.6. Samples: 311804320. Policy #0 lag: (min: 0.0, avg: 20.9, max: 43.0) [2024-03-29 15:01:43,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 15:01:44,161][00497] Updated weights for policy 0, policy_version 26221 (0.0030) [2024-03-29 15:01:48,240][00497] Updated weights for policy 0, policy_version 26231 (0.0021) [2024-03-29 15:01:48,257][00476] Signal inference workers to stop experience collection... (11150 times) [2024-03-29 15:01:48,288][00497] InferenceWorker_p0-w0: stopping experience collection (11150 times) [2024-03-29 15:01:48,480][00476] Signal inference workers to resume experience collection... (11150 times) [2024-03-29 15:01:48,481][00497] InferenceWorker_p0-w0: resuming experience collection (11150 times) [2024-03-29 15:01:48,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 429801472. Throughput: 0: 41622.4. Samples: 311926440. 
Policy #0 lag: (min: 1.0, avg: 22.4, max: 43.0) [2024-03-29 15:01:48,840][00126] Avg episode reward: [(0, '0.457')] [2024-03-29 15:01:52,281][00497] Updated weights for policy 0, policy_version 26241 (0.0027) [2024-03-29 15:01:53,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 429981696. Throughput: 0: 41818.3. Samples: 312186640. Policy #0 lag: (min: 1.0, avg: 22.4, max: 43.0) [2024-03-29 15:01:53,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 15:01:55,934][00497] Updated weights for policy 0, policy_version 26251 (0.0020) [2024-03-29 15:01:58,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 430194688. Throughput: 0: 42141.0. Samples: 312435040. Policy #0 lag: (min: 1.0, avg: 22.4, max: 43.0) [2024-03-29 15:01:58,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 15:01:59,916][00497] Updated weights for policy 0, policy_version 26261 (0.0017) [2024-03-29 15:02:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 430391296. Throughput: 0: 41643.2. Samples: 312547880. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:02:03,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 15:02:04,161][00497] Updated weights for policy 0, policy_version 26271 (0.0019) [2024-03-29 15:02:08,297][00497] Updated weights for policy 0, policy_version 26281 (0.0027) [2024-03-29 15:02:08,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 430604288. Throughput: 0: 41672.7. Samples: 312802380. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:02:08,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 15:02:11,903][00497] Updated weights for policy 0, policy_version 26291 (0.0021) [2024-03-29 15:02:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 430817280. Throughput: 0: 41857.4. Samples: 313052100. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:02:13,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 15:02:15,534][00497] Updated weights for policy 0, policy_version 26301 (0.0031) [2024-03-29 15:02:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 431030272. Throughput: 0: 41964.5. Samples: 313176720. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:02:18,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 15:02:19,641][00497] Updated weights for policy 0, policy_version 26311 (0.0018) [2024-03-29 15:02:23,819][00497] Updated weights for policy 0, policy_version 26321 (0.0026) [2024-03-29 15:02:23,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 431243264. Throughput: 0: 41773.4. Samples: 313438680. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:02:23,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 15:02:27,316][00476] Signal inference workers to stop experience collection... (11200 times) [2024-03-29 15:02:27,397][00476] Signal inference workers to resume experience collection... (11200 times) [2024-03-29 15:02:27,400][00497] InferenceWorker_p0-w0: stopping experience collection (11200 times) [2024-03-29 15:02:27,403][00497] Updated weights for policy 0, policy_version 26331 (0.0021) [2024-03-29 15:02:27,423][00497] InferenceWorker_p0-w0: resuming experience collection (11200 times) [2024-03-29 15:02:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 41765.3). 
Total num frames: 431456256. Throughput: 0: 41905.9. Samples: 313690080. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:02:28,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 15:02:31,041][00497] Updated weights for policy 0, policy_version 26341 (0.0027) [2024-03-29 15:02:33,839][00126] Fps is (10 sec: 42597.5, 60 sec: 42052.2, 300 sec: 41820.9). Total num frames: 431669248. Throughput: 0: 41860.3. Samples: 313810160. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:02:33,842][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 15:02:35,212][00497] Updated weights for policy 0, policy_version 26351 (0.0023) [2024-03-29 15:02:38,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.1, 300 sec: 41820.9). Total num frames: 431865856. Throughput: 0: 41813.2. Samples: 314068240. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:02:38,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 15:02:39,339][00497] Updated weights for policy 0, policy_version 26361 (0.0025) [2024-03-29 15:02:43,028][00497] Updated weights for policy 0, policy_version 26371 (0.0031) [2024-03-29 15:02:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 432078848. Throughput: 0: 41717.8. Samples: 314312340. Policy #0 lag: (min: 0.0, avg: 19.1, max: 42.0) [2024-03-29 15:02:43,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 15:02:47,088][00497] Updated weights for policy 0, policy_version 26381 (0.0024) [2024-03-29 15:02:48,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 432291840. Throughput: 0: 41929.4. Samples: 314434700. Policy #0 lag: (min: 0.0, avg: 19.1, max: 42.0) [2024-03-29 15:02:48,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 15:02:51,157][00497] Updated weights for policy 0, policy_version 26391 (0.0021) [2024-03-29 15:02:53,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 432488448. Throughput: 0: 41665.4. Samples: 314677320. Policy #0 lag: (min: 0.0, avg: 19.1, max: 42.0) [2024-03-29 15:02:53,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 15:02:55,251][00497] Updated weights for policy 0, policy_version 26401 (0.0027) [2024-03-29 15:02:58,733][00497] Updated weights for policy 0, policy_version 26411 (0.0020) [2024-03-29 15:02:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 432717824. Throughput: 0: 41969.7. Samples: 314940740. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 15:02:58,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 15:03:00,761][00476] Signal inference workers to stop experience collection... (11250 times) [2024-03-29 15:03:00,786][00497] InferenceWorker_p0-w0: stopping experience collection (11250 times) [2024-03-29 15:03:00,987][00476] Signal inference workers to resume experience collection... (11250 times) [2024-03-29 15:03:00,988][00497] InferenceWorker_p0-w0: resuming experience collection (11250 times) [2024-03-29 15:03:02,671][00497] Updated weights for policy 0, policy_version 26421 (0.0034) [2024-03-29 15:03:03,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 41820.8). Total num frames: 432930816. Throughput: 0: 42004.0. Samples: 315066900. 
Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 15:03:03,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 15:03:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000026424_432930816.pth... [2024-03-29 15:03:04,171][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000025812_422903808.pth [2024-03-29 15:03:06,763][00497] Updated weights for policy 0, policy_version 26431 (0.0026) [2024-03-29 15:03:08,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.4, 300 sec: 41765.3). Total num frames: 433127424. Throughput: 0: 41688.0. Samples: 315314640. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 15:03:08,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 15:03:11,036][00497] Updated weights for policy 0, policy_version 26441 (0.0030) [2024-03-29 15:03:13,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 433324032. Throughput: 0: 41764.0. Samples: 315569460. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 15:03:13,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 15:03:14,464][00497] Updated weights for policy 0, policy_version 26451 (0.0030) [2024-03-29 15:03:18,390][00497] Updated weights for policy 0, policy_version 26461 (0.0022) [2024-03-29 15:03:18,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 433537024. Throughput: 0: 41855.7. Samples: 315693660. Policy #0 lag: (min: 0.0, avg: 21.0, max: 40.0) [2024-03-29 15:03:18,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 15:03:22,325][00497] Updated weights for policy 0, policy_version 26471 (0.0019) [2024-03-29 15:03:23,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42052.1, 300 sec: 41876.4). Total num frames: 433766400. Throughput: 0: 41680.0. Samples: 315943840. Policy #0 lag: (min: 0.0, avg: 21.0, max: 40.0) [2024-03-29 15:03:23,840][00126] Avg episode reward: [(0, '0.294')] [2024-03-29 15:03:26,637][00497] Updated weights for policy 0, policy_version 26481 (0.0021) [2024-03-29 15:03:28,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 433963008. Throughput: 0: 41854.7. Samples: 316195800. Policy #0 lag: (min: 0.0, avg: 21.0, max: 40.0) [2024-03-29 15:03:28,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 15:03:30,119][00497] Updated weights for policy 0, policy_version 26491 (0.0024) [2024-03-29 15:03:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 434176000. Throughput: 0: 42002.6. Samples: 316324820. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 15:03:33,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 15:03:34,047][00497] Updated weights for policy 0, policy_version 26501 (0.0026) [2024-03-29 15:03:35,250][00476] Signal inference workers to stop experience collection... (11300 times) [2024-03-29 15:03:35,309][00497] InferenceWorker_p0-w0: stopping experience collection (11300 times) [2024-03-29 15:03:35,344][00476] Signal inference workers to resume experience collection... (11300 times) [2024-03-29 15:03:35,347][00497] InferenceWorker_p0-w0: resuming experience collection (11300 times) [2024-03-29 15:03:37,914][00497] Updated weights for policy 0, policy_version 26511 (0.0031) [2024-03-29 15:03:38,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 434388992. Throughput: 0: 41981.3. Samples: 316566480. 
Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 15:03:38,841][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 15:03:42,189][00497] Updated weights for policy 0, policy_version 26521 (0.0021) [2024-03-29 15:03:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 434585600. Throughput: 0: 41850.7. Samples: 316824020. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 15:03:43,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 15:03:45,770][00497] Updated weights for policy 0, policy_version 26531 (0.0029) [2024-03-29 15:03:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 434798592. Throughput: 0: 41667.9. Samples: 316941960. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 15:03:48,842][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 15:03:49,900][00497] Updated weights for policy 0, policy_version 26541 (0.0027) [2024-03-29 15:03:53,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 434995200. Throughput: 0: 41603.9. Samples: 317186820. Policy #0 lag: (min: 2.0, avg: 22.3, max: 43.0) [2024-03-29 15:03:53,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 15:03:53,880][00497] Updated weights for policy 0, policy_version 26551 (0.0032) [2024-03-29 15:03:57,895][00497] Updated weights for policy 0, policy_version 26561 (0.0023) [2024-03-29 15:03:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 435208192. Throughput: 0: 41798.6. Samples: 317450400. Policy #0 lag: (min: 2.0, avg: 22.3, max: 43.0) [2024-03-29 15:03:58,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 15:04:01,577][00497] Updated weights for policy 0, policy_version 26571 (0.0028) [2024-03-29 15:04:03,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 435421184. Throughput: 0: 41748.4. Samples: 317572340. Policy #0 lag: (min: 2.0, avg: 22.3, max: 43.0) [2024-03-29 15:04:03,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 15:04:05,410][00497] Updated weights for policy 0, policy_version 26581 (0.0018) [2024-03-29 15:04:08,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.2, 300 sec: 41820.9). Total num frames: 435650560. Throughput: 0: 41652.1. Samples: 317818180. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 15:04:08,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 15:04:09,463][00497] Updated weights for policy 0, policy_version 26591 (0.0022) [2024-03-29 15:04:10,591][00476] Signal inference workers to stop experience collection... (11350 times) [2024-03-29 15:04:10,592][00476] Signal inference workers to resume experience collection... (11350 times) [2024-03-29 15:04:10,636][00497] InferenceWorker_p0-w0: stopping experience collection (11350 times) [2024-03-29 15:04:10,636][00497] InferenceWorker_p0-w0: resuming experience collection (11350 times) [2024-03-29 15:04:13,670][00497] Updated weights for policy 0, policy_version 26601 (0.0024) [2024-03-29 15:04:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 435830784. Throughput: 0: 42146.7. Samples: 318092400. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 15:04:13,840][00126] Avg episode reward: [(0, '0.456')] [2024-03-29 15:04:17,190][00497] Updated weights for policy 0, policy_version 26611 (0.0024) [2024-03-29 15:04:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41709.8). 
Total num frames: 436043776. Throughput: 0: 41740.5. Samples: 318203140. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 15:04:18,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 15:04:21,145][00497] Updated weights for policy 0, policy_version 26621 (0.0022) [2024-03-29 15:04:23,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 436256768. Throughput: 0: 41875.6. Samples: 318450880. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 15:04:23,841][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 15:04:25,259][00497] Updated weights for policy 0, policy_version 26631 (0.0022) [2024-03-29 15:04:28,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 436436992. Throughput: 0: 41903.6. Samples: 318709680. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 15:04:28,840][00126] Avg episode reward: [(0, '0.456')] [2024-03-29 15:04:29,511][00497] Updated weights for policy 0, policy_version 26641 (0.0026) [2024-03-29 15:04:32,890][00497] Updated weights for policy 0, policy_version 26651 (0.0024) [2024-03-29 15:04:33,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 436682752. Throughput: 0: 41976.1. Samples: 318830880. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 15:04:33,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 15:04:36,761][00497] Updated weights for policy 0, policy_version 26661 (0.0018) [2024-03-29 15:04:38,839][00126] Fps is (10 sec: 47513.4, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 436912128. Throughput: 0: 42160.9. Samples: 319084060. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 15:04:38,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 15:04:40,862][00497] Updated weights for policy 0, policy_version 26671 (0.0026) [2024-03-29 15:04:43,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 437092352. Throughput: 0: 41830.6. Samples: 319332780. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 15:04:43,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 15:04:45,107][00497] Updated weights for policy 0, policy_version 26681 (0.0021) [2024-03-29 15:04:45,207][00476] Signal inference workers to stop experience collection... (11400 times) [2024-03-29 15:04:45,235][00497] InferenceWorker_p0-w0: stopping experience collection (11400 times) [2024-03-29 15:04:45,390][00476] Signal inference workers to resume experience collection... (11400 times) [2024-03-29 15:04:45,390][00497] InferenceWorker_p0-w0: resuming experience collection (11400 times) [2024-03-29 15:04:48,644][00497] Updated weights for policy 0, policy_version 26691 (0.0030) [2024-03-29 15:04:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 437305344. Throughput: 0: 41885.3. Samples: 319457180. Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 15:04:48,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 15:04:52,535][00497] Updated weights for policy 0, policy_version 26701 (0.0027) [2024-03-29 15:04:53,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 437534720. Throughput: 0: 41996.4. Samples: 319708020. 
Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 15:04:53,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 15:04:56,352][00497] Updated weights for policy 0, policy_version 26711 (0.0022) [2024-03-29 15:04:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 437714944. Throughput: 0: 41468.0. Samples: 319958460. Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 15:04:58,841][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 15:05:00,559][00497] Updated weights for policy 0, policy_version 26721 (0.0017) [2024-03-29 15:05:03,839][00126] Fps is (10 sec: 40959.2, 60 sec: 42052.1, 300 sec: 41820.9). Total num frames: 437944320. Throughput: 0: 41906.5. Samples: 320088940. Policy #0 lag: (min: 0.0, avg: 20.1, max: 42.0) [2024-03-29 15:05:03,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 15:05:03,864][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000026730_437944320.pth... [2024-03-29 15:05:04,409][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000026117_427900928.pth [2024-03-29 15:05:04,670][00497] Updated weights for policy 0, policy_version 26731 (0.0020) [2024-03-29 15:05:07,938][00497] Updated weights for policy 0, policy_version 26741 (0.0026) [2024-03-29 15:05:08,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 438157312. Throughput: 0: 41975.1. Samples: 320339760. Policy #0 lag: (min: 0.0, avg: 20.1, max: 42.0) [2024-03-29 15:05:08,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 15:05:11,732][00497] Updated weights for policy 0, policy_version 26751 (0.0020) [2024-03-29 15:05:13,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 438353920. Throughput: 0: 41851.0. Samples: 320592980. Policy #0 lag: (min: 0.0, avg: 20.1, max: 42.0) [2024-03-29 15:05:13,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 15:05:16,085][00497] Updated weights for policy 0, policy_version 26761 (0.0025) [2024-03-29 15:05:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 438583296. Throughput: 0: 42208.9. Samples: 320730280. Policy #0 lag: (min: 0.0, avg: 20.1, max: 42.0) [2024-03-29 15:05:18,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 15:05:19,425][00476] Signal inference workers to stop experience collection... (11450 times) [2024-03-29 15:05:19,447][00497] InferenceWorker_p0-w0: stopping experience collection (11450 times) [2024-03-29 15:05:19,608][00476] Signal inference workers to resume experience collection... (11450 times) [2024-03-29 15:05:19,608][00497] InferenceWorker_p0-w0: resuming experience collection (11450 times) [2024-03-29 15:05:19,611][00497] Updated weights for policy 0, policy_version 26771 (0.0018) [2024-03-29 15:05:23,210][00497] Updated weights for policy 0, policy_version 26781 (0.0023) [2024-03-29 15:05:23,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 438796288. Throughput: 0: 42178.2. Samples: 320982080. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 15:05:23,840][00126] Avg episode reward: [(0, '0.256')] [2024-03-29 15:05:27,435][00497] Updated weights for policy 0, policy_version 26791 (0.0019) [2024-03-29 15:05:28,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42871.4, 300 sec: 41931.9). Total num frames: 439009280. Throughput: 0: 41844.9. Samples: 321215800. 
Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 15:05:28,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 15:05:31,796][00497] Updated weights for policy 0, policy_version 26801 (0.0018) [2024-03-29 15:05:33,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41932.0). Total num frames: 439205888. Throughput: 0: 42345.3. Samples: 321362720. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 15:05:33,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 15:05:35,196][00497] Updated weights for policy 0, policy_version 26811 (0.0018) [2024-03-29 15:05:38,730][00497] Updated weights for policy 0, policy_version 26821 (0.0021) [2024-03-29 15:05:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 439435264. Throughput: 0: 42497.7. Samples: 321620420. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 15:05:38,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 15:05:42,761][00497] Updated weights for policy 0, policy_version 26831 (0.0018) [2024-03-29 15:05:43,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 439631872. Throughput: 0: 42134.2. Samples: 321854500. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 15:05:43,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 15:05:47,362][00497] Updated weights for policy 0, policy_version 26841 (0.0019) [2024-03-29 15:05:48,839][00126] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 439828480. Throughput: 0: 42353.1. Samples: 321994820. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 15:05:48,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 15:05:50,942][00497] Updated weights for policy 0, policy_version 26851 (0.0019) [2024-03-29 15:05:53,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 440041472. Throughput: 0: 42203.0. Samples: 322238900. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 15:05:53,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 15:05:54,655][00497] Updated weights for policy 0, policy_version 26861 (0.0028) [2024-03-29 15:05:54,966][00476] Signal inference workers to stop experience collection... (11500 times) [2024-03-29 15:05:54,967][00476] Signal inference workers to resume experience collection... (11500 times) [2024-03-29 15:05:55,005][00497] InferenceWorker_p0-w0: stopping experience collection (11500 times) [2024-03-29 15:05:55,005][00497] InferenceWorker_p0-w0: resuming experience collection (11500 times) [2024-03-29 15:05:58,416][00497] Updated weights for policy 0, policy_version 26871 (0.0028) [2024-03-29 15:05:58,839][00126] Fps is (10 sec: 44236.1, 60 sec: 42598.3, 300 sec: 41931.9). Total num frames: 440270848. Throughput: 0: 42153.8. Samples: 322489900. Policy #0 lag: (min: 1.0, avg: 21.8, max: 43.0) [2024-03-29 15:05:58,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 15:06:02,897][00497] Updated weights for policy 0, policy_version 26881 (0.0018) [2024-03-29 15:06:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 440467456. Throughput: 0: 42014.6. Samples: 322620940. Policy #0 lag: (min: 1.0, avg: 21.8, max: 43.0) [2024-03-29 15:06:03,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 15:06:06,675][00497] Updated weights for policy 0, policy_version 26891 (0.0028) [2024-03-29 15:06:08,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41779.3, 300 sec: 41820.9). 
Total num frames: 440664064. Throughput: 0: 41878.3. Samples: 322866600. Policy #0 lag: (min: 1.0, avg: 21.8, max: 43.0) [2024-03-29 15:06:08,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 15:06:10,181][00497] Updated weights for policy 0, policy_version 26901 (0.0023) [2024-03-29 15:06:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 440893440. Throughput: 0: 41947.1. Samples: 323103420. Policy #0 lag: (min: 1.0, avg: 21.8, max: 43.0) [2024-03-29 15:06:13,841][00126] Avg episode reward: [(0, '0.496')] [2024-03-29 15:06:14,244][00497] Updated weights for policy 0, policy_version 26911 (0.0023) [2024-03-29 15:06:18,575][00497] Updated weights for policy 0, policy_version 26921 (0.0022) [2024-03-29 15:06:18,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.2, 300 sec: 41820.8). Total num frames: 441073664. Throughput: 0: 41679.6. Samples: 323238300. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 15:06:18,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 15:06:22,319][00497] Updated weights for policy 0, policy_version 26931 (0.0020) [2024-03-29 15:06:23,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 441286656. Throughput: 0: 41499.1. Samples: 323487880. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 15:06:23,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 15:06:25,785][00497] Updated weights for policy 0, policy_version 26941 (0.0025) [2024-03-29 15:06:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 441516032. Throughput: 0: 41657.8. Samples: 323729100. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 15:06:28,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 15:06:30,235][00497] Updated weights for policy 0, policy_version 26951 (0.0026) [2024-03-29 15:06:30,258][00476] Signal inference workers to stop experience collection... (11550 times) [2024-03-29 15:06:30,292][00497] InferenceWorker_p0-w0: stopping experience collection (11550 times) [2024-03-29 15:06:30,482][00476] Signal inference workers to resume experience collection... (11550 times) [2024-03-29 15:06:30,483][00497] InferenceWorker_p0-w0: resuming experience collection (11550 times) [2024-03-29 15:06:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 441696256. Throughput: 0: 41347.5. Samples: 323855460. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 15:06:33,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 15:06:34,408][00497] Updated weights for policy 0, policy_version 26961 (0.0020) [2024-03-29 15:06:37,977][00497] Updated weights for policy 0, policy_version 26971 (0.0024) [2024-03-29 15:06:38,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 441925632. Throughput: 0: 41724.4. Samples: 324116500. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 15:06:38,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 15:06:41,567][00497] Updated weights for policy 0, policy_version 26981 (0.0028) [2024-03-29 15:06:43,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 442138624. Throughput: 0: 41753.5. Samples: 324368800. 
Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 15:06:43,840][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 15:06:45,820][00497] Updated weights for policy 0, policy_version 26991 (0.0024) [2024-03-29 15:06:48,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 442335232. Throughput: 0: 41520.5. Samples: 324489360. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 15:06:48,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 15:06:50,067][00497] Updated weights for policy 0, policy_version 27001 (0.0026) [2024-03-29 15:06:53,586][00497] Updated weights for policy 0, policy_version 27011 (0.0023) [2024-03-29 15:06:53,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 442548224. Throughput: 0: 41538.6. Samples: 324735840. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 15:06:53,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 15:06:57,141][00497] Updated weights for policy 0, policy_version 27021 (0.0023) [2024-03-29 15:06:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 442761216. Throughput: 0: 42012.0. Samples: 324993960. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 15:06:58,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 15:07:01,356][00497] Updated weights for policy 0, policy_version 27031 (0.0029) [2024-03-29 15:07:03,070][00476] Signal inference workers to stop experience collection... (11600 times) [2024-03-29 15:07:03,091][00497] InferenceWorker_p0-w0: stopping experience collection (11600 times) [2024-03-29 15:07:03,275][00476] Signal inference workers to resume experience collection... (11600 times) [2024-03-29 15:07:03,276][00497] InferenceWorker_p0-w0: resuming experience collection (11600 times) [2024-03-29 15:07:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 442974208. Throughput: 0: 41830.2. Samples: 325120660. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 15:07:03,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 15:07:03,859][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000027037_442974208.pth... [2024-03-29 15:07:04,193][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000026424_432930816.pth [2024-03-29 15:07:05,819][00497] Updated weights for policy 0, policy_version 27041 (0.0023) [2024-03-29 15:07:08,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 443170816. Throughput: 0: 41705.7. Samples: 325364640. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 15:07:08,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 15:07:09,276][00497] Updated weights for policy 0, policy_version 27051 (0.0021) [2024-03-29 15:07:12,911][00497] Updated weights for policy 0, policy_version 27061 (0.0026) [2024-03-29 15:07:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 443400192. Throughput: 0: 41956.8. Samples: 325617160. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 15:07:13,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 15:07:16,997][00497] Updated weights for policy 0, policy_version 27071 (0.0019) [2024-03-29 15:07:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 443596800. Throughput: 0: 42000.4. Samples: 325745480. 
Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 15:07:18,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 15:07:21,306][00497] Updated weights for policy 0, policy_version 27081 (0.0028) [2024-03-29 15:07:23,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 443809792. Throughput: 0: 42091.5. Samples: 326010620. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 15:07:23,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 15:07:25,077][00497] Updated weights for policy 0, policy_version 27091 (0.0032) [2024-03-29 15:07:28,652][00497] Updated weights for policy 0, policy_version 27101 (0.0024) [2024-03-29 15:07:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 444022784. Throughput: 0: 41635.0. Samples: 326242380. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:07:28,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 15:07:32,925][00497] Updated weights for policy 0, policy_version 27111 (0.0024) [2024-03-29 15:07:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 444219392. Throughput: 0: 41699.9. Samples: 326365860. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:07:33,841][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 15:07:36,280][00476] Signal inference workers to stop experience collection... (11650 times) [2024-03-29 15:07:36,305][00497] InferenceWorker_p0-w0: stopping experience collection (11650 times) [2024-03-29 15:07:36,458][00476] Signal inference workers to resume experience collection... (11650 times) [2024-03-29 15:07:36,458][00497] InferenceWorker_p0-w0: resuming experience collection (11650 times) [2024-03-29 15:07:37,172][00497] Updated weights for policy 0, policy_version 27121 (0.0022) [2024-03-29 15:07:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 444432384. Throughput: 0: 42240.9. Samples: 326636680. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:07:38,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 15:07:40,794][00497] Updated weights for policy 0, policy_version 27131 (0.0022) [2024-03-29 15:07:43,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.1, 300 sec: 41931.9). Total num frames: 444661760. Throughput: 0: 41662.0. Samples: 326868760. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:07:43,842][00126] Avg episode reward: [(0, '0.286')] [2024-03-29 15:07:44,375][00497] Updated weights for policy 0, policy_version 27141 (0.0025) [2024-03-29 15:07:48,385][00497] Updated weights for policy 0, policy_version 27151 (0.0019) [2024-03-29 15:07:48,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 444858368. Throughput: 0: 41565.4. Samples: 326991100. Policy #0 lag: (min: 0.0, avg: 22.2, max: 41.0) [2024-03-29 15:07:48,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 15:07:52,957][00497] Updated weights for policy 0, policy_version 27161 (0.0023) [2024-03-29 15:07:53,839][00126] Fps is (10 sec: 37684.0, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 445038592. Throughput: 0: 42055.7. Samples: 327257140. Policy #0 lag: (min: 0.0, avg: 22.2, max: 41.0) [2024-03-29 15:07:53,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 15:07:56,479][00497] Updated weights for policy 0, policy_version 27171 (0.0029) [2024-03-29 15:07:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41820.9). 
Total num frames: 445267968. Throughput: 0: 41813.4. Samples: 327498760. Policy #0 lag: (min: 0.0, avg: 22.2, max: 41.0) [2024-03-29 15:07:58,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 15:08:00,072][00497] Updated weights for policy 0, policy_version 27181 (0.0020) [2024-03-29 15:08:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 445480960. Throughput: 0: 41667.6. Samples: 327620520. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 15:08:03,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 15:08:03,848][00497] Updated weights for policy 0, policy_version 27191 (0.0025) [2024-03-29 15:08:08,473][00497] Updated weights for policy 0, policy_version 27201 (0.0021) [2024-03-29 15:08:08,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 445677568. Throughput: 0: 41709.0. Samples: 327887520. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 15:08:08,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 15:08:09,904][00476] Signal inference workers to stop experience collection... (11700 times) [2024-03-29 15:08:09,941][00497] InferenceWorker_p0-w0: stopping experience collection (11700 times) [2024-03-29 15:08:10,132][00476] Signal inference workers to resume experience collection... (11700 times) [2024-03-29 15:08:10,133][00497] InferenceWorker_p0-w0: resuming experience collection (11700 times) [2024-03-29 15:08:12,086][00497] Updated weights for policy 0, policy_version 27211 (0.0021) [2024-03-29 15:08:13,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 445890560. Throughput: 0: 41864.4. Samples: 328126280. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 15:08:13,840][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 15:08:15,732][00497] Updated weights for policy 0, policy_version 27221 (0.0030) [2024-03-29 15:08:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 446119936. Throughput: 0: 41866.4. Samples: 328249840. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 15:08:18,841][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 15:08:19,566][00497] Updated weights for policy 0, policy_version 27231 (0.0024) [2024-03-29 15:08:23,839][00126] Fps is (10 sec: 39322.3, 60 sec: 41233.2, 300 sec: 41765.3). Total num frames: 446283776. Throughput: 0: 41537.4. Samples: 328505860. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:08:23,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 15:08:24,130][00497] Updated weights for policy 0, policy_version 27241 (0.0025) [2024-03-29 15:08:27,722][00497] Updated weights for policy 0, policy_version 27251 (0.0019) [2024-03-29 15:08:28,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 446513152. Throughput: 0: 41796.5. Samples: 328749600. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:08:28,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 15:08:31,531][00497] Updated weights for policy 0, policy_version 27261 (0.0018) [2024-03-29 15:08:33,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 446726144. Throughput: 0: 41795.9. Samples: 328871920. 
Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:08:33,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 15:08:35,392][00497] Updated weights for policy 0, policy_version 27271 (0.0026) [2024-03-29 15:08:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 446922752. Throughput: 0: 41379.8. Samples: 329119240. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:08:38,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 15:08:40,086][00497] Updated weights for policy 0, policy_version 27281 (0.0019) [2024-03-29 15:08:43,549][00497] Updated weights for policy 0, policy_version 27291 (0.0029) [2024-03-29 15:08:43,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 447152128. Throughput: 0: 41704.3. Samples: 329375460. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 15:08:43,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 15:08:44,735][00476] Signal inference workers to stop experience collection... (11750 times) [2024-03-29 15:08:44,735][00476] Signal inference workers to resume experience collection... (11750 times) [2024-03-29 15:08:44,770][00497] InferenceWorker_p0-w0: stopping experience collection (11750 times) [2024-03-29 15:08:44,770][00497] InferenceWorker_p0-w0: resuming experience collection (11750 times) [2024-03-29 15:08:47,397][00497] Updated weights for policy 0, policy_version 27301 (0.0024) [2024-03-29 15:08:48,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 447365120. Throughput: 0: 41701.3. Samples: 329497080. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 15:08:48,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 15:08:51,224][00497] Updated weights for policy 0, policy_version 27311 (0.0023) [2024-03-29 15:08:53,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 447545344. Throughput: 0: 41171.5. Samples: 329740240. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 15:08:53,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 15:08:55,843][00497] Updated weights for policy 0, policy_version 27321 (0.0027) [2024-03-29 15:08:58,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 447758336. Throughput: 0: 41366.0. Samples: 329987740. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 15:08:58,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 15:08:59,195][00497] Updated weights for policy 0, policy_version 27331 (0.0023) [2024-03-29 15:09:03,255][00497] Updated weights for policy 0, policy_version 27341 (0.0034) [2024-03-29 15:09:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 447987712. Throughput: 0: 41659.0. Samples: 330124500. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 15:09:03,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 15:09:04,106][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000027344_448004096.pth... [2024-03-29 15:09:04,444][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000026730_437944320.pth [2024-03-29 15:09:06,996][00497] Updated weights for policy 0, policy_version 27351 (0.0027) [2024-03-29 15:09:08,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 448184320. Throughput: 0: 41383.9. Samples: 330368140. 
Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 15:09:08,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 15:09:11,454][00497] Updated weights for policy 0, policy_version 27361 (0.0025) [2024-03-29 15:09:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 448397312. Throughput: 0: 41636.6. Samples: 330623240. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 15:09:13,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 15:09:15,033][00497] Updated weights for policy 0, policy_version 27371 (0.0029) [2024-03-29 15:09:15,810][00476] Signal inference workers to stop experience collection... (11800 times) [2024-03-29 15:09:15,840][00497] InferenceWorker_p0-w0: stopping experience collection (11800 times) [2024-03-29 15:09:16,000][00476] Signal inference workers to resume experience collection... (11800 times) [2024-03-29 15:09:16,000][00497] InferenceWorker_p0-w0: resuming experience collection (11800 times) [2024-03-29 15:09:18,818][00497] Updated weights for policy 0, policy_version 27381 (0.0027) [2024-03-29 15:09:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 448610304. Throughput: 0: 41753.8. Samples: 330750840. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 15:09:18,840][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 15:09:22,837][00497] Updated weights for policy 0, policy_version 27391 (0.0019) [2024-03-29 15:09:23,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 448806912. Throughput: 0: 41569.0. Samples: 330989840. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 15:09:23,840][00126] Avg episode reward: [(0, '0.318')] [2024-03-29 15:09:26,978][00497] Updated weights for policy 0, policy_version 27401 (0.0029) [2024-03-29 15:09:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 449019904. Throughput: 0: 41698.4. Samples: 331251880. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 15:09:28,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 15:09:30,736][00497] Updated weights for policy 0, policy_version 27411 (0.0025) [2024-03-29 15:09:33,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 449232896. Throughput: 0: 41718.7. Samples: 331374420. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 15:09:33,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 15:09:34,347][00497] Updated weights for policy 0, policy_version 27421 (0.0024) [2024-03-29 15:09:38,620][00497] Updated weights for policy 0, policy_version 27431 (0.0024) [2024-03-29 15:09:38,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 449429504. Throughput: 0: 41892.0. Samples: 331625380. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 15:09:38,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 15:09:42,572][00497] Updated weights for policy 0, policy_version 27441 (0.0021) [2024-03-29 15:09:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.3, 300 sec: 41820.9). Total num frames: 449642496. Throughput: 0: 41987.9. Samples: 331877200. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 15:09:43,840][00126] Avg episode reward: [(0, '0.461')] [2024-03-29 15:09:46,037][00476] Signal inference workers to stop experience collection... (11850 times) [2024-03-29 15:09:46,038][00476] Signal inference workers to resume experience collection... 
(11850 times) [2024-03-29 15:09:46,077][00497] InferenceWorker_p0-w0: stopping experience collection (11850 times) [2024-03-29 15:09:46,081][00497] InferenceWorker_p0-w0: resuming experience collection (11850 times) [2024-03-29 15:09:46,370][00497] Updated weights for policy 0, policy_version 27451 (0.0031) [2024-03-29 15:09:48,839][00126] Fps is (10 sec: 44237.2, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 449871872. Throughput: 0: 41933.4. Samples: 332011500. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 15:09:48,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 15:09:49,901][00497] Updated weights for policy 0, policy_version 27461 (0.0023) [2024-03-29 15:09:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 450068480. Throughput: 0: 42143.6. Samples: 332264600. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 15:09:53,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 15:09:53,956][00497] Updated weights for policy 0, policy_version 27471 (0.0019) [2024-03-29 15:09:58,140][00497] Updated weights for policy 0, policy_version 27481 (0.0029) [2024-03-29 15:09:58,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.2, 300 sec: 41820.9). Total num frames: 450281472. Throughput: 0: 42005.7. Samples: 332513500. Policy #0 lag: (min: 0.0, avg: 22.6, max: 44.0) [2024-03-29 15:09:58,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 15:10:02,165][00497] Updated weights for policy 0, policy_version 27491 (0.0022) [2024-03-29 15:10:03,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 450478080. Throughput: 0: 41726.1. Samples: 332628520. Policy #0 lag: (min: 0.0, avg: 22.6, max: 44.0) [2024-03-29 15:10:03,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 15:10:05,739][00497] Updated weights for policy 0, policy_version 27501 (0.0021) [2024-03-29 15:10:08,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 450707456. Throughput: 0: 41976.1. Samples: 332878760. Policy #0 lag: (min: 0.0, avg: 22.6, max: 44.0) [2024-03-29 15:10:08,840][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 15:10:09,510][00497] Updated weights for policy 0, policy_version 27511 (0.0024) [2024-03-29 15:10:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 450887680. Throughput: 0: 42183.0. Samples: 333150120. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:10:13,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 15:10:13,909][00497] Updated weights for policy 0, policy_version 27521 (0.0028) [2024-03-29 15:10:17,213][00476] Signal inference workers to stop experience collection... (11900 times) [2024-03-29 15:10:17,266][00497] InferenceWorker_p0-w0: stopping experience collection (11900 times) [2024-03-29 15:10:17,301][00476] Signal inference workers to resume experience collection... (11900 times) [2024-03-29 15:10:17,303][00497] InferenceWorker_p0-w0: resuming experience collection (11900 times) [2024-03-29 15:10:17,907][00497] Updated weights for policy 0, policy_version 27531 (0.0028) [2024-03-29 15:10:18,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 451100672. Throughput: 0: 41780.8. Samples: 333254560. 
Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:10:18,840][00126] Avg episode reward: [(0, '0.276')] [2024-03-29 15:10:21,553][00497] Updated weights for policy 0, policy_version 27541 (0.0036) [2024-03-29 15:10:23,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 451330048. Throughput: 0: 41692.0. Samples: 333501520. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:10:23,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 15:10:25,407][00497] Updated weights for policy 0, policy_version 27551 (0.0018) [2024-03-29 15:10:28,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 451510272. Throughput: 0: 41950.7. Samples: 333764980. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:10:28,841][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 15:10:29,682][00497] Updated weights for policy 0, policy_version 27561 (0.0022) [2024-03-29 15:10:33,762][00497] Updated weights for policy 0, policy_version 27571 (0.0022) [2024-03-29 15:10:33,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 451723264. Throughput: 0: 41445.3. Samples: 333876540. Policy #0 lag: (min: 1.0, avg: 21.5, max: 45.0) [2024-03-29 15:10:33,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 15:10:37,137][00497] Updated weights for policy 0, policy_version 27581 (0.0025) [2024-03-29 15:10:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 451952640. Throughput: 0: 41421.8. Samples: 334128580. Policy #0 lag: (min: 1.0, avg: 21.5, max: 45.0) [2024-03-29 15:10:38,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 15:10:41,416][00497] Updated weights for policy 0, policy_version 27591 (0.0025) [2024-03-29 15:10:43,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 452132864. Throughput: 0: 41458.2. Samples: 334379120. Policy #0 lag: (min: 1.0, avg: 21.5, max: 45.0) [2024-03-29 15:10:43,840][00126] Avg episode reward: [(0, '0.456')] [2024-03-29 15:10:45,398][00497] Updated weights for policy 0, policy_version 27601 (0.0034) [2024-03-29 15:10:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 452345856. Throughput: 0: 41723.7. Samples: 334506080. Policy #0 lag: (min: 1.0, avg: 21.5, max: 45.0) [2024-03-29 15:10:48,840][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 15:10:49,065][00476] Signal inference workers to stop experience collection... (11950 times) [2024-03-29 15:10:49,068][00476] Signal inference workers to resume experience collection... (11950 times) [2024-03-29 15:10:49,112][00497] InferenceWorker_p0-w0: stopping experience collection (11950 times) [2024-03-29 15:10:49,112][00497] InferenceWorker_p0-w0: resuming experience collection (11950 times) [2024-03-29 15:10:49,341][00497] Updated weights for policy 0, policy_version 27611 (0.0023) [2024-03-29 15:10:52,918][00497] Updated weights for policy 0, policy_version 27621 (0.0024) [2024-03-29 15:10:53,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 452591616. Throughput: 0: 41652.9. Samples: 334753140. Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 15:10:53,840][00126] Avg episode reward: [(0, '0.476')] [2024-03-29 15:10:56,867][00497] Updated weights for policy 0, policy_version 27631 (0.0024) [2024-03-29 15:10:58,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.1, 300 sec: 41709.8). 
Total num frames: 452771840. Throughput: 0: 41423.5. Samples: 335014180. Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 15:10:58,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 15:11:00,917][00497] Updated weights for policy 0, policy_version 27641 (0.0021) [2024-03-29 15:11:03,839][00126] Fps is (10 sec: 39320.9, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 452984832. Throughput: 0: 41869.7. Samples: 335138700. Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 15:11:03,842][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 15:11:04,042][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000027649_453001216.pth... [2024-03-29 15:11:04,378][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000027037_442974208.pth [2024-03-29 15:11:04,948][00497] Updated weights for policy 0, policy_version 27651 (0.0018) [2024-03-29 15:11:08,219][00497] Updated weights for policy 0, policy_version 27661 (0.0022) [2024-03-29 15:11:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 453214208. Throughput: 0: 41928.5. Samples: 335388300. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 15:11:08,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 15:11:12,454][00497] Updated weights for policy 0, policy_version 27671 (0.0026) [2024-03-29 15:11:13,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 453410816. Throughput: 0: 41865.8. Samples: 335648940. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 15:11:13,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 15:11:16,447][00497] Updated weights for policy 0, policy_version 27681 (0.0029) [2024-03-29 15:11:18,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 453640192. Throughput: 0: 42299.6. Samples: 335780020. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 15:11:18,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 15:11:20,390][00497] Updated weights for policy 0, policy_version 27691 (0.0024) [2024-03-29 15:11:23,830][00476] Signal inference workers to stop experience collection... (12000 times) [2024-03-29 15:11:23,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 453836800. Throughput: 0: 42075.0. Samples: 336021960. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 15:11:23,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 15:11:23,906][00476] Signal inference workers to resume experience collection... (12000 times) [2024-03-29 15:11:23,908][00497] InferenceWorker_p0-w0: stopping experience collection (12000 times) [2024-03-29 15:11:23,915][00497] Updated weights for policy 0, policy_version 27701 (0.0031) [2024-03-29 15:11:23,936][00497] InferenceWorker_p0-w0: resuming experience collection (12000 times) [2024-03-29 15:11:27,929][00497] Updated weights for policy 0, policy_version 27711 (0.0017) [2024-03-29 15:11:28,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 454049792. Throughput: 0: 42485.4. Samples: 336290960. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 15:11:28,840][00126] Avg episode reward: [(0, '0.443')] [2024-03-29 15:11:31,864][00497] Updated weights for policy 0, policy_version 27721 (0.0019) [2024-03-29 15:11:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 454246400. 
Throughput: 0: 42710.2. Samples: 336428040. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 15:11:33,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 15:11:35,906][00497] Updated weights for policy 0, policy_version 27731 (0.0023) [2024-03-29 15:11:38,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 454492160. Throughput: 0: 42236.0. Samples: 336653760. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 15:11:38,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 15:11:39,365][00497] Updated weights for policy 0, policy_version 27741 (0.0027) [2024-03-29 15:11:43,618][00497] Updated weights for policy 0, policy_version 27751 (0.0024) [2024-03-29 15:11:43,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 454672384. Throughput: 0: 42146.6. Samples: 336910780. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 15:11:43,840][00126] Avg episode reward: [(0, '0.481')] [2024-03-29 15:11:47,430][00497] Updated weights for policy 0, policy_version 27761 (0.0024) [2024-03-29 15:11:48,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 454885376. Throughput: 0: 42540.1. Samples: 337053000. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 15:11:48,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 15:11:51,300][00497] Updated weights for policy 0, policy_version 27771 (0.0020) [2024-03-29 15:11:53,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 455114752. Throughput: 0: 42281.0. Samples: 337290940. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 15:11:53,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 15:11:54,747][00497] Updated weights for policy 0, policy_version 27781 (0.0022) [2024-03-29 15:11:58,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 455311360. Throughput: 0: 42050.7. Samples: 337541220. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 15:11:58,840][00126] Avg episode reward: [(0, '0.443')] [2024-03-29 15:11:59,036][00497] Updated weights for policy 0, policy_version 27791 (0.0031) [2024-03-29 15:11:59,639][00476] Signal inference workers to stop experience collection... (12050 times) [2024-03-29 15:11:59,639][00476] Signal inference workers to resume experience collection... (12050 times) [2024-03-29 15:11:59,673][00497] InferenceWorker_p0-w0: stopping experience collection (12050 times) [2024-03-29 15:11:59,674][00497] InferenceWorker_p0-w0: resuming experience collection (12050 times) [2024-03-29 15:12:02,977][00497] Updated weights for policy 0, policy_version 27801 (0.0027) [2024-03-29 15:12:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 455524352. Throughput: 0: 42273.3. Samples: 337682320. Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 15:12:03,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 15:12:06,647][00497] Updated weights for policy 0, policy_version 27811 (0.0019) [2024-03-29 15:12:08,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 455753728. Throughput: 0: 42390.6. Samples: 337929540. 
Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 15:12:08,840][00126] Avg episode reward: [(0, '0.457')] [2024-03-29 15:12:10,287][00497] Updated weights for policy 0, policy_version 27821 (0.0024) [2024-03-29 15:12:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 455950336. Throughput: 0: 41739.1. Samples: 338169220. Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 15:12:13,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 15:12:14,640][00497] Updated weights for policy 0, policy_version 27831 (0.0028) [2024-03-29 15:12:18,579][00497] Updated weights for policy 0, policy_version 27841 (0.0026) [2024-03-29 15:12:18,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.1, 300 sec: 41820.9). Total num frames: 456146944. Throughput: 0: 41772.8. Samples: 338307820. Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 15:12:18,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 15:12:22,232][00497] Updated weights for policy 0, policy_version 27851 (0.0019) [2024-03-29 15:12:23,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 456376320. Throughput: 0: 42351.1. Samples: 338559560. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 15:12:23,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 15:12:26,059][00497] Updated weights for policy 0, policy_version 27861 (0.0024) [2024-03-29 15:12:28,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 456589312. Throughput: 0: 42229.4. Samples: 338811100. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 15:12:28,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 15:12:30,067][00497] Updated weights for policy 0, policy_version 27871 (0.0021) [2024-03-29 15:12:30,856][00476] Signal inference workers to stop experience collection... (12100 times) [2024-03-29 15:12:30,908][00497] InferenceWorker_p0-w0: stopping experience collection (12100 times) [2024-03-29 15:12:30,942][00476] Signal inference workers to resume experience collection... (12100 times) [2024-03-29 15:12:30,945][00497] InferenceWorker_p0-w0: resuming experience collection (12100 times) [2024-03-29 15:12:33,839][00126] Fps is (10 sec: 40959.4, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 456785920. Throughput: 0: 42122.1. Samples: 338948500. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 15:12:33,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 15:12:33,994][00497] Updated weights for policy 0, policy_version 27881 (0.0024) [2024-03-29 15:12:37,616][00497] Updated weights for policy 0, policy_version 27891 (0.0033) [2024-03-29 15:12:38,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 457015296. Throughput: 0: 42505.8. Samples: 339203700. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 15:12:38,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 15:12:41,399][00497] Updated weights for policy 0, policy_version 27901 (0.0021) [2024-03-29 15:12:43,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42598.5, 300 sec: 41931.9). Total num frames: 457228288. Throughput: 0: 42411.1. Samples: 339449720. Policy #0 lag: (min: 1.0, avg: 22.6, max: 41.0) [2024-03-29 15:12:43,840][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 15:12:45,296][00497] Updated weights for policy 0, policy_version 27911 (0.0019) [2024-03-29 15:12:48,839][00126] Fps is (10 sec: 39321.1, 60 sec: 42052.2, 300 sec: 41931.9). 
Total num frames: 457408512. Throughput: 0: 42258.6. Samples: 339583960. Policy #0 lag: (min: 1.0, avg: 22.6, max: 41.0) [2024-03-29 15:12:48,841][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 15:12:49,524][00497] Updated weights for policy 0, policy_version 27921 (0.0022) [2024-03-29 15:12:53,245][00497] Updated weights for policy 0, policy_version 27931 (0.0023) [2024-03-29 15:12:53,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 457637888. Throughput: 0: 42238.3. Samples: 339830260. Policy #0 lag: (min: 1.0, avg: 22.6, max: 41.0) [2024-03-29 15:12:53,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 15:12:56,836][00497] Updated weights for policy 0, policy_version 27941 (0.0030) [2024-03-29 15:12:58,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 457850880. Throughput: 0: 42289.7. Samples: 340072260. Policy #0 lag: (min: 1.0, avg: 22.6, max: 41.0) [2024-03-29 15:12:58,840][00126] Avg episode reward: [(0, '0.319')] [2024-03-29 15:13:00,971][00497] Updated weights for policy 0, policy_version 27951 (0.0026) [2024-03-29 15:13:03,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 458031104. Throughput: 0: 42189.8. Samples: 340206360. Policy #0 lag: (min: 0.0, avg: 20.8, max: 43.0) [2024-03-29 15:13:03,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 15:13:03,979][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000027957_458047488.pth... [2024-03-29 15:13:04,307][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000027344_448004096.pth [2024-03-29 15:13:05,358][00476] Signal inference workers to stop experience collection... (12150 times) [2024-03-29 15:13:05,432][00497] InferenceWorker_p0-w0: stopping experience collection (12150 times) [2024-03-29 15:13:05,443][00476] Signal inference workers to resume experience collection... (12150 times) [2024-03-29 15:13:05,462][00497] InferenceWorker_p0-w0: resuming experience collection (12150 times) [2024-03-29 15:13:05,464][00497] Updated weights for policy 0, policy_version 27961 (0.0018) [2024-03-29 15:13:08,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.3, 300 sec: 41932.0). Total num frames: 458260480. Throughput: 0: 42132.4. Samples: 340455520. Policy #0 lag: (min: 0.0, avg: 20.8, max: 43.0) [2024-03-29 15:13:08,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 15:13:08,893][00497] Updated weights for policy 0, policy_version 27971 (0.0028) [2024-03-29 15:13:12,584][00497] Updated weights for policy 0, policy_version 27981 (0.0020) [2024-03-29 15:13:13,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 458489856. Throughput: 0: 42119.2. Samples: 340706460. Policy #0 lag: (min: 0.0, avg: 20.8, max: 43.0) [2024-03-29 15:13:13,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 15:13:16,757][00497] Updated weights for policy 0, policy_version 27991 (0.0018) [2024-03-29 15:13:18,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 458686464. Throughput: 0: 41837.8. Samples: 340831200. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 15:13:18,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 15:13:20,812][00497] Updated weights for policy 0, policy_version 28001 (0.0023) [2024-03-29 15:13:23,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.2, 300 sec: 42043.0). Total num frames: 458915840. 
Throughput: 0: 42195.9. Samples: 341102520. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 15:13:23,841][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 15:13:24,396][00497] Updated weights for policy 0, policy_version 28011 (0.0021) [2024-03-29 15:13:28,096][00497] Updated weights for policy 0, policy_version 28021 (0.0031) [2024-03-29 15:13:28,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 459128832. Throughput: 0: 42049.7. Samples: 341341960. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 15:13:28,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 15:13:32,347][00497] Updated weights for policy 0, policy_version 28031 (0.0027) [2024-03-29 15:13:33,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 459325440. Throughput: 0: 41768.4. Samples: 341463540. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 15:13:33,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 15:13:36,396][00497] Updated weights for policy 0, policy_version 28041 (0.0023) [2024-03-29 15:13:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 459538432. Throughput: 0: 42250.6. Samples: 341731540. Policy #0 lag: (min: 1.0, avg: 18.3, max: 41.0) [2024-03-29 15:13:38,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 15:13:39,817][00497] Updated weights for policy 0, policy_version 28051 (0.0021) [2024-03-29 15:13:40,774][00476] Signal inference workers to stop experience collection... (12200 times) [2024-03-29 15:13:40,803][00497] InferenceWorker_p0-w0: stopping experience collection (12200 times) [2024-03-29 15:13:40,982][00476] Signal inference workers to resume experience collection... (12200 times) [2024-03-29 15:13:40,983][00497] InferenceWorker_p0-w0: resuming experience collection (12200 times) [2024-03-29 15:13:43,791][00497] Updated weights for policy 0, policy_version 28061 (0.0029) [2024-03-29 15:13:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 459751424. Throughput: 0: 41849.3. Samples: 341955480. Policy #0 lag: (min: 1.0, avg: 18.3, max: 41.0) [2024-03-29 15:13:43,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 15:13:47,917][00497] Updated weights for policy 0, policy_version 28071 (0.0022) [2024-03-29 15:13:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 459931648. Throughput: 0: 41955.1. Samples: 342094340. Policy #0 lag: (min: 1.0, avg: 18.3, max: 41.0) [2024-03-29 15:13:48,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 15:13:52,366][00497] Updated weights for policy 0, policy_version 28081 (0.0018) [2024-03-29 15:13:53,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 460161024. Throughput: 0: 42322.7. Samples: 342360040. Policy #0 lag: (min: 1.0, avg: 18.3, max: 41.0) [2024-03-29 15:13:53,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 15:13:55,658][00497] Updated weights for policy 0, policy_version 28091 (0.0023) [2024-03-29 15:13:58,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 460374016. Throughput: 0: 41499.1. Samples: 342573920. 
Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 15:13:58,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 15:13:59,601][00497] Updated weights for policy 0, policy_version 28101 (0.0028) [2024-03-29 15:14:03,364][00497] Updated weights for policy 0, policy_version 28111 (0.0020) [2024-03-29 15:14:03,839][00126] Fps is (10 sec: 40959.4, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 460570624. Throughput: 0: 41782.6. Samples: 342711420. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 15:14:03,841][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 15:14:08,085][00497] Updated weights for policy 0, policy_version 28121 (0.0023) [2024-03-29 15:14:08,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 460767232. Throughput: 0: 41805.9. Samples: 342983780. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 15:14:08,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 15:14:11,650][00497] Updated weights for policy 0, policy_version 28131 (0.0038) [2024-03-29 15:14:13,063][00476] Signal inference workers to stop experience collection... (12250 times) [2024-03-29 15:14:13,113][00497] InferenceWorker_p0-w0: stopping experience collection (12250 times) [2024-03-29 15:14:13,246][00476] Signal inference workers to resume experience collection... (12250 times) [2024-03-29 15:14:13,247][00497] InferenceWorker_p0-w0: resuming experience collection (12250 times) [2024-03-29 15:14:13,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 460996608. Throughput: 0: 41468.5. Samples: 343208040. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 15:14:13,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 15:14:15,222][00497] Updated weights for policy 0, policy_version 28141 (0.0032) [2024-03-29 15:14:18,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 461209600. Throughput: 0: 41685.9. Samples: 343339400. Policy #0 lag: (min: 1.0, avg: 23.5, max: 44.0) [2024-03-29 15:14:18,840][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 15:14:18,955][00497] Updated weights for policy 0, policy_version 28151 (0.0023) [2024-03-29 15:14:23,641][00497] Updated weights for policy 0, policy_version 28161 (0.0026) [2024-03-29 15:14:23,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 461389824. Throughput: 0: 41813.3. Samples: 343613140. Policy #0 lag: (min: 1.0, avg: 23.5, max: 44.0) [2024-03-29 15:14:23,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 15:14:26,912][00497] Updated weights for policy 0, policy_version 28171 (0.0027) [2024-03-29 15:14:28,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 461651968. Throughput: 0: 42176.1. Samples: 343853400. Policy #0 lag: (min: 1.0, avg: 23.5, max: 44.0) [2024-03-29 15:14:28,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 15:14:30,756][00497] Updated weights for policy 0, policy_version 28181 (0.0024) [2024-03-29 15:14:33,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 461848576. Throughput: 0: 41856.1. Samples: 343977860. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 15:14:33,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 15:14:34,454][00497] Updated weights for policy 0, policy_version 28191 (0.0025) [2024-03-29 15:14:38,839][00126] Fps is (10 sec: 36045.0, 60 sec: 41233.2, 300 sec: 41931.9). 
Total num frames: 462012416. Throughput: 0: 41959.6. Samples: 344248220. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 15:14:38,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 15:14:39,109][00497] Updated weights for policy 0, policy_version 28201 (0.0020) [2024-03-29 15:14:42,512][00497] Updated weights for policy 0, policy_version 28211 (0.0026) [2024-03-29 15:14:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 462258176. Throughput: 0: 42389.3. Samples: 344481440. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 15:14:43,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 15:14:46,454][00497] Updated weights for policy 0, policy_version 28221 (0.0017) [2024-03-29 15:14:48,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 462471168. Throughput: 0: 42053.4. Samples: 344603820. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 15:14:48,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 15:14:50,045][00497] Updated weights for policy 0, policy_version 28231 (0.0022) [2024-03-29 15:14:53,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41506.0, 300 sec: 41931.9). Total num frames: 462651392. Throughput: 0: 42156.7. Samples: 344880840. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 15:14:53,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 15:14:54,636][00497] Updated weights for policy 0, policy_version 28241 (0.0027) [2024-03-29 15:14:54,942][00476] Signal inference workers to stop experience collection... (12300 times) [2024-03-29 15:14:54,943][00476] Signal inference workers to resume experience collection... (12300 times) [2024-03-29 15:14:54,989][00497] InferenceWorker_p0-w0: stopping experience collection (12300 times) [2024-03-29 15:14:54,990][00497] InferenceWorker_p0-w0: resuming experience collection (12300 times) [2024-03-29 15:14:57,825][00497] Updated weights for policy 0, policy_version 28251 (0.0023) [2024-03-29 15:14:58,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.2, 300 sec: 42098.6). Total num frames: 462897152. Throughput: 0: 42616.5. Samples: 345125780. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 15:14:58,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 15:15:01,537][00497] Updated weights for policy 0, policy_version 28261 (0.0024) [2024-03-29 15:15:03,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 463110144. Throughput: 0: 42431.5. Samples: 345248820. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 15:15:03,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 15:15:04,270][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000028268_463142912.pth... [2024-03-29 15:15:04,592][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000027649_453001216.pth [2024-03-29 15:15:05,532][00497] Updated weights for policy 0, policy_version 28271 (0.0020) [2024-03-29 15:15:08,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 463290368. Throughput: 0: 42241.0. Samples: 345513980. 
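The paired "Saving .../checkpoint_000028268_463142912.pth..." and "Removing .../checkpoint_000027649_453001216.pth" entries just above show the trainer rotating checkpoints: the newest file is written, then an older one is pruned. A minimal sketch of that kind of rotation logic follows; it is illustrative only, not the trainer's actual code, and the function name and the keep_last count are assumptions.

# Illustrative sketch only -- not the trainer's implementation.
# Mirrors the "Saving .../checkpoint_<version>_<steps>.pth" then "Removing ..." pattern in the log.
import glob
import os
import torch

def save_and_rotate(checkpoint_dir, policy_state, policy_version, env_steps, keep_last=2):
    # File name format matches the log: 9-digit zero-padded policy version, then env frame count.
    path = os.path.join(checkpoint_dir, f"checkpoint_{policy_version:09d}_{env_steps}.pth")
    torch.save(policy_state, path)          # "Saving ..." log line

    # Zero-padded names sort chronologically; drop everything but the newest keep_last files.
    checkpoints = sorted(glob.glob(os.path.join(checkpoint_dir, "checkpoint_*.pth")))
    for old in checkpoints[:-keep_last]:
        os.remove(old)                      # "Removing ..." log line
    return path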
Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 15:15:08,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 15:15:10,104][00497] Updated weights for policy 0, policy_version 28281 (0.0026) [2024-03-29 15:15:13,507][00497] Updated weights for policy 0, policy_version 28291 (0.0020) [2024-03-29 15:15:13,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.2, 300 sec: 42154.1). Total num frames: 463536128. Throughput: 0: 42330.1. Samples: 345758260. Policy #0 lag: (min: 2.0, avg: 18.8, max: 42.0) [2024-03-29 15:15:13,840][00126] Avg episode reward: [(0, '0.478')] [2024-03-29 15:15:17,174][00497] Updated weights for policy 0, policy_version 28301 (0.0023) [2024-03-29 15:15:18,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 463749120. Throughput: 0: 42209.7. Samples: 345877300. Policy #0 lag: (min: 2.0, avg: 18.8, max: 42.0) [2024-03-29 15:15:18,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 15:15:21,287][00497] Updated weights for policy 0, policy_version 28311 (0.0027) [2024-03-29 15:15:23,839][00126] Fps is (10 sec: 39322.2, 60 sec: 42325.4, 300 sec: 42098.5). Total num frames: 463929344. Throughput: 0: 42220.4. Samples: 346148140. Policy #0 lag: (min: 2.0, avg: 18.8, max: 42.0) [2024-03-29 15:15:23,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 15:15:25,638][00497] Updated weights for policy 0, policy_version 28321 (0.0028) [2024-03-29 15:15:27,908][00476] Signal inference workers to stop experience collection... (12350 times) [2024-03-29 15:15:27,953][00497] InferenceWorker_p0-w0: stopping experience collection (12350 times) [2024-03-29 15:15:28,131][00476] Signal inference workers to resume experience collection... (12350 times) [2024-03-29 15:15:28,132][00497] InferenceWorker_p0-w0: resuming experience collection (12350 times) [2024-03-29 15:15:28,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 464158720. Throughput: 0: 42543.0. Samples: 346395880. Policy #0 lag: (min: 2.0, avg: 18.8, max: 42.0) [2024-03-29 15:15:28,840][00126] Avg episode reward: [(0, '0.457')] [2024-03-29 15:15:28,984][00497] Updated weights for policy 0, policy_version 28331 (0.0030) [2024-03-29 15:15:32,855][00497] Updated weights for policy 0, policy_version 28341 (0.0023) [2024-03-29 15:15:33,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 464371712. Throughput: 0: 42229.4. Samples: 346504140. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 15:15:33,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 15:15:36,802][00497] Updated weights for policy 0, policy_version 28351 (0.0019) [2024-03-29 15:15:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 464551936. Throughput: 0: 42041.4. Samples: 346772700. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 15:15:38,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 15:15:41,293][00497] Updated weights for policy 0, policy_version 28361 (0.0023) [2024-03-29 15:15:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 464797696. Throughput: 0: 42234.6. Samples: 347026340. 
Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 15:15:43,841][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 15:15:44,332][00497] Updated weights for policy 0, policy_version 28371 (0.0021) [2024-03-29 15:15:48,333][00497] Updated weights for policy 0, policy_version 28381 (0.0023) [2024-03-29 15:15:48,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42325.4, 300 sec: 42098.5). Total num frames: 465010688. Throughput: 0: 42098.2. Samples: 347143240. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 15:15:48,840][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 15:15:52,111][00497] Updated weights for policy 0, policy_version 28391 (0.0019) [2024-03-29 15:15:53,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 465190912. Throughput: 0: 42091.6. Samples: 347408100. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 15:15:53,840][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 15:15:56,465][00497] Updated weights for policy 0, policy_version 28401 (0.0018) [2024-03-29 15:15:58,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 465436672. Throughput: 0: 42488.6. Samples: 347670240. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 15:15:58,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 15:15:59,689][00497] Updated weights for policy 0, policy_version 28411 (0.0021) [2024-03-29 15:16:03,744][00497] Updated weights for policy 0, policy_version 28421 (0.0032) [2024-03-29 15:16:03,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 465649664. Throughput: 0: 42397.0. Samples: 347785160. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 15:16:03,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 15:16:05,370][00476] Signal inference workers to stop experience collection... (12400 times) [2024-03-29 15:16:05,449][00476] Signal inference workers to resume experience collection... (12400 times) [2024-03-29 15:16:05,451][00497] InferenceWorker_p0-w0: stopping experience collection (12400 times) [2024-03-29 15:16:05,478][00497] InferenceWorker_p0-w0: resuming experience collection (12400 times) [2024-03-29 15:16:07,480][00497] Updated weights for policy 0, policy_version 28431 (0.0028) [2024-03-29 15:16:08,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42598.3, 300 sec: 42154.1). Total num frames: 465846272. Throughput: 0: 42395.0. Samples: 348055920. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 15:16:08,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 15:16:08,840][00476] Saving new best policy, reward=0.582! [2024-03-29 15:16:11,910][00497] Updated weights for policy 0, policy_version 28441 (0.0025) [2024-03-29 15:16:13,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.5, 300 sec: 42154.1). Total num frames: 466075648. Throughput: 0: 42652.2. Samples: 348315220. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 15:16:13,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 15:16:15,024][00497] Updated weights for policy 0, policy_version 28451 (0.0018) [2024-03-29 15:16:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 466288640. Throughput: 0: 42783.9. Samples: 348429420. 
Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 15:16:18,841][00126] Avg episode reward: [(0, '0.500')] [2024-03-29 15:16:19,237][00497] Updated weights for policy 0, policy_version 28461 (0.0022) [2024-03-29 15:16:23,002][00497] Updated weights for policy 0, policy_version 28471 (0.0024) [2024-03-29 15:16:23,839][00126] Fps is (10 sec: 39320.9, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 466468864. Throughput: 0: 42532.4. Samples: 348686660. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 15:16:23,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 15:16:27,345][00497] Updated weights for policy 0, policy_version 28481 (0.0026) [2024-03-29 15:16:28,839][00126] Fps is (10 sec: 40960.7, 60 sec: 42325.5, 300 sec: 42209.6). Total num frames: 466698240. Throughput: 0: 42648.1. Samples: 348945500. Policy #0 lag: (min: 2.0, avg: 19.1, max: 41.0) [2024-03-29 15:16:28,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 15:16:30,626][00497] Updated weights for policy 0, policy_version 28491 (0.0026) [2024-03-29 15:16:33,839][00126] Fps is (10 sec: 47513.5, 60 sec: 42871.4, 300 sec: 42209.6). Total num frames: 466944000. Throughput: 0: 42747.4. Samples: 349066880. Policy #0 lag: (min: 2.0, avg: 19.1, max: 41.0) [2024-03-29 15:16:33,841][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 15:16:34,701][00497] Updated weights for policy 0, policy_version 28501 (0.0024) [2024-03-29 15:16:37,438][00476] Signal inference workers to stop experience collection... (12450 times) [2024-03-29 15:16:37,478][00497] InferenceWorker_p0-w0: stopping experience collection (12450 times) [2024-03-29 15:16:37,520][00476] Signal inference workers to resume experience collection... (12450 times) [2024-03-29 15:16:37,521][00497] InferenceWorker_p0-w0: resuming experience collection (12450 times) [2024-03-29 15:16:38,427][00497] Updated weights for policy 0, policy_version 28511 (0.0026) [2024-03-29 15:16:38,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42871.5, 300 sec: 42209.7). Total num frames: 467124224. Throughput: 0: 42492.5. Samples: 349320260. Policy #0 lag: (min: 2.0, avg: 19.1, max: 41.0) [2024-03-29 15:16:38,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 15:16:42,764][00497] Updated weights for policy 0, policy_version 28521 (0.0027) [2024-03-29 15:16:43,839][00126] Fps is (10 sec: 39322.1, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 467337216. Throughput: 0: 42477.4. Samples: 349581720. Policy #0 lag: (min: 2.0, avg: 19.1, max: 41.0) [2024-03-29 15:16:43,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 15:16:45,998][00497] Updated weights for policy 0, policy_version 28531 (0.0024) [2024-03-29 15:16:48,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42871.4, 300 sec: 42265.1). Total num frames: 467582976. Throughput: 0: 42508.8. Samples: 349698060. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:16:48,841][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 15:16:50,122][00497] Updated weights for policy 0, policy_version 28541 (0.0035) [2024-03-29 15:16:53,691][00497] Updated weights for policy 0, policy_version 28551 (0.0026) [2024-03-29 15:16:53,839][00126] Fps is (10 sec: 44236.7, 60 sec: 43144.5, 300 sec: 42265.2). Total num frames: 467779584. Throughput: 0: 42249.0. Samples: 349957120. 
Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:16:53,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 15:16:58,138][00497] Updated weights for policy 0, policy_version 28561 (0.0019) [2024-03-29 15:16:58,839][00126] Fps is (10 sec: 39322.1, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 467976192. Throughput: 0: 42486.1. Samples: 350227100. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:16:58,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 15:17:01,319][00497] Updated weights for policy 0, policy_version 28571 (0.0019) [2024-03-29 15:17:03,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42871.5, 300 sec: 42265.2). Total num frames: 468221952. Throughput: 0: 42429.1. Samples: 350338720. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:17:03,840][00126] Avg episode reward: [(0, '0.464')] [2024-03-29 15:17:03,859][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000028578_468221952.pth... [2024-03-29 15:17:04,182][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000027957_458047488.pth [2024-03-29 15:17:05,622][00497] Updated weights for policy 0, policy_version 28581 (0.0027) [2024-03-29 15:17:08,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42871.5, 300 sec: 42265.2). Total num frames: 468418560. Throughput: 0: 42394.7. Samples: 350594420. Policy #0 lag: (min: 0.0, avg: 23.0, max: 42.0) [2024-03-29 15:17:08,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 15:17:09,220][00497] Updated weights for policy 0, policy_version 28591 (0.0019) [2024-03-29 15:17:11,970][00476] Signal inference workers to stop experience collection... (12500 times) [2024-03-29 15:17:12,045][00476] Signal inference workers to resume experience collection... (12500 times) [2024-03-29 15:17:12,048][00497] InferenceWorker_p0-w0: stopping experience collection (12500 times) [2024-03-29 15:17:12,073][00497] InferenceWorker_p0-w0: resuming experience collection (12500 times) [2024-03-29 15:17:13,564][00497] Updated weights for policy 0, policy_version 28601 (0.0026) [2024-03-29 15:17:13,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 468615168. Throughput: 0: 42682.7. Samples: 350866220. Policy #0 lag: (min: 0.0, avg: 23.0, max: 42.0) [2024-03-29 15:17:13,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 15:17:16,794][00497] Updated weights for policy 0, policy_version 28611 (0.0027) [2024-03-29 15:17:18,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42871.5, 300 sec: 42320.7). Total num frames: 468860928. Throughput: 0: 42406.3. Samples: 350975160. Policy #0 lag: (min: 0.0, avg: 23.0, max: 42.0) [2024-03-29 15:17:18,842][00126] Avg episode reward: [(0, '0.458')] [2024-03-29 15:17:21,061][00497] Updated weights for policy 0, policy_version 28621 (0.0018) [2024-03-29 15:17:23,839][00126] Fps is (10 sec: 44235.8, 60 sec: 43144.5, 300 sec: 42265.2). Total num frames: 469057536. Throughput: 0: 42423.4. Samples: 351229320. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 15:17:23,840][00126] Avg episode reward: [(0, '0.505')] [2024-03-29 15:17:24,782][00497] Updated weights for policy 0, policy_version 28631 (0.0019) [2024-03-29 15:17:28,839][00126] Fps is (10 sec: 37683.1, 60 sec: 42325.2, 300 sec: 42209.6). Total num frames: 469237760. Throughput: 0: 42531.0. Samples: 351495620. 
Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 15:17:28,841][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 15:17:29,085][00497] Updated weights for policy 0, policy_version 28641 (0.0025) [2024-03-29 15:17:32,773][00497] Updated weights for policy 0, policy_version 28651 (0.0024) [2024-03-29 15:17:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 469467136. Throughput: 0: 42605.8. Samples: 351615320. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 15:17:33,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 15:17:36,850][00497] Updated weights for policy 0, policy_version 28661 (0.0018) [2024-03-29 15:17:38,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 469680128. Throughput: 0: 42213.8. Samples: 351856740. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 15:17:38,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 15:17:40,443][00497] Updated weights for policy 0, policy_version 28671 (0.0024) [2024-03-29 15:17:43,839][00126] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 469860352. Throughput: 0: 42212.5. Samples: 352126660. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 15:17:43,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 15:17:44,719][00476] Signal inference workers to stop experience collection... (12550 times) [2024-03-29 15:17:44,799][00476] Signal inference workers to resume experience collection... (12550 times) [2024-03-29 15:17:44,800][00497] InferenceWorker_p0-w0: stopping experience collection (12550 times) [2024-03-29 15:17:44,807][00497] Updated weights for policy 0, policy_version 28681 (0.0018) [2024-03-29 15:17:44,827][00497] InferenceWorker_p0-w0: resuming experience collection (12550 times) [2024-03-29 15:17:48,310][00497] Updated weights for policy 0, policy_version 28691 (0.0030) [2024-03-29 15:17:48,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 470106112. Throughput: 0: 42256.8. Samples: 352240280. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 15:17:48,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 15:17:52,087][00497] Updated weights for policy 0, policy_version 28701 (0.0022) [2024-03-29 15:17:53,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42325.2, 300 sec: 42265.2). Total num frames: 470319104. Throughput: 0: 42240.0. Samples: 352495220. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 15:17:53,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 15:17:55,674][00497] Updated weights for policy 0, policy_version 28711 (0.0022) [2024-03-29 15:17:58,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 470499328. Throughput: 0: 42267.0. Samples: 352768240. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 15:17:58,840][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 15:18:00,188][00497] Updated weights for policy 0, policy_version 28721 (0.0040) [2024-03-29 15:18:03,539][00497] Updated weights for policy 0, policy_version 28731 (0.0018) [2024-03-29 15:18:03,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.1, 300 sec: 42320.7). Total num frames: 470745088. Throughput: 0: 42392.4. Samples: 352882820. 
Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:18:03,840][00126] Avg episode reward: [(0, '0.454')] [2024-03-29 15:18:07,549][00497] Updated weights for policy 0, policy_version 28741 (0.0032) [2024-03-29 15:18:08,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.5, 300 sec: 42265.2). Total num frames: 470958080. Throughput: 0: 42278.9. Samples: 353131860. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:18:08,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 15:18:11,187][00497] Updated weights for policy 0, policy_version 28751 (0.0019) [2024-03-29 15:18:13,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 471138304. Throughput: 0: 42360.1. Samples: 353401820. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:18:13,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 15:18:15,645][00497] Updated weights for policy 0, policy_version 28761 (0.0025) [2024-03-29 15:18:15,701][00476] Signal inference workers to stop experience collection... (12600 times) [2024-03-29 15:18:15,733][00497] InferenceWorker_p0-w0: stopping experience collection (12600 times) [2024-03-29 15:18:15,877][00476] Signal inference workers to resume experience collection... (12600 times) [2024-03-29 15:18:15,878][00497] InferenceWorker_p0-w0: resuming experience collection (12600 times) [2024-03-29 15:18:18,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 471367680. Throughput: 0: 42313.4. Samples: 353519420. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:18:18,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 15:18:19,043][00497] Updated weights for policy 0, policy_version 28771 (0.0022) [2024-03-29 15:18:23,359][00497] Updated weights for policy 0, policy_version 28781 (0.0023) [2024-03-29 15:18:23,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 42154.1). Total num frames: 471564288. Throughput: 0: 42327.1. Samples: 353761460. Policy #0 lag: (min: 0.0, avg: 23.3, max: 42.0) [2024-03-29 15:18:23,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 15:18:26,960][00497] Updated weights for policy 0, policy_version 28791 (0.0022) [2024-03-29 15:18:28,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 471760896. Throughput: 0: 42090.6. Samples: 354020740. Policy #0 lag: (min: 0.0, avg: 23.3, max: 42.0) [2024-03-29 15:18:28,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 15:18:31,232][00497] Updated weights for policy 0, policy_version 28801 (0.0023) [2024-03-29 15:18:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.4, 300 sec: 42209.6). Total num frames: 471990272. Throughput: 0: 42491.2. Samples: 354152380. Policy #0 lag: (min: 0.0, avg: 23.3, max: 42.0) [2024-03-29 15:18:33,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 15:18:34,573][00497] Updated weights for policy 0, policy_version 28811 (0.0023) [2024-03-29 15:18:38,769][00497] Updated weights for policy 0, policy_version 28821 (0.0023) [2024-03-29 15:18:38,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 472203264. Throughput: 0: 42045.9. Samples: 354387280. Policy #0 lag: (min: 0.0, avg: 23.3, max: 42.0) [2024-03-29 15:18:38,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 15:18:42,637][00497] Updated weights for policy 0, policy_version 28831 (0.0026) [2024-03-29 15:18:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 42265.2). 
Total num frames: 472399872. Throughput: 0: 41850.7. Samples: 354651520. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 15:18:43,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 15:18:46,716][00497] Updated weights for policy 0, policy_version 28841 (0.0024) [2024-03-29 15:18:48,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 472629248. Throughput: 0: 42253.4. Samples: 354784220. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 15:18:48,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 15:18:50,000][00476] Signal inference workers to stop experience collection... (12650 times) [2024-03-29 15:18:50,055][00497] InferenceWorker_p0-w0: stopping experience collection (12650 times) [2024-03-29 15:18:50,097][00476] Signal inference workers to resume experience collection... (12650 times) [2024-03-29 15:18:50,099][00497] InferenceWorker_p0-w0: resuming experience collection (12650 times) [2024-03-29 15:18:50,103][00497] Updated weights for policy 0, policy_version 28851 (0.0023) [2024-03-29 15:18:53,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 472842240. Throughput: 0: 41856.3. Samples: 355015400. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 15:18:53,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 15:18:54,252][00497] Updated weights for policy 0, policy_version 28861 (0.0018) [2024-03-29 15:18:57,938][00497] Updated weights for policy 0, policy_version 28871 (0.0018) [2024-03-29 15:18:58,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 473038848. Throughput: 0: 41873.7. Samples: 355286140. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 15:18:58,840][00126] Avg episode reward: [(0, '0.537')] [2024-03-29 15:19:02,381][00497] Updated weights for policy 0, policy_version 28881 (0.0024) [2024-03-29 15:19:03,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 42265.2). Total num frames: 473235456. Throughput: 0: 42097.8. Samples: 355413820. Policy #0 lag: (min: 0.0, avg: 18.8, max: 41.0) [2024-03-29 15:19:03,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 15:19:04,144][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000028886_473268224.pth... [2024-03-29 15:19:04,438][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000028268_463142912.pth [2024-03-29 15:19:05,900][00497] Updated weights for policy 0, policy_version 28891 (0.0021) [2024-03-29 15:19:08,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.1, 300 sec: 42265.2). Total num frames: 473464832. Throughput: 0: 41815.1. Samples: 355643140. Policy #0 lag: (min: 0.0, avg: 18.8, max: 41.0) [2024-03-29 15:19:08,841][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 15:19:10,226][00497] Updated weights for policy 0, policy_version 28901 (0.0028) [2024-03-29 15:19:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 473661440. Throughput: 0: 41491.6. Samples: 355887860. Policy #0 lag: (min: 0.0, avg: 18.8, max: 41.0) [2024-03-29 15:19:13,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 15:19:14,078][00497] Updated weights for policy 0, policy_version 28911 (0.0025) [2024-03-29 15:19:18,188][00497] Updated weights for policy 0, policy_version 28921 (0.0021) [2024-03-29 15:19:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 42265.2). Total num frames: 473858048. 
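
The paired "Saving .../checkpoint_p0/checkpoint_<version>_<frames>.pth" and "Removing ..." entries above suggest a fixed-size retention window: each periodic save is immediately followed by deleting the oldest checkpoint that falls outside the window (from the version gaps in this log, roughly two recent checkpoints appear to be kept at a time, though that count is an inference, not a confirmed setting). A hedged sketch of that rotation; the filename layout mirrors the log, but the function and the keep parameter are assumptions:

import os
import re
import torch

CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

def save_rotating_checkpoint(state, ckpt_dir, policy_version, total_frames, keep=2):
    """Write a new checkpoint and prune the oldest ones beyond `keep` (illustrative)."""
    os.makedirs(ckpt_dir, exist_ok=True)
    name = f"checkpoint_{policy_version:09d}_{total_frames}.pth"
    path = os.path.join(ckpt_dir, name)
    torch.save(state, path)

    # collect existing checkpoints, ordered by the policy version encoded in the filename
    found = []
    for fname in os.listdir(ckpt_dir):
        m = CKPT_RE.search(fname)
        if m:
            found.append((int(m.group(1)), os.path.join(ckpt_dir, fname)))
    found.sort()

    # drop everything older than the newest `keep` checkpoints
    for _, old_path in found[:-keep]:
        os.remove(old_path)
    return path
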
Throughput: 0: 41612.4. Samples: 356024940. Policy #0 lag: (min: 0.0, avg: 18.8, max: 41.0) [2024-03-29 15:19:18,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 15:19:21,757][00497] Updated weights for policy 0, policy_version 28931 (0.0030) [2024-03-29 15:19:21,827][00476] Signal inference workers to stop experience collection... (12700 times) [2024-03-29 15:19:21,853][00497] InferenceWorker_p0-w0: stopping experience collection (12700 times) [2024-03-29 15:19:22,006][00476] Signal inference workers to resume experience collection... (12700 times) [2024-03-29 15:19:22,007][00497] InferenceWorker_p0-w0: resuming experience collection (12700 times) [2024-03-29 15:19:23,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 474120192. Throughput: 0: 41839.1. Samples: 356270040. Policy #0 lag: (min: 1.0, avg: 19.6, max: 42.0) [2024-03-29 15:19:23,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 15:19:26,022][00497] Updated weights for policy 0, policy_version 28941 (0.0020) [2024-03-29 15:19:28,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.5, 300 sec: 42209.6). Total num frames: 474300416. Throughput: 0: 41250.7. Samples: 356507800. Policy #0 lag: (min: 1.0, avg: 19.6, max: 42.0) [2024-03-29 15:19:28,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 15:19:29,897][00497] Updated weights for policy 0, policy_version 28951 (0.0022) [2024-03-29 15:19:33,839][00126] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 42265.2). Total num frames: 474480640. Throughput: 0: 41632.0. Samples: 356657660. Policy #0 lag: (min: 1.0, avg: 19.6, max: 42.0) [2024-03-29 15:19:33,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 15:19:33,896][00497] Updated weights for policy 0, policy_version 28961 (0.0021) [2024-03-29 15:19:37,470][00497] Updated weights for policy 0, policy_version 28971 (0.0018) [2024-03-29 15:19:38,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 474726400. Throughput: 0: 41990.3. Samples: 356904960. Policy #0 lag: (min: 1.0, avg: 22.1, max: 43.0) [2024-03-29 15:19:38,840][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 15:19:41,688][00497] Updated weights for policy 0, policy_version 28981 (0.0023) [2024-03-29 15:19:43,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 474923008. Throughput: 0: 41158.2. Samples: 357138260. Policy #0 lag: (min: 1.0, avg: 22.1, max: 43.0) [2024-03-29 15:19:43,842][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 15:19:45,756][00497] Updated weights for policy 0, policy_version 28991 (0.0023) [2024-03-29 15:19:48,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 42265.2). Total num frames: 475119616. Throughput: 0: 41371.1. Samples: 357275520. Policy #0 lag: (min: 1.0, avg: 22.1, max: 43.0) [2024-03-29 15:19:48,840][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 15:19:49,458][00497] Updated weights for policy 0, policy_version 29001 (0.0022) [2024-03-29 15:19:53,000][00476] Signal inference workers to stop experience collection... (12750 times) [2024-03-29 15:19:53,023][00497] InferenceWorker_p0-w0: stopping experience collection (12750 times) [2024-03-29 15:19:53,212][00476] Signal inference workers to resume experience collection... 
(12750 times) [2024-03-29 15:19:53,213][00497] InferenceWorker_p0-w0: resuming experience collection (12750 times) [2024-03-29 15:19:53,216][00497] Updated weights for policy 0, policy_version 29011 (0.0024) [2024-03-29 15:19:53,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.1, 300 sec: 42209.6). Total num frames: 475348992. Throughput: 0: 41844.8. Samples: 357526160. Policy #0 lag: (min: 1.0, avg: 22.1, max: 43.0) [2024-03-29 15:19:53,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 15:19:57,366][00497] Updated weights for policy 0, policy_version 29021 (0.0028) [2024-03-29 15:19:58,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.3, 300 sec: 42154.1). Total num frames: 475545600. Throughput: 0: 41749.0. Samples: 357766560. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 15:19:58,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 15:20:01,447][00497] Updated weights for policy 0, policy_version 29031 (0.0023) [2024-03-29 15:20:03,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 475725824. Throughput: 0: 41604.4. Samples: 357897140. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 15:20:03,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 15:20:04,986][00497] Updated weights for policy 0, policy_version 29041 (0.0027) [2024-03-29 15:20:08,740][00497] Updated weights for policy 0, policy_version 29051 (0.0033) [2024-03-29 15:20:08,839][00126] Fps is (10 sec: 42597.5, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 475971584. Throughput: 0: 41984.3. Samples: 358159340. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 15:20:08,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 15:20:12,705][00497] Updated weights for policy 0, policy_version 29061 (0.0024) [2024-03-29 15:20:13,839][00126] Fps is (10 sec: 47513.2, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 476200960. Throughput: 0: 42302.0. Samples: 358411400. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 15:20:13,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 15:20:16,724][00497] Updated weights for policy 0, policy_version 29071 (0.0024) [2024-03-29 15:20:18,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 476364800. Throughput: 0: 41662.7. Samples: 358532480. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 15:20:18,841][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 15:20:20,532][00497] Updated weights for policy 0, policy_version 29081 (0.0022) [2024-03-29 15:20:23,308][00476] Signal inference workers to stop experience collection... (12800 times) [2024-03-29 15:20:23,310][00476] Signal inference workers to resume experience collection... (12800 times) [2024-03-29 15:20:23,350][00497] InferenceWorker_p0-w0: stopping experience collection (12800 times) [2024-03-29 15:20:23,350][00497] InferenceWorker_p0-w0: resuming experience collection (12800 times) [2024-03-29 15:20:23,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41233.1, 300 sec: 42154.1). Total num frames: 476594176. Throughput: 0: 42084.0. Samples: 358798740. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 15:20:23,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 15:20:24,371][00497] Updated weights for policy 0, policy_version 29091 (0.0021) [2024-03-29 15:20:28,458][00497] Updated weights for policy 0, policy_version 29101 (0.0019) [2024-03-29 15:20:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.1, 300 sec: 42154.1). 
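
The "Signal inference workers to stop experience collection..." / "...resume experience collection..." pairs (with their running count) show the learner briefly pausing rollout collection, typically so the optimizer can catch up and the amount of off-policy data stays bounded. A minimal sketch of one way such a gate could be implemented with a multiprocessing event; the names here are hypothetical and only mirror what the log reports, not the actual mechanism used:

import multiprocessing as mp
import time

class CollectionGate:
    """Lets a learner pause and resume experience collection in worker processes (illustrative)."""

    def __init__(self):
        self._collect = mp.Event()
        self._collect.set()                  # collection enabled by default
        self.pause_count = mp.Value("i", 0)  # mirrors the "(12550 times)" counter in the log

    # --- learner side ---
    def stop_collection(self):
        with self.pause_count.get_lock():
            self.pause_count.value += 1
        self._collect.clear()

    def resume_collection(self):
        self._collect.set()

    # --- worker side ---
    def wait_if_paused(self, timeout=1.0):
        # block (up to `timeout` seconds) while the learner has collection paused
        self._collect.wait(timeout)

def worker_loop(gate, steps=5):
    for _ in range(steps):
        gate.wait_if_paused()
        # ... collect one batch of experience here ...
        time.sleep(0.01)

if __name__ == "__main__":
    gate = CollectionGate()
    w = mp.Process(target=worker_loop, args=(gate,))
    w.start()
    gate.stop_collection()    # learner falls behind -> pause workers
    time.sleep(0.05)
    gate.resume_collection()  # buffer drained -> let workers continue
    w.join()
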
Total num frames: 476807168. Throughput: 0: 42273.9. Samples: 359040580. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 15:20:28,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 15:20:32,325][00497] Updated weights for policy 0, policy_version 29111 (0.0026) [2024-03-29 15:20:33,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 476987392. Throughput: 0: 41665.4. Samples: 359150460. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 15:20:33,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 15:20:36,289][00497] Updated weights for policy 0, policy_version 29121 (0.0021) [2024-03-29 15:20:38,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.0, 300 sec: 42098.5). Total num frames: 477216768. Throughput: 0: 41866.7. Samples: 359410160. Policy #0 lag: (min: 1.0, avg: 20.1, max: 43.0) [2024-03-29 15:20:38,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 15:20:40,162][00497] Updated weights for policy 0, policy_version 29131 (0.0033) [2024-03-29 15:20:43,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 477429760. Throughput: 0: 41917.2. Samples: 359652840. Policy #0 lag: (min: 1.0, avg: 20.1, max: 43.0) [2024-03-29 15:20:43,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 15:20:44,029][00497] Updated weights for policy 0, policy_version 29141 (0.0028) [2024-03-29 15:20:47,993][00497] Updated weights for policy 0, policy_version 29151 (0.0019) [2024-03-29 15:20:48,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 477626368. Throughput: 0: 41840.8. Samples: 359779980. Policy #0 lag: (min: 1.0, avg: 20.1, max: 43.0) [2024-03-29 15:20:48,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 15:20:51,894][00497] Updated weights for policy 0, policy_version 29161 (0.0021) [2024-03-29 15:20:53,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 477839360. Throughput: 0: 41644.5. Samples: 360033340. Policy #0 lag: (min: 1.0, avg: 20.1, max: 43.0) [2024-03-29 15:20:53,842][00126] Avg episode reward: [(0, '0.534')] [2024-03-29 15:20:55,370][00476] Signal inference workers to stop experience collection... (12850 times) [2024-03-29 15:20:55,370][00476] Signal inference workers to resume experience collection... (12850 times) [2024-03-29 15:20:55,418][00497] InferenceWorker_p0-w0: stopping experience collection (12850 times) [2024-03-29 15:20:55,418][00497] InferenceWorker_p0-w0: resuming experience collection (12850 times) [2024-03-29 15:20:55,678][00497] Updated weights for policy 0, policy_version 29171 (0.0027) [2024-03-29 15:20:58,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.2, 300 sec: 42098.6). Total num frames: 478068736. Throughput: 0: 41582.8. Samples: 360282620. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 15:20:58,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 15:20:59,436][00497] Updated weights for policy 0, policy_version 29181 (0.0024) [2024-03-29 15:21:03,050][00497] Updated weights for policy 0, policy_version 29191 (0.0018) [2024-03-29 15:21:03,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 478281728. Throughput: 0: 41902.2. Samples: 360418080. 
Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 15:21:03,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 15:21:03,859][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000029192_478281728.pth... [2024-03-29 15:21:04,172][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000028578_468221952.pth [2024-03-29 15:21:07,258][00497] Updated weights for policy 0, policy_version 29201 (0.0029) [2024-03-29 15:21:08,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 478478336. Throughput: 0: 41829.2. Samples: 360681060. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 15:21:08,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 15:21:11,162][00497] Updated weights for policy 0, policy_version 29211 (0.0019) [2024-03-29 15:21:13,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 478707712. Throughput: 0: 41848.3. Samples: 360923760. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 15:21:13,840][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 15:21:14,943][00497] Updated weights for policy 0, policy_version 29221 (0.0019) [2024-03-29 15:21:18,520][00497] Updated weights for policy 0, policy_version 29231 (0.0022) [2024-03-29 15:21:18,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42871.4, 300 sec: 42265.2). Total num frames: 478937088. Throughput: 0: 42446.6. Samples: 361060560. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:21:18,840][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 15:21:22,496][00497] Updated weights for policy 0, policy_version 29241 (0.0021) [2024-03-29 15:21:23,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 479117312. Throughput: 0: 42590.7. Samples: 361326740. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:21:23,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 15:21:25,349][00476] Signal inference workers to stop experience collection... (12900 times) [2024-03-29 15:21:25,416][00497] InferenceWorker_p0-w0: stopping experience collection (12900 times) [2024-03-29 15:21:25,424][00476] Signal inference workers to resume experience collection... (12900 times) [2024-03-29 15:21:25,445][00497] InferenceWorker_p0-w0: resuming experience collection (12900 times) [2024-03-29 15:21:26,399][00497] Updated weights for policy 0, policy_version 29251 (0.0020) [2024-03-29 15:21:28,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 479363072. Throughput: 0: 42582.3. Samples: 361569040. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:21:28,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 15:21:30,136][00497] Updated weights for policy 0, policy_version 29261 (0.0022) [2024-03-29 15:21:33,840][00126] Fps is (10 sec: 44232.9, 60 sec: 42870.8, 300 sec: 42153.9). Total num frames: 479559680. Throughput: 0: 42603.2. Samples: 361697160. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:21:33,841][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 15:21:33,870][00497] Updated weights for policy 0, policy_version 29271 (0.0017) [2024-03-29 15:21:38,030][00497] Updated weights for policy 0, policy_version 29281 (0.0023) [2024-03-29 15:21:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 479756288. Throughput: 0: 42935.2. Samples: 361965420. 
Policy #0 lag: (min: 0.0, avg: 19.5, max: 40.0) [2024-03-29 15:21:38,840][00126] Avg episode reward: [(0, '0.292')] [2024-03-29 15:21:41,984][00497] Updated weights for policy 0, policy_version 29291 (0.0019) [2024-03-29 15:21:43,839][00126] Fps is (10 sec: 44240.5, 60 sec: 42871.4, 300 sec: 42098.5). Total num frames: 480002048. Throughput: 0: 42690.1. Samples: 362203680. Policy #0 lag: (min: 0.0, avg: 19.5, max: 40.0) [2024-03-29 15:21:43,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 15:21:45,672][00497] Updated weights for policy 0, policy_version 29301 (0.0027) [2024-03-29 15:21:48,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42871.6, 300 sec: 42098.5). Total num frames: 480198656. Throughput: 0: 42434.2. Samples: 362327620. Policy #0 lag: (min: 0.0, avg: 19.5, max: 40.0) [2024-03-29 15:21:48,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 15:21:49,400][00497] Updated weights for policy 0, policy_version 29311 (0.0023) [2024-03-29 15:21:53,610][00497] Updated weights for policy 0, policy_version 29321 (0.0022) [2024-03-29 15:21:53,839][00126] Fps is (10 sec: 39322.2, 60 sec: 42598.5, 300 sec: 42098.6). Total num frames: 480395264. Throughput: 0: 42667.3. Samples: 362601080. Policy #0 lag: (min: 0.0, avg: 19.5, max: 40.0) [2024-03-29 15:21:53,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 15:21:57,466][00497] Updated weights for policy 0, policy_version 29331 (0.0033) [2024-03-29 15:21:58,254][00476] Signal inference workers to stop experience collection... (12950 times) [2024-03-29 15:21:58,296][00497] InferenceWorker_p0-w0: stopping experience collection (12950 times) [2024-03-29 15:21:58,416][00476] Signal inference workers to resume experience collection... (12950 times) [2024-03-29 15:21:58,416][00497] InferenceWorker_p0-w0: resuming experience collection (12950 times) [2024-03-29 15:21:58,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 480624640. Throughput: 0: 42658.3. Samples: 362843380. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 15:21:58,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 15:22:01,218][00497] Updated weights for policy 0, policy_version 29341 (0.0021) [2024-03-29 15:22:03,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.3, 300 sec: 42098.6). Total num frames: 480837632. Throughput: 0: 42307.6. Samples: 362964400. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 15:22:03,842][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 15:22:04,899][00497] Updated weights for policy 0, policy_version 29351 (0.0027) [2024-03-29 15:22:08,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42598.5, 300 sec: 42098.5). Total num frames: 481034240. Throughput: 0: 42522.8. Samples: 363240260. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 15:22:08,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 15:22:08,889][00497] Updated weights for policy 0, policy_version 29361 (0.0027) [2024-03-29 15:22:12,905][00497] Updated weights for policy 0, policy_version 29371 (0.0026) [2024-03-29 15:22:13,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 481263616. Throughput: 0: 42401.2. Samples: 363477100. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 15:22:13,840][00126] Avg episode reward: [(0, '0.471')] [2024-03-29 15:22:16,720][00497] Updated weights for policy 0, policy_version 29381 (0.0026) [2024-03-29 15:22:18,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.3, 300 sec: 42098.6). 
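
"Policy #0 lag: (min: ..., avg: ..., max: ...)" summarizes, for the experience in recent batches, how many policy versions behind the learner's current weights each rollout was collected. A small sketch of that summary, under the assumption that each trajectory records the policy version it was sampled with:

def policy_lag_stats(current_version, trajectory_versions):
    """Min / average / max version lag for a batch of trajectories (illustrative).

    current_version: the learner's latest policy_version.
    trajectory_versions: the policy_version each trajectory was collected with.
    """
    lags = [current_version - v for v in trajectory_versions]
    return {
        "min": float(min(lags)),
        "avg": sum(lags) / len(lags),
        "max": float(max(lags)),
    }

# Example shaped like the log above: learner at version 29331, rollouts collected
# anywhere between 0 and 42 versions ago.
print(policy_lag_stats(29331, [29331, 29309, 29289, 29300, 29331 - 42]))
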
Total num frames: 481476608. Throughput: 0: 42314.1. Samples: 363601260. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:22:18,840][00126] Avg episode reward: [(0, '0.448')] [2024-03-29 15:22:20,341][00497] Updated weights for policy 0, policy_version 29391 (0.0020) [2024-03-29 15:22:23,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 481673216. Throughput: 0: 42408.4. Samples: 363873800. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:22:23,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 15:22:24,379][00497] Updated weights for policy 0, policy_version 29401 (0.0021) [2024-03-29 15:22:28,297][00497] Updated weights for policy 0, policy_version 29411 (0.0022) [2024-03-29 15:22:28,839][00126] Fps is (10 sec: 42599.3, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 481902592. Throughput: 0: 42680.2. Samples: 364124280. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:22:28,840][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 15:22:31,786][00476] Signal inference workers to stop experience collection... (13000 times) [2024-03-29 15:22:31,856][00497] InferenceWorker_p0-w0: stopping experience collection (13000 times) [2024-03-29 15:22:31,863][00476] Signal inference workers to resume experience collection... (13000 times) [2024-03-29 15:22:31,881][00497] InferenceWorker_p0-w0: resuming experience collection (13000 times) [2024-03-29 15:22:31,884][00497] Updated weights for policy 0, policy_version 29421 (0.0027) [2024-03-29 15:22:33,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42599.0, 300 sec: 42154.1). Total num frames: 482115584. Throughput: 0: 42691.9. Samples: 364248760. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:22:33,842][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 15:22:35,422][00497] Updated weights for policy 0, policy_version 29431 (0.0018) [2024-03-29 15:22:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 482312192. Throughput: 0: 42575.1. Samples: 364516960. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 15:22:38,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 15:22:39,679][00497] Updated weights for policy 0, policy_version 29441 (0.0024) [2024-03-29 15:22:43,534][00497] Updated weights for policy 0, policy_version 29451 (0.0035) [2024-03-29 15:22:43,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 482525184. Throughput: 0: 42701.7. Samples: 364764960. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 15:22:43,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 15:22:47,433][00497] Updated weights for policy 0, policy_version 29461 (0.0022) [2024-03-29 15:22:48,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 482754560. Throughput: 0: 42492.9. Samples: 364876580. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 15:22:48,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 15:22:50,906][00497] Updated weights for policy 0, policy_version 29471 (0.0017) [2024-03-29 15:22:53,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.2, 300 sec: 42154.1). Total num frames: 482934784. Throughput: 0: 42470.1. Samples: 365151420. 
Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 15:22:53,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 15:22:55,143][00497] Updated weights for policy 0, policy_version 29481 (0.0020) [2024-03-29 15:22:58,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 483164160. Throughput: 0: 42684.6. Samples: 365397900. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 15:22:58,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 15:22:59,333][00497] Updated weights for policy 0, policy_version 29491 (0.0020) [2024-03-29 15:23:03,041][00497] Updated weights for policy 0, policy_version 29501 (0.0025) [2024-03-29 15:23:03,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.4, 300 sec: 42098.5). Total num frames: 483377152. Throughput: 0: 42289.9. Samples: 365504300. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 15:23:03,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 15:23:03,964][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000029504_483393536.pth... [2024-03-29 15:23:04,282][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000028886_473268224.pth [2024-03-29 15:23:05,758][00476] Signal inference workers to stop experience collection... (13050 times) [2024-03-29 15:23:05,807][00497] InferenceWorker_p0-w0: stopping experience collection (13050 times) [2024-03-29 15:23:05,827][00476] Signal inference workers to resume experience collection... (13050 times) [2024-03-29 15:23:05,844][00497] InferenceWorker_p0-w0: resuming experience collection (13050 times) [2024-03-29 15:23:06,415][00497] Updated weights for policy 0, policy_version 29511 (0.0031) [2024-03-29 15:23:08,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 483557376. Throughput: 0: 42026.7. Samples: 365765000. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 15:23:08,841][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 15:23:10,892][00497] Updated weights for policy 0, policy_version 29521 (0.0020) [2024-03-29 15:23:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 483786752. Throughput: 0: 42227.9. Samples: 366024540. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 15:23:13,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 15:23:15,088][00497] Updated weights for policy 0, policy_version 29531 (0.0022) [2024-03-29 15:23:18,440][00497] Updated weights for policy 0, policy_version 29541 (0.0028) [2024-03-29 15:23:18,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 484016128. Throughput: 0: 42005.8. Samples: 366139020. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 15:23:18,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 15:23:22,175][00497] Updated weights for policy 0, policy_version 29551 (0.0023) [2024-03-29 15:23:23,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 484196352. Throughput: 0: 41809.3. Samples: 366398380. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 15:23:23,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 15:23:26,375][00497] Updated weights for policy 0, policy_version 29561 (0.0021) [2024-03-29 15:23:28,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.1, 300 sec: 42098.5). Total num frames: 484409344. Throughput: 0: 41963.2. Samples: 366653300. 
Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 15:23:28,841][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 15:23:30,708][00497] Updated weights for policy 0, policy_version 29571 (0.0021) [2024-03-29 15:23:33,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 484638720. Throughput: 0: 42240.5. Samples: 366777400. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 15:23:33,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 15:23:34,053][00497] Updated weights for policy 0, policy_version 29581 (0.0018) [2024-03-29 15:23:37,796][00497] Updated weights for policy 0, policy_version 29591 (0.0019) [2024-03-29 15:23:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 484835328. Throughput: 0: 41646.7. Samples: 367025520. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 15:23:38,840][00126] Avg episode reward: [(0, '0.458')] [2024-03-29 15:23:40,317][00476] Signal inference workers to stop experience collection... (13100 times) [2024-03-29 15:23:40,340][00497] InferenceWorker_p0-w0: stopping experience collection (13100 times) [2024-03-29 15:23:40,535][00476] Signal inference workers to resume experience collection... (13100 times) [2024-03-29 15:23:40,536][00497] InferenceWorker_p0-w0: resuming experience collection (13100 times) [2024-03-29 15:23:42,574][00497] Updated weights for policy 0, policy_version 29602 (0.0031) [2024-03-29 15:23:43,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 485031936. Throughput: 0: 41380.0. Samples: 367260000. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 15:23:43,841][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 15:23:46,920][00497] Updated weights for policy 0, policy_version 29612 (0.0020) [2024-03-29 15:23:48,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 485261312. Throughput: 0: 41952.4. Samples: 367392160. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 15:23:48,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 15:23:50,167][00497] Updated weights for policy 0, policy_version 29622 (0.0026) [2024-03-29 15:23:53,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 485457920. Throughput: 0: 41684.3. Samples: 367640800. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 15:23:53,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 15:23:54,192][00497] Updated weights for policy 0, policy_version 29632 (0.0023) [2024-03-29 15:23:58,525][00497] Updated weights for policy 0, policy_version 29642 (0.0035) [2024-03-29 15:23:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 485670912. Throughput: 0: 41306.6. Samples: 367883340. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 15:23:58,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 15:24:03,023][00497] Updated weights for policy 0, policy_version 29652 (0.0028) [2024-03-29 15:24:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 485867520. Throughput: 0: 41673.3. Samples: 368014320. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 15:24:03,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 15:24:06,164][00497] Updated weights for policy 0, policy_version 29662 (0.0028) [2024-03-29 15:24:08,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 42098.6). 
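
Each "Updated weights for policy 0, policy_version N (0.00xx)" line marks the inference worker picking up a newer set of learner weights; the value in parentheses is presumably the time, in seconds, the swap took (that reading of the field is an assumption). A hedged sketch of timing such an in-place load from a learner-produced state dict; make_policy and the update dict are hypothetical stand-ins:

import time
import torch
import torch.nn as nn

def make_policy():
    # hypothetical stand-in for the actual actor-critic network
    return nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 4))

def apply_weight_update(policy, latest):
    """Copy the learner's newest weights into the inference copy and time it (illustrative).

    latest: dict with keys 'policy_version' and 'state_dict', produced by the learner.
    """
    t0 = time.perf_counter()
    with torch.no_grad():
        policy.load_state_dict(latest["state_dict"])
    elapsed = time.perf_counter() - t0
    print(f"Updated weights for policy 0, policy_version {latest['policy_version']} ({elapsed:.4f})")

if __name__ == "__main__":
    learner_policy = make_policy()
    inference_policy = make_policy()
    update = {"policy_version": 29602, "state_dict": learner_policy.state_dict()}
    apply_weight_update(inference_policy, update)
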
Total num frames: 486080512. Throughput: 0: 41425.8. Samples: 368262540. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 15:24:08,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 15:24:10,012][00497] Updated weights for policy 0, policy_version 29672 (0.0023) [2024-03-29 15:24:12,102][00476] Signal inference workers to stop experience collection... (13150 times) [2024-03-29 15:24:12,112][00476] Signal inference workers to resume experience collection... (13150 times) [2024-03-29 15:24:12,128][00497] InferenceWorker_p0-w0: stopping experience collection (13150 times) [2024-03-29 15:24:12,153][00497] InferenceWorker_p0-w0: resuming experience collection (13150 times) [2024-03-29 15:24:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 486293504. Throughput: 0: 41472.9. Samples: 368519580. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 15:24:13,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 15:24:14,074][00497] Updated weights for policy 0, policy_version 29682 (0.0025) [2024-03-29 15:24:18,386][00497] Updated weights for policy 0, policy_version 29692 (0.0026) [2024-03-29 15:24:18,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.2, 300 sec: 41931.9). Total num frames: 486490112. Throughput: 0: 41478.7. Samples: 368643940. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 15:24:18,840][00126] Avg episode reward: [(0, '0.454')] [2024-03-29 15:24:21,547][00497] Updated weights for policy 0, policy_version 29702 (0.0024) [2024-03-29 15:24:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 486719488. Throughput: 0: 41564.9. Samples: 368895940. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 15:24:23,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 15:24:25,733][00497] Updated weights for policy 0, policy_version 29712 (0.0023) [2024-03-29 15:24:28,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 486916096. Throughput: 0: 42251.1. Samples: 369161300. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 15:24:28,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 15:24:29,496][00497] Updated weights for policy 0, policy_version 29722 (0.0028) [2024-03-29 15:24:33,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 487112704. Throughput: 0: 41840.0. Samples: 369274960. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 15:24:33,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 15:24:33,902][00497] Updated weights for policy 0, policy_version 29732 (0.0022) [2024-03-29 15:24:36,952][00497] Updated weights for policy 0, policy_version 29742 (0.0019) [2024-03-29 15:24:38,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 42098.6). Total num frames: 487342080. Throughput: 0: 41849.9. Samples: 369524040. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 15:24:38,840][00126] Avg episode reward: [(0, '0.483')] [2024-03-29 15:24:41,199][00497] Updated weights for policy 0, policy_version 29752 (0.0019) [2024-03-29 15:24:43,455][00476] Signal inference workers to stop experience collection... (13200 times) [2024-03-29 15:24:43,510][00497] InferenceWorker_p0-w0: stopping experience collection (13200 times) [2024-03-29 15:24:43,640][00476] Signal inference workers to resume experience collection... 
(13200 times) [2024-03-29 15:24:43,640][00497] InferenceWorker_p0-w0: resuming experience collection (13200 times) [2024-03-29 15:24:43,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 487538688. Throughput: 0: 42544.9. Samples: 369797860. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 15:24:43,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 15:24:45,108][00497] Updated weights for policy 0, policy_version 29762 (0.0027) [2024-03-29 15:24:48,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41506.0, 300 sec: 42043.0). Total num frames: 487751680. Throughput: 0: 41880.4. Samples: 369898940. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 15:24:48,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 15:24:49,655][00497] Updated weights for policy 0, policy_version 29772 (0.0019) [2024-03-29 15:24:52,901][00497] Updated weights for policy 0, policy_version 29782 (0.0023) [2024-03-29 15:24:53,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.3, 300 sec: 42098.5). Total num frames: 487964672. Throughput: 0: 41879.1. Samples: 370147100. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 15:24:53,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 15:24:56,790][00497] Updated weights for policy 0, policy_version 29792 (0.0022) [2024-03-29 15:24:58,839][00126] Fps is (10 sec: 39322.4, 60 sec: 41233.1, 300 sec: 42098.6). Total num frames: 488144896. Throughput: 0: 42362.4. Samples: 370425880. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 15:24:58,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 15:25:00,826][00497] Updated weights for policy 0, policy_version 29802 (0.0020) [2024-03-29 15:25:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 488390656. Throughput: 0: 41909.3. Samples: 370529860. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 15:25:03,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 15:25:04,005][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000029810_488407040.pth... [2024-03-29 15:25:04,323][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000029192_478281728.pth [2024-03-29 15:25:05,194][00497] Updated weights for policy 0, policy_version 29812 (0.0020) [2024-03-29 15:25:08,425][00497] Updated weights for policy 0, policy_version 29822 (0.0020) [2024-03-29 15:25:08,839][00126] Fps is (10 sec: 47513.0, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 488620032. Throughput: 0: 41913.8. Samples: 370782060. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 15:25:08,840][00126] Avg episode reward: [(0, '0.459')] [2024-03-29 15:25:12,357][00497] Updated weights for policy 0, policy_version 29832 (0.0023) [2024-03-29 15:25:13,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 42098.5). Total num frames: 488783872. Throughput: 0: 41884.9. Samples: 371046120. Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 15:25:13,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 15:25:14,245][00476] Signal inference workers to stop experience collection... (13250 times) [2024-03-29 15:25:14,313][00497] InferenceWorker_p0-w0: stopping experience collection (13250 times) [2024-03-29 15:25:14,331][00476] Signal inference workers to resume experience collection... 
(13250 times) [2024-03-29 15:25:14,344][00497] InferenceWorker_p0-w0: resuming experience collection (13250 times) [2024-03-29 15:25:16,315][00497] Updated weights for policy 0, policy_version 29842 (0.0027) [2024-03-29 15:25:18,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 489029632. Throughput: 0: 41976.4. Samples: 371163900. Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 15:25:18,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 15:25:20,629][00497] Updated weights for policy 0, policy_version 29852 (0.0020) [2024-03-29 15:25:23,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 489242624. Throughput: 0: 42224.4. Samples: 371424140. Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 15:25:23,840][00126] Avg episode reward: [(0, '0.471')] [2024-03-29 15:25:23,927][00497] Updated weights for policy 0, policy_version 29862 (0.0019) [2024-03-29 15:25:28,434][00497] Updated weights for policy 0, policy_version 29872 (0.0028) [2024-03-29 15:25:28,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 489422848. Throughput: 0: 41550.6. Samples: 371667640. Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 15:25:28,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 15:25:32,101][00497] Updated weights for policy 0, policy_version 29882 (0.0028) [2024-03-29 15:25:33,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42098.6). Total num frames: 489635840. Throughput: 0: 42167.7. Samples: 371796480. Policy #0 lag: (min: 1.0, avg: 20.4, max: 43.0) [2024-03-29 15:25:33,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 15:25:36,343][00497] Updated weights for policy 0, policy_version 29892 (0.0031) [2024-03-29 15:25:38,839][00126] Fps is (10 sec: 45875.9, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 489881600. Throughput: 0: 42317.4. Samples: 372051380. Policy #0 lag: (min: 1.0, avg: 20.4, max: 43.0) [2024-03-29 15:25:38,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 15:25:39,434][00497] Updated weights for policy 0, policy_version 29902 (0.0021) [2024-03-29 15:25:43,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 490061824. Throughput: 0: 41502.1. Samples: 372293480. Policy #0 lag: (min: 1.0, avg: 20.4, max: 43.0) [2024-03-29 15:25:43,840][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 15:25:43,966][00497] Updated weights for policy 0, policy_version 29912 (0.0026) [2024-03-29 15:25:45,928][00476] Signal inference workers to stop experience collection... (13300 times) [2024-03-29 15:25:45,985][00497] InferenceWorker_p0-w0: stopping experience collection (13300 times) [2024-03-29 15:25:46,115][00476] Signal inference workers to resume experience collection... (13300 times) [2024-03-29 15:25:46,115][00497] InferenceWorker_p0-w0: resuming experience collection (13300 times) [2024-03-29 15:25:47,706][00497] Updated weights for policy 0, policy_version 29922 (0.0019) [2024-03-29 15:25:48,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 490291200. Throughput: 0: 42176.8. Samples: 372427820. Policy #0 lag: (min: 1.0, avg: 20.4, max: 43.0) [2024-03-29 15:25:48,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 15:25:52,209][00497] Updated weights for policy 0, policy_version 29932 (0.0034) [2024-03-29 15:25:53,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 42098.6). 
Total num frames: 490487808. Throughput: 0: 42221.9. Samples: 372682040. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:25:53,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 15:25:55,079][00497] Updated weights for policy 0, policy_version 29942 (0.0023) [2024-03-29 15:25:58,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 490700800. Throughput: 0: 41919.2. Samples: 372932480. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:25:58,841][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 15:25:59,425][00497] Updated weights for policy 0, policy_version 29952 (0.0027) [2024-03-29 15:26:03,231][00497] Updated weights for policy 0, policy_version 29962 (0.0020) [2024-03-29 15:26:03,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 490913792. Throughput: 0: 42305.2. Samples: 373067640. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:26:03,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 15:26:07,544][00497] Updated weights for policy 0, policy_version 29972 (0.0018) [2024-03-29 15:26:08,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 42098.6). Total num frames: 491126784. Throughput: 0: 42142.3. Samples: 373320540. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:26:08,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 15:26:10,597][00497] Updated weights for policy 0, policy_version 29982 (0.0018) [2024-03-29 15:26:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 491323392. Throughput: 0: 41982.2. Samples: 373556840. Policy #0 lag: (min: 2.0, avg: 23.2, max: 42.0) [2024-03-29 15:26:13,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 15:26:15,166][00497] Updated weights for policy 0, policy_version 29992 (0.0022) [2024-03-29 15:26:18,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 491536384. Throughput: 0: 42248.4. Samples: 373697660. Policy #0 lag: (min: 2.0, avg: 23.2, max: 42.0) [2024-03-29 15:26:18,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 15:26:18,963][00497] Updated weights for policy 0, policy_version 30002 (0.0026) [2024-03-29 15:26:21,065][00476] Signal inference workers to stop experience collection... (13350 times) [2024-03-29 15:26:21,145][00476] Signal inference workers to resume experience collection... (13350 times) [2024-03-29 15:26:21,147][00497] InferenceWorker_p0-w0: stopping experience collection (13350 times) [2024-03-29 15:26:21,172][00497] InferenceWorker_p0-w0: resuming experience collection (13350 times) [2024-03-29 15:26:23,209][00497] Updated weights for policy 0, policy_version 30012 (0.0023) [2024-03-29 15:26:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 491749376. Throughput: 0: 41989.7. Samples: 373940920. Policy #0 lag: (min: 2.0, avg: 23.2, max: 42.0) [2024-03-29 15:26:23,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 15:26:26,139][00497] Updated weights for policy 0, policy_version 30022 (0.0022) [2024-03-29 15:26:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 42043.1). Total num frames: 491962368. Throughput: 0: 42153.9. Samples: 374190400. 
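
"Avg episode reward: [(0, '0.404')]" reports, for policy 0, the mean return of recently finished episodes, formatted as a list of (policy_index, value) pairs. A brief sketch of maintaining such a running average over a bounded window of completed episodes; the window size here is an arbitrary choice, not the trainer's configured value:

from collections import deque

class EpisodeRewardTracker:
    """Mean return over the most recent completed episodes for one policy (illustrative)."""

    def __init__(self, window=100):
        self.returns = deque(maxlen=window)

    def add_episode(self, episode_return):
        self.returns.append(episode_return)

    def summary(self, policy_index=0):
        if not self.returns:
            return [(policy_index, "nan")]
        avg = sum(self.returns) / len(self.returns)
        return [(policy_index, f"{avg:.3f}")]

tracker = EpisodeRewardTracker()
for r in (0.41, 0.38, 0.44, 0.39):
    tracker.add_episode(r)
print("Avg episode reward:", tracker.summary())  # [(0, '0.405')]
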
Policy #0 lag: (min: 2.0, avg: 23.2, max: 42.0) [2024-03-29 15:26:28,841][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 15:26:31,175][00497] Updated weights for policy 0, policy_version 30032 (0.0022) [2024-03-29 15:26:33,840][00126] Fps is (10 sec: 40958.2, 60 sec: 42051.9, 300 sec: 42042.9). Total num frames: 492158976. Throughput: 0: 42056.9. Samples: 374320400. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:26:33,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 15:26:34,535][00497] Updated weights for policy 0, policy_version 30042 (0.0025) [2024-03-29 15:26:38,589][00497] Updated weights for policy 0, policy_version 30052 (0.0020) [2024-03-29 15:26:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 492371968. Throughput: 0: 41994.6. Samples: 374571800. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:26:38,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 15:26:42,031][00497] Updated weights for policy 0, policy_version 30062 (0.0027) [2024-03-29 15:26:43,839][00126] Fps is (10 sec: 42600.4, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 492584960. Throughput: 0: 41660.3. Samples: 374807200. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:26:43,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 15:26:46,703][00497] Updated weights for policy 0, policy_version 30072 (0.0025) [2024-03-29 15:26:48,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 492781568. Throughput: 0: 41718.8. Samples: 374944980. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:26:48,840][00126] Avg episode reward: [(0, '0.458')] [2024-03-29 15:26:50,065][00497] Updated weights for policy 0, policy_version 30082 (0.0029) [2024-03-29 15:26:53,825][00476] Signal inference workers to stop experience collection... (13400 times) [2024-03-29 15:26:53,827][00476] Signal inference workers to resume experience collection... (13400 times) [2024-03-29 15:26:53,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 493010944. Throughput: 0: 41849.6. Samples: 375203780. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 15:26:53,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 15:26:53,872][00497] InferenceWorker_p0-w0: stopping experience collection (13400 times) [2024-03-29 15:26:53,872][00497] InferenceWorker_p0-w0: resuming experience collection (13400 times) [2024-03-29 15:26:54,132][00497] Updated weights for policy 0, policy_version 30092 (0.0027) [2024-03-29 15:26:57,215][00497] Updated weights for policy 0, policy_version 30102 (0.0033) [2024-03-29 15:26:58,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 493223936. Throughput: 0: 41994.3. Samples: 375446580. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 15:26:58,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 15:27:01,931][00497] Updated weights for policy 0, policy_version 30112 (0.0025) [2024-03-29 15:27:03,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 493420544. Throughput: 0: 41926.2. Samples: 375584340. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 15:27:03,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 15:27:04,153][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000030118_493453312.pth... 
[2024-03-29 15:27:04,464][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000029504_483393536.pth [2024-03-29 15:27:05,528][00497] Updated weights for policy 0, policy_version 30122 (0.0021) [2024-03-29 15:27:08,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 41932.0). Total num frames: 493633536. Throughput: 0: 42039.6. Samples: 375832700. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 15:27:08,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 15:27:10,060][00497] Updated weights for policy 0, policy_version 30132 (0.0032) [2024-03-29 15:27:13,141][00497] Updated weights for policy 0, policy_version 30142 (0.0035) [2024-03-29 15:27:13,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 493862912. Throughput: 0: 41514.1. Samples: 376058540. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 15:27:13,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 15:27:18,159][00497] Updated weights for policy 0, policy_version 30152 (0.0018) [2024-03-29 15:27:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 494026752. Throughput: 0: 41608.1. Samples: 376192740. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 15:27:18,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 15:27:21,687][00497] Updated weights for policy 0, policy_version 30162 (0.0021) [2024-03-29 15:27:23,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41506.2, 300 sec: 41820.8). Total num frames: 494239744. Throughput: 0: 41490.2. Samples: 376438860. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 15:27:23,840][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 15:27:25,908][00497] Updated weights for policy 0, policy_version 30172 (0.0021) [2024-03-29 15:27:26,860][00476] Signal inference workers to stop experience collection... (13450 times) [2024-03-29 15:27:26,888][00497] InferenceWorker_p0-w0: stopping experience collection (13450 times) [2024-03-29 15:27:27,043][00476] Signal inference workers to resume experience collection... (13450 times) [2024-03-29 15:27:27,044][00497] InferenceWorker_p0-w0: resuming experience collection (13450 times) [2024-03-29 15:27:28,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42052.3, 300 sec: 41932.0). Total num frames: 494485504. Throughput: 0: 41900.1. Samples: 376692700. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 15:27:28,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 15:27:28,968][00497] Updated weights for policy 0, policy_version 30182 (0.0022) [2024-03-29 15:27:33,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.5, 300 sec: 41820.8). Total num frames: 494649344. Throughput: 0: 41538.2. Samples: 376814200. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 15:27:33,840][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 15:27:33,882][00497] Updated weights for policy 0, policy_version 30192 (0.0021) [2024-03-29 15:27:37,627][00497] Updated weights for policy 0, policy_version 30202 (0.0026) [2024-03-29 15:27:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 494878720. Throughput: 0: 41331.7. Samples: 377063700. 
Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 15:27:38,840][00126] Avg episode reward: [(0, '0.484')] [2024-03-29 15:27:41,716][00497] Updated weights for policy 0, policy_version 30212 (0.0027) [2024-03-29 15:27:43,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 495091712. Throughput: 0: 41667.5. Samples: 377321620. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 15:27:43,840][00126] Avg episode reward: [(0, '0.478')] [2024-03-29 15:27:44,829][00497] Updated weights for policy 0, policy_version 30222 (0.0031) [2024-03-29 15:27:48,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 495288320. Throughput: 0: 41080.0. Samples: 377432940. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 15:27:48,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 15:27:49,741][00497] Updated weights for policy 0, policy_version 30232 (0.0019) [2024-03-29 15:27:53,356][00497] Updated weights for policy 0, policy_version 30242 (0.0024) [2024-03-29 15:27:53,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 495501312. Throughput: 0: 41209.3. Samples: 377687120. Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 15:27:53,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 15:27:57,109][00497] Updated weights for policy 0, policy_version 30252 (0.0019) [2024-03-29 15:27:58,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 495697920. Throughput: 0: 42041.1. Samples: 377950380. Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 15:27:58,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 15:27:59,748][00476] Signal inference workers to stop experience collection... (13500 times) [2024-03-29 15:27:59,770][00497] InferenceWorker_p0-w0: stopping experience collection (13500 times) [2024-03-29 15:27:59,964][00476] Signal inference workers to resume experience collection... (13500 times) [2024-03-29 15:27:59,965][00497] InferenceWorker_p0-w0: resuming experience collection (13500 times) [2024-03-29 15:28:00,523][00497] Updated weights for policy 0, policy_version 30262 (0.0022) [2024-03-29 15:28:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 495910912. Throughput: 0: 41188.4. Samples: 378046220. Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 15:28:03,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 15:28:05,393][00497] Updated weights for policy 0, policy_version 30272 (0.0021) [2024-03-29 15:28:08,828][00497] Updated weights for policy 0, policy_version 30282 (0.0026) [2024-03-29 15:28:08,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 496140288. Throughput: 0: 42088.5. Samples: 378332840. Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 15:28:08,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 15:28:12,725][00497] Updated weights for policy 0, policy_version 30292 (0.0022) [2024-03-29 15:28:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.2, 300 sec: 41765.3). Total num frames: 496336896. Throughput: 0: 41876.4. Samples: 378577140. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 15:28:13,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 15:28:16,096][00497] Updated weights for policy 0, policy_version 30302 (0.0030) [2024-03-29 15:28:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.2, 300 sec: 41876.4). 
Total num frames: 496549888. Throughput: 0: 41465.4. Samples: 378680140. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 15:28:18,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 15:28:20,954][00497] Updated weights for policy 0, policy_version 30312 (0.0025) [2024-03-29 15:28:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 496762880. Throughput: 0: 42279.9. Samples: 378966300. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 15:28:23,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 15:28:24,390][00497] Updated weights for policy 0, policy_version 30322 (0.0019) [2024-03-29 15:28:28,044][00497] Updated weights for policy 0, policy_version 30332 (0.0035) [2024-03-29 15:28:28,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.0, 300 sec: 41820.8). Total num frames: 496975872. Throughput: 0: 41975.1. Samples: 379210500. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 15:28:28,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 15:28:31,455][00497] Updated weights for policy 0, policy_version 30342 (0.0018) [2024-03-29 15:28:33,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 497188864. Throughput: 0: 41942.1. Samples: 379320340. Policy #0 lag: (min: 3.0, avg: 22.4, max: 42.0) [2024-03-29 15:28:33,840][00126] Avg episode reward: [(0, '0.456')] [2024-03-29 15:28:36,475][00497] Updated weights for policy 0, policy_version 30352 (0.0018) [2024-03-29 15:28:38,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 497385472. Throughput: 0: 42455.6. Samples: 379597620. Policy #0 lag: (min: 3.0, avg: 22.4, max: 42.0) [2024-03-29 15:28:38,840][00126] Avg episode reward: [(0, '0.489')] [2024-03-29 15:28:39,676][00476] Signal inference workers to stop experience collection... (13550 times) [2024-03-29 15:28:39,710][00497] InferenceWorker_p0-w0: stopping experience collection (13550 times) [2024-03-29 15:28:39,866][00476] Signal inference workers to resume experience collection... (13550 times) [2024-03-29 15:28:39,867][00497] InferenceWorker_p0-w0: resuming experience collection (13550 times) [2024-03-29 15:28:40,156][00497] Updated weights for policy 0, policy_version 30362 (0.0022) [2024-03-29 15:28:43,791][00497] Updated weights for policy 0, policy_version 30372 (0.0025) [2024-03-29 15:28:43,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 497614848. Throughput: 0: 41912.7. Samples: 379836460. Policy #0 lag: (min: 3.0, avg: 22.4, max: 42.0) [2024-03-29 15:28:43,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 15:28:47,218][00497] Updated weights for policy 0, policy_version 30382 (0.0020) [2024-03-29 15:28:48,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 497827840. Throughput: 0: 42556.4. Samples: 379961260. Policy #0 lag: (min: 3.0, avg: 22.4, max: 42.0) [2024-03-29 15:28:48,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 15:28:51,843][00497] Updated weights for policy 0, policy_version 30392 (0.0020) [2024-03-29 15:28:53,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 498008064. Throughput: 0: 41939.2. Samples: 380220100. 
Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 15:28:53,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 15:28:55,539][00497] Updated weights for policy 0, policy_version 30402 (0.0026) [2024-03-29 15:28:58,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 498221056. Throughput: 0: 42086.7. Samples: 380471040. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 15:28:58,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 15:28:59,400][00497] Updated weights for policy 0, policy_version 30412 (0.0017) [2024-03-29 15:29:03,013][00497] Updated weights for policy 0, policy_version 30422 (0.0024) [2024-03-29 15:29:03,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 498450432. Throughput: 0: 42687.1. Samples: 380601060. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 15:29:03,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 15:29:03,915][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000030424_498466816.pth... [2024-03-29 15:29:04,238][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000029810_488407040.pth [2024-03-29 15:29:07,822][00497] Updated weights for policy 0, policy_version 30432 (0.0018) [2024-03-29 15:29:08,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 498614272. Throughput: 0: 41561.7. Samples: 380836580. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 15:29:08,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 15:29:11,638][00497] Updated weights for policy 0, policy_version 30442 (0.0029) [2024-03-29 15:29:13,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 498843648. Throughput: 0: 41349.0. Samples: 381071200. Policy #0 lag: (min: 0.0, avg: 20.9, max: 45.0) [2024-03-29 15:29:13,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 15:29:15,364][00497] Updated weights for policy 0, policy_version 30452 (0.0025) [2024-03-29 15:29:17,076][00476] Signal inference workers to stop experience collection... (13600 times) [2024-03-29 15:29:17,161][00497] InferenceWorker_p0-w0: stopping experience collection (13600 times) [2024-03-29 15:29:17,166][00476] Signal inference workers to resume experience collection... (13600 times) [2024-03-29 15:29:17,187][00497] InferenceWorker_p0-w0: resuming experience collection (13600 times) [2024-03-29 15:29:18,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 499073024. Throughput: 0: 41953.4. Samples: 381208240. Policy #0 lag: (min: 0.0, avg: 20.9, max: 45.0) [2024-03-29 15:29:18,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 15:29:19,031][00497] Updated weights for policy 0, policy_version 30462 (0.0031) [2024-03-29 15:29:23,696][00497] Updated weights for policy 0, policy_version 30472 (0.0018) [2024-03-29 15:29:23,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 499253248. Throughput: 0: 41112.0. Samples: 381447660. Policy #0 lag: (min: 0.0, avg: 20.9, max: 45.0) [2024-03-29 15:29:23,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 15:29:27,742][00497] Updated weights for policy 0, policy_version 30482 (0.0044) [2024-03-29 15:29:28,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 499466240. Throughput: 0: 41369.9. Samples: 381698100. 
Policy #0 lag: (min: 0.0, avg: 20.9, max: 45.0) [2024-03-29 15:29:28,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 15:29:31,513][00497] Updated weights for policy 0, policy_version 30492 (0.0025) [2024-03-29 15:29:33,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.3, 300 sec: 41820.9). Total num frames: 499679232. Throughput: 0: 41335.2. Samples: 381821340. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:29:33,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 15:29:34,969][00497] Updated weights for policy 0, policy_version 30502 (0.0030) [2024-03-29 15:29:38,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.0, 300 sec: 41820.8). Total num frames: 499875840. Throughput: 0: 41108.3. Samples: 382069980. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:29:38,840][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 15:29:39,415][00497] Updated weights for policy 0, policy_version 30512 (0.0022) [2024-03-29 15:29:43,588][00497] Updated weights for policy 0, policy_version 30522 (0.0034) [2024-03-29 15:29:43,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 500088832. Throughput: 0: 41128.9. Samples: 382321840. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:29:43,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 15:29:47,194][00497] Updated weights for policy 0, policy_version 30532 (0.0022) [2024-03-29 15:29:48,839][00126] Fps is (10 sec: 40960.5, 60 sec: 40960.0, 300 sec: 41765.3). Total num frames: 500285440. Throughput: 0: 40879.6. Samples: 382440640. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:29:48,840][00126] Avg episode reward: [(0, '0.464')] [2024-03-29 15:29:49,807][00476] Signal inference workers to stop experience collection... (13650 times) [2024-03-29 15:29:49,841][00497] InferenceWorker_p0-w0: stopping experience collection (13650 times) [2024-03-29 15:29:50,022][00476] Signal inference workers to resume experience collection... (13650 times) [2024-03-29 15:29:50,022][00497] InferenceWorker_p0-w0: resuming experience collection (13650 times) [2024-03-29 15:29:50,946][00497] Updated weights for policy 0, policy_version 30542 (0.0029) [2024-03-29 15:29:53,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 500498432. Throughput: 0: 41018.3. Samples: 382682400. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 15:29:53,842][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 15:29:55,344][00497] Updated weights for policy 0, policy_version 30552 (0.0020) [2024-03-29 15:29:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 500695040. Throughput: 0: 41410.2. Samples: 382934660. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 15:29:58,840][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 15:29:59,209][00497] Updated weights for policy 0, policy_version 30562 (0.0031) [2024-03-29 15:30:03,167][00497] Updated weights for policy 0, policy_version 30572 (0.0028) [2024-03-29 15:30:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 500908032. Throughput: 0: 41112.0. Samples: 383058280. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 15:30:03,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 15:30:06,857][00497] Updated weights for policy 0, policy_version 30582 (0.0024) [2024-03-29 15:30:08,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.2, 300 sec: 41820.8). 
Total num frames: 501121024. Throughput: 0: 41119.9. Samples: 383298060. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 15:30:08,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 15:30:11,362][00497] Updated weights for policy 0, policy_version 30592 (0.0019) [2024-03-29 15:30:13,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 501317632. Throughput: 0: 41405.2. Samples: 383561340. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 15:30:13,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 15:30:15,315][00497] Updated weights for policy 0, policy_version 30602 (0.0033) [2024-03-29 15:30:18,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40959.9, 300 sec: 41654.2). Total num frames: 501530624. Throughput: 0: 41182.0. Samples: 383674540. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 15:30:18,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 15:30:19,029][00497] Updated weights for policy 0, policy_version 30612 (0.0020) [2024-03-29 15:30:20,630][00476] Signal inference workers to stop experience collection... (13700 times) [2024-03-29 15:30:20,635][00476] Signal inference workers to resume experience collection... (13700 times) [2024-03-29 15:30:20,679][00497] InferenceWorker_p0-w0: stopping experience collection (13700 times) [2024-03-29 15:30:20,679][00497] InferenceWorker_p0-w0: resuming experience collection (13700 times) [2024-03-29 15:30:22,358][00497] Updated weights for policy 0, policy_version 30622 (0.0024) [2024-03-29 15:30:23,839][00126] Fps is (10 sec: 44237.2, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 501760000. Throughput: 0: 41260.1. Samples: 383926680. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 15:30:23,841][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 15:30:26,980][00497] Updated weights for policy 0, policy_version 30632 (0.0018) [2024-03-29 15:30:28,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 501940224. Throughput: 0: 41645.8. Samples: 384195900. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 15:30:28,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 15:30:30,864][00497] Updated weights for policy 0, policy_version 30642 (0.0021) [2024-03-29 15:30:33,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 502169600. Throughput: 0: 41492.7. Samples: 384307820. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:30:33,840][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 15:30:34,832][00497] Updated weights for policy 0, policy_version 30652 (0.0024) [2024-03-29 15:30:38,115][00497] Updated weights for policy 0, policy_version 30662 (0.0034) [2024-03-29 15:30:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 502382592. Throughput: 0: 41692.5. Samples: 384558560. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:30:38,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 15:30:42,803][00497] Updated weights for policy 0, policy_version 30672 (0.0022) [2024-03-29 15:30:43,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 502562816. Throughput: 0: 41884.8. Samples: 384819480. 
Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:30:43,840][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 15:30:46,391][00497] Updated weights for policy 0, policy_version 30682 (0.0033) [2024-03-29 15:30:48,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 502792192. Throughput: 0: 41864.9. Samples: 384942200. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 15:30:48,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 15:30:50,404][00497] Updated weights for policy 0, policy_version 30692 (0.0031) [2024-03-29 15:30:52,327][00476] Signal inference workers to stop experience collection... (13750 times) [2024-03-29 15:30:52,403][00497] InferenceWorker_p0-w0: stopping experience collection (13750 times) [2024-03-29 15:30:52,490][00476] Signal inference workers to resume experience collection... (13750 times) [2024-03-29 15:30:52,491][00497] InferenceWorker_p0-w0: resuming experience collection (13750 times) [2024-03-29 15:30:53,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 503005184. Throughput: 0: 42161.3. Samples: 385195320. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 15:30:53,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 15:30:53,917][00497] Updated weights for policy 0, policy_version 30702 (0.0029) [2024-03-29 15:30:58,428][00497] Updated weights for policy 0, policy_version 30712 (0.0025) [2024-03-29 15:30:58,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.2, 300 sec: 41654.3). Total num frames: 503201792. Throughput: 0: 41901.4. Samples: 385446900. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 15:30:58,840][00126] Avg episode reward: [(0, '0.458')] [2024-03-29 15:31:02,008][00497] Updated weights for policy 0, policy_version 30722 (0.0028) [2024-03-29 15:31:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 503414784. Throughput: 0: 41998.7. Samples: 385564480. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 15:31:03,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 15:31:03,897][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000030727_503431168.pth... [2024-03-29 15:31:04,206][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000030118_493453312.pth [2024-03-29 15:31:06,052][00497] Updated weights for policy 0, policy_version 30732 (0.0024) [2024-03-29 15:31:08,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 503644160. Throughput: 0: 42228.0. Samples: 385826940. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 15:31:08,840][00126] Avg episode reward: [(0, '0.472')] [2024-03-29 15:31:09,428][00497] Updated weights for policy 0, policy_version 30742 (0.0026) [2024-03-29 15:31:13,840][00126] Fps is (10 sec: 40958.2, 60 sec: 41778.9, 300 sec: 41654.2). Total num frames: 503824384. Throughput: 0: 41891.4. Samples: 386081040. Policy #0 lag: (min: 0.0, avg: 21.5, max: 40.0) [2024-03-29 15:31:13,841][00126] Avg episode reward: [(0, '0.507')] [2024-03-29 15:31:13,981][00497] Updated weights for policy 0, policy_version 30752 (0.0029) [2024-03-29 15:31:17,414][00497] Updated weights for policy 0, policy_version 30762 (0.0024) [2024-03-29 15:31:18,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 504053760. Throughput: 0: 41865.8. Samples: 386191780. 
Policy #0 lag: (min: 0.0, avg: 21.5, max: 40.0) [2024-03-29 15:31:18,841][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 15:31:21,594][00497] Updated weights for policy 0, policy_version 30772 (0.0027) [2024-03-29 15:31:22,972][00476] Signal inference workers to stop experience collection... (13800 times) [2024-03-29 15:31:23,052][00497] InferenceWorker_p0-w0: stopping experience collection (13800 times) [2024-03-29 15:31:23,052][00476] Signal inference workers to resume experience collection... (13800 times) [2024-03-29 15:31:23,087][00497] InferenceWorker_p0-w0: resuming experience collection (13800 times) [2024-03-29 15:31:23,839][00126] Fps is (10 sec: 42600.7, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 504250368. Throughput: 0: 42084.4. Samples: 386452360. Policy #0 lag: (min: 0.0, avg: 21.5, max: 40.0) [2024-03-29 15:31:23,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 15:31:25,065][00497] Updated weights for policy 0, policy_version 30782 (0.0022) [2024-03-29 15:31:28,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.1, 300 sec: 41654.3). Total num frames: 504446976. Throughput: 0: 41793.7. Samples: 386700200. Policy #0 lag: (min: 0.0, avg: 21.5, max: 40.0) [2024-03-29 15:31:28,841][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 15:31:29,627][00497] Updated weights for policy 0, policy_version 30792 (0.0017) [2024-03-29 15:31:32,980][00497] Updated weights for policy 0, policy_version 30802 (0.0025) [2024-03-29 15:31:33,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 504692736. Throughput: 0: 42121.2. Samples: 386837660. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 15:31:33,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 15:31:37,109][00497] Updated weights for policy 0, policy_version 30812 (0.0025) [2024-03-29 15:31:38,839][00126] Fps is (10 sec: 44237.3, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 504889344. Throughput: 0: 41920.1. Samples: 387081720. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 15:31:38,840][00126] Avg episode reward: [(0, '0.492')] [2024-03-29 15:31:40,686][00497] Updated weights for policy 0, policy_version 30822 (0.0027) [2024-03-29 15:31:43,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 505085952. Throughput: 0: 41746.6. Samples: 387325500. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 15:31:43,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 15:31:45,434][00497] Updated weights for policy 0, policy_version 30832 (0.0024) [2024-03-29 15:31:48,821][00497] Updated weights for policy 0, policy_version 30842 (0.0027) [2024-03-29 15:31:48,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 505315328. Throughput: 0: 42132.1. Samples: 387460420. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 15:31:48,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 15:31:52,704][00497] Updated weights for policy 0, policy_version 30852 (0.0029) [2024-03-29 15:31:53,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 505511936. Throughput: 0: 41631.5. Samples: 387700360. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 15:31:53,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 15:31:55,621][00476] Signal inference workers to stop experience collection... 
(13850 times) [2024-03-29 15:31:55,646][00497] InferenceWorker_p0-w0: stopping experience collection (13850 times) [2024-03-29 15:31:55,802][00476] Signal inference workers to resume experience collection... (13850 times) [2024-03-29 15:31:55,803][00497] InferenceWorker_p0-w0: resuming experience collection (13850 times) [2024-03-29 15:31:56,327][00497] Updated weights for policy 0, policy_version 30862 (0.0021) [2024-03-29 15:31:58,839][00126] Fps is (10 sec: 40959.4, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 505724928. Throughput: 0: 41472.0. Samples: 387947260. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 15:31:58,840][00126] Avg episode reward: [(0, '0.478')] [2024-03-29 15:32:00,963][00497] Updated weights for policy 0, policy_version 30872 (0.0022) [2024-03-29 15:32:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 505937920. Throughput: 0: 42139.2. Samples: 388088040. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 15:32:03,841][00126] Avg episode reward: [(0, '0.448')] [2024-03-29 15:32:04,280][00497] Updated weights for policy 0, policy_version 30882 (0.0023) [2024-03-29 15:32:08,527][00497] Updated weights for policy 0, policy_version 30892 (0.0017) [2024-03-29 15:32:08,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 506134528. Throughput: 0: 41616.0. Samples: 388325080. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 15:32:08,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 15:32:11,953][00497] Updated weights for policy 0, policy_version 30902 (0.0026) [2024-03-29 15:32:13,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.7, 300 sec: 41820.8). Total num frames: 506363904. Throughput: 0: 41584.9. Samples: 388571520. Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 15:32:13,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 15:32:16,613][00497] Updated weights for policy 0, policy_version 30912 (0.0027) [2024-03-29 15:32:18,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 506544128. Throughput: 0: 41735.2. Samples: 388715740. Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 15:32:18,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 15:32:20,096][00497] Updated weights for policy 0, policy_version 30922 (0.0019) [2024-03-29 15:32:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 506773504. Throughput: 0: 41531.0. Samples: 388950620. Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 15:32:23,840][00126] Avg episode reward: [(0, '0.472')] [2024-03-29 15:32:24,172][00497] Updated weights for policy 0, policy_version 30932 (0.0017) [2024-03-29 15:32:27,877][00497] Updated weights for policy 0, policy_version 30942 (0.0023) [2024-03-29 15:32:28,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 506986496. Throughput: 0: 41354.2. Samples: 389186440. Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 15:32:28,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 15:32:31,837][00476] Signal inference workers to stop experience collection... (13900 times) [2024-03-29 15:32:31,875][00497] InferenceWorker_p0-w0: stopping experience collection (13900 times) [2024-03-29 15:32:32,064][00476] Signal inference workers to resume experience collection... 
(13900 times) [2024-03-29 15:32:32,064][00497] InferenceWorker_p0-w0: resuming experience collection (13900 times) [2024-03-29 15:32:32,367][00497] Updated weights for policy 0, policy_version 30952 (0.0020) [2024-03-29 15:32:33,839][00126] Fps is (10 sec: 37683.7, 60 sec: 40960.1, 300 sec: 41598.7). Total num frames: 507150336. Throughput: 0: 41633.8. Samples: 389333940. Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 15:32:33,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 15:32:35,944][00497] Updated weights for policy 0, policy_version 30962 (0.0028) [2024-03-29 15:32:38,839][00126] Fps is (10 sec: 39322.3, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 507379712. Throughput: 0: 41413.9. Samples: 389563980. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 15:32:38,841][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 15:32:40,102][00497] Updated weights for policy 0, policy_version 30972 (0.0027) [2024-03-29 15:32:43,784][00497] Updated weights for policy 0, policy_version 30982 (0.0028) [2024-03-29 15:32:43,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 507609088. Throughput: 0: 41539.5. Samples: 389816540. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 15:32:43,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 15:32:48,168][00497] Updated weights for policy 0, policy_version 30992 (0.0025) [2024-03-29 15:32:48,839][00126] Fps is (10 sec: 39320.8, 60 sec: 40959.9, 300 sec: 41598.7). Total num frames: 507772928. Throughput: 0: 41267.0. Samples: 389945060. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 15:32:48,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 15:32:51,565][00497] Updated weights for policy 0, policy_version 31002 (0.0025) [2024-03-29 15:32:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 41709.7). Total num frames: 508002304. Throughput: 0: 41322.6. Samples: 390184600. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 15:32:53,840][00126] Avg episode reward: [(0, '0.489')] [2024-03-29 15:32:55,812][00497] Updated weights for policy 0, policy_version 31012 (0.0024) [2024-03-29 15:32:58,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 508215296. Throughput: 0: 41884.6. Samples: 390456320. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 15:32:58,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 15:32:59,437][00497] Updated weights for policy 0, policy_version 31022 (0.0027) [2024-03-29 15:33:03,720][00497] Updated weights for policy 0, policy_version 31032 (0.0021) [2024-03-29 15:33:03,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 508428288. Throughput: 0: 41171.9. Samples: 390568480. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 15:33:03,840][00126] Avg episode reward: [(0, '0.515')] [2024-03-29 15:33:03,860][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000031032_508428288.pth... [2024-03-29 15:33:04,191][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000030424_498466816.pth [2024-03-29 15:33:05,558][00476] Signal inference workers to stop experience collection... (13950 times) [2024-03-29 15:33:05,597][00497] InferenceWorker_p0-w0: stopping experience collection (13950 times) [2024-03-29 15:33:05,784][00476] Signal inference workers to resume experience collection... 
(13950 times) [2024-03-29 15:33:05,784][00497] InferenceWorker_p0-w0: resuming experience collection (13950 times) [2024-03-29 15:33:07,155][00497] Updated weights for policy 0, policy_version 31042 (0.0028) [2024-03-29 15:33:08,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 508624896. Throughput: 0: 41519.6. Samples: 390819000. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 15:33:08,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 15:33:11,508][00497] Updated weights for policy 0, policy_version 31052 (0.0023) [2024-03-29 15:33:13,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 508837888. Throughput: 0: 42080.5. Samples: 391080060. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 15:33:13,842][00126] Avg episode reward: [(0, '0.502')] [2024-03-29 15:33:14,996][00497] Updated weights for policy 0, policy_version 31062 (0.0022) [2024-03-29 15:33:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 509050880. Throughput: 0: 41336.9. Samples: 391194100. Policy #0 lag: (min: 1.0, avg: 21.4, max: 43.0) [2024-03-29 15:33:18,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 15:33:19,223][00497] Updated weights for policy 0, policy_version 31072 (0.0019) [2024-03-29 15:33:22,705][00497] Updated weights for policy 0, policy_version 31082 (0.0029) [2024-03-29 15:33:23,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 509280256. Throughput: 0: 41859.4. Samples: 391447660. Policy #0 lag: (min: 1.0, avg: 21.4, max: 43.0) [2024-03-29 15:33:23,840][00126] Avg episode reward: [(0, '0.416')] [2024-03-29 15:33:26,986][00497] Updated weights for policy 0, policy_version 31092 (0.0022) [2024-03-29 15:33:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 509476864. Throughput: 0: 42182.8. Samples: 391714760. Policy #0 lag: (min: 1.0, avg: 21.4, max: 43.0) [2024-03-29 15:33:28,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 15:33:30,625][00497] Updated weights for policy 0, policy_version 31102 (0.0025) [2024-03-29 15:33:33,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 509673472. Throughput: 0: 41829.4. Samples: 391827380. Policy #0 lag: (min: 1.0, avg: 21.4, max: 43.0) [2024-03-29 15:33:33,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 15:33:34,995][00497] Updated weights for policy 0, policy_version 31112 (0.0022) [2024-03-29 15:33:37,648][00476] Signal inference workers to stop experience collection... (14000 times) [2024-03-29 15:33:37,679][00497] InferenceWorker_p0-w0: stopping experience collection (14000 times) [2024-03-29 15:33:37,859][00476] Signal inference workers to resume experience collection... (14000 times) [2024-03-29 15:33:37,859][00497] InferenceWorker_p0-w0: resuming experience collection (14000 times) [2024-03-29 15:33:38,384][00497] Updated weights for policy 0, policy_version 31122 (0.0031) [2024-03-29 15:33:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 509919232. Throughput: 0: 42367.2. Samples: 392091120. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 15:33:38,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 15:33:42,773][00497] Updated weights for policy 0, policy_version 31132 (0.0037) [2024-03-29 15:33:43,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41598.7). 
Total num frames: 510099456. Throughput: 0: 41789.3. Samples: 392336840. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 15:33:43,840][00126] Avg episode reward: [(0, '0.476')] [2024-03-29 15:33:46,399][00497] Updated weights for policy 0, policy_version 31142 (0.0023) [2024-03-29 15:33:48,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 510312448. Throughput: 0: 41877.8. Samples: 392452980. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 15:33:48,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 15:33:50,800][00497] Updated weights for policy 0, policy_version 31152 (0.0021) [2024-03-29 15:33:53,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 510525440. Throughput: 0: 42359.9. Samples: 392725200. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 15:33:53,840][00126] Avg episode reward: [(0, '0.483')] [2024-03-29 15:33:54,306][00497] Updated weights for policy 0, policy_version 31162 (0.0035) [2024-03-29 15:33:58,292][00497] Updated weights for policy 0, policy_version 31172 (0.0022) [2024-03-29 15:33:58,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 510738432. Throughput: 0: 41945.8. Samples: 392967620. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 15:33:58,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 15:34:02,049][00497] Updated weights for policy 0, policy_version 31182 (0.0030) [2024-03-29 15:34:03,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 510967808. Throughput: 0: 42294.2. Samples: 393097340. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 15:34:03,840][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 15:34:06,215][00497] Updated weights for policy 0, policy_version 31192 (0.0021) [2024-03-29 15:34:08,814][00476] Signal inference workers to stop experience collection... (14050 times) [2024-03-29 15:34:08,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 511148032. Throughput: 0: 42345.8. Samples: 393353220. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 15:34:08,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 15:34:08,850][00497] InferenceWorker_p0-w0: stopping experience collection (14050 times) [2024-03-29 15:34:09,044][00476] Signal inference workers to resume experience collection... (14050 times) [2024-03-29 15:34:09,045][00497] InferenceWorker_p0-w0: resuming experience collection (14050 times) [2024-03-29 15:34:09,799][00497] Updated weights for policy 0, policy_version 31202 (0.0029) [2024-03-29 15:34:13,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 511344640. Throughput: 0: 41642.7. Samples: 393588680. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 15:34:13,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 15:34:14,195][00497] Updated weights for policy 0, policy_version 31212 (0.0030) [2024-03-29 15:34:17,968][00497] Updated weights for policy 0, policy_version 31222 (0.0024) [2024-03-29 15:34:18,839][00126] Fps is (10 sec: 44237.7, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 511590400. Throughput: 0: 41935.2. Samples: 393714460. 
Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:34:18,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 15:34:22,105][00497] Updated weights for policy 0, policy_version 31232 (0.0026) [2024-03-29 15:34:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40960.1, 300 sec: 41598.7). Total num frames: 511737856. Throughput: 0: 41542.7. Samples: 393960540. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:34:23,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 15:34:25,758][00497] Updated weights for policy 0, policy_version 31242 (0.0027) [2024-03-29 15:34:28,839][00126] Fps is (10 sec: 36044.0, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 511950848. Throughput: 0: 41298.6. Samples: 394195280. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:34:28,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 15:34:30,216][00497] Updated weights for policy 0, policy_version 31252 (0.0022) [2024-03-29 15:34:33,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 512180224. Throughput: 0: 41704.8. Samples: 394329700. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:34:33,841][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 15:34:33,997][00497] Updated weights for policy 0, policy_version 31262 (0.0037) [2024-03-29 15:34:36,958][00476] Signal inference workers to stop experience collection... (14100 times) [2024-03-29 15:34:36,999][00497] InferenceWorker_p0-w0: stopping experience collection (14100 times) [2024-03-29 15:34:37,047][00476] Signal inference workers to resume experience collection... (14100 times) [2024-03-29 15:34:37,047][00497] InferenceWorker_p0-w0: resuming experience collection (14100 times) [2024-03-29 15:34:38,091][00497] Updated weights for policy 0, policy_version 31272 (0.0022) [2024-03-29 15:34:38,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40686.8, 300 sec: 41598.7). Total num frames: 512360448. Throughput: 0: 41190.7. Samples: 394578780. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 15:34:38,840][00126] Avg episode reward: [(0, '0.481')] [2024-03-29 15:34:41,750][00497] Updated weights for policy 0, policy_version 31282 (0.0026) [2024-03-29 15:34:43,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 512606208. Throughput: 0: 41172.8. Samples: 394820400. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 15:34:43,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 15:34:45,806][00497] Updated weights for policy 0, policy_version 31292 (0.0019) [2024-03-29 15:34:48,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 512802816. Throughput: 0: 41158.7. Samples: 394949480. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 15:34:48,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 15:34:49,639][00497] Updated weights for policy 0, policy_version 31302 (0.0035) [2024-03-29 15:34:53,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41233.2, 300 sec: 41709.8). Total num frames: 512999424. Throughput: 0: 40951.6. Samples: 395196040. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 15:34:53,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 15:34:53,914][00497] Updated weights for policy 0, policy_version 31312 (0.0034) [2024-03-29 15:34:57,512][00497] Updated weights for policy 0, policy_version 31322 (0.0028) [2024-03-29 15:34:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41765.3). 
Total num frames: 513228800. Throughput: 0: 41038.2. Samples: 395435400. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 15:34:58,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 15:35:01,618][00497] Updated weights for policy 0, policy_version 31332 (0.0027) [2024-03-29 15:35:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 40960.0, 300 sec: 41709.8). Total num frames: 513425408. Throughput: 0: 41223.0. Samples: 395569500. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 15:35:03,840][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 15:35:03,860][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000031337_513425408.pth... [2024-03-29 15:35:04,176][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000030727_503431168.pth [2024-03-29 15:35:05,542][00497] Updated weights for policy 0, policy_version 31342 (0.0023) [2024-03-29 15:35:08,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 513622016. Throughput: 0: 41072.8. Samples: 395808820. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 15:35:08,841][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 15:35:09,712][00497] Updated weights for policy 0, policy_version 31352 (0.0026) [2024-03-29 15:35:12,210][00476] Signal inference workers to stop experience collection... (14150 times) [2024-03-29 15:35:12,253][00497] InferenceWorker_p0-w0: stopping experience collection (14150 times) [2024-03-29 15:35:12,291][00476] Signal inference workers to resume experience collection... (14150 times) [2024-03-29 15:35:12,295][00497] InferenceWorker_p0-w0: resuming experience collection (14150 times) [2024-03-29 15:35:13,357][00497] Updated weights for policy 0, policy_version 31362 (0.0021) [2024-03-29 15:35:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 513851392. Throughput: 0: 41417.0. Samples: 396059040. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 15:35:13,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 15:35:17,682][00497] Updated weights for policy 0, policy_version 31372 (0.0022) [2024-03-29 15:35:18,839][00126] Fps is (10 sec: 40960.3, 60 sec: 40686.8, 300 sec: 41598.7). Total num frames: 514031616. Throughput: 0: 41286.7. Samples: 396187600. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 15:35:18,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 15:35:21,611][00497] Updated weights for policy 0, policy_version 31382 (0.0022) [2024-03-29 15:35:23,840][00126] Fps is (10 sec: 39320.6, 60 sec: 41779.0, 300 sec: 41709.7). Total num frames: 514244608. Throughput: 0: 41003.0. Samples: 396423920. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 15:35:23,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 15:35:25,920][00497] Updated weights for policy 0, policy_version 31392 (0.0025) [2024-03-29 15:35:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 41654.3). Total num frames: 514457600. Throughput: 0: 41462.7. Samples: 396686220. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 15:35:28,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 15:35:29,461][00497] Updated weights for policy 0, policy_version 31402 (0.0031) [2024-03-29 15:35:33,693][00497] Updated weights for policy 0, policy_version 31412 (0.0018) [2024-03-29 15:35:33,839][00126] Fps is (10 sec: 40960.9, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 514654208. 
Throughput: 0: 40879.5. Samples: 396789060. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 15:35:33,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 15:35:37,448][00497] Updated weights for policy 0, policy_version 31422 (0.0025) [2024-03-29 15:35:38,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 514867200. Throughput: 0: 41360.9. Samples: 397057280. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 15:35:38,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 15:35:42,003][00497] Updated weights for policy 0, policy_version 31432 (0.0024) [2024-03-29 15:35:43,839][00126] Fps is (10 sec: 39321.4, 60 sec: 40686.9, 300 sec: 41543.1). Total num frames: 515047424. Throughput: 0: 41745.7. Samples: 397313960. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:35:43,840][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 15:35:44,696][00476] Signal inference workers to stop experience collection... (14200 times) [2024-03-29 15:35:44,732][00497] InferenceWorker_p0-w0: stopping experience collection (14200 times) [2024-03-29 15:35:44,917][00476] Signal inference workers to resume experience collection... (14200 times) [2024-03-29 15:35:44,918][00497] InferenceWorker_p0-w0: resuming experience collection (14200 times) [2024-03-29 15:35:45,217][00497] Updated weights for policy 0, policy_version 31442 (0.0023) [2024-03-29 15:35:48,839][00126] Fps is (10 sec: 39322.1, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 515260416. Throughput: 0: 40982.3. Samples: 397413700. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:35:48,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 15:35:49,653][00497] Updated weights for policy 0, policy_version 31452 (0.0029) [2024-03-29 15:35:53,452][00497] Updated weights for policy 0, policy_version 31462 (0.0026) [2024-03-29 15:35:53,839][00126] Fps is (10 sec: 44237.0, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 515489792. Throughput: 0: 41358.3. Samples: 397669940. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:35:53,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 15:35:57,821][00497] Updated weights for policy 0, policy_version 31472 (0.0022) [2024-03-29 15:35:58,839][00126] Fps is (10 sec: 39320.8, 60 sec: 40413.8, 300 sec: 41487.6). Total num frames: 515653632. Throughput: 0: 41252.8. Samples: 397915420. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:35:58,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 15:36:01,321][00497] Updated weights for policy 0, policy_version 31482 (0.0022) [2024-03-29 15:36:03,839][00126] Fps is (10 sec: 39321.0, 60 sec: 40959.9, 300 sec: 41487.6). Total num frames: 515883008. Throughput: 0: 40982.6. Samples: 398031820. Policy #0 lag: (min: 1.0, avg: 19.6, max: 43.0) [2024-03-29 15:36:03,842][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 15:36:05,533][00497] Updated weights for policy 0, policy_version 31492 (0.0020) [2024-03-29 15:36:08,839][00126] Fps is (10 sec: 42598.9, 60 sec: 40960.1, 300 sec: 41543.2). Total num frames: 516079616. Throughput: 0: 41497.6. Samples: 398291300. 
Policy #0 lag: (min: 1.0, avg: 19.6, max: 43.0) [2024-03-29 15:36:08,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 15:36:09,437][00497] Updated weights for policy 0, policy_version 31502 (0.0030) [2024-03-29 15:36:13,608][00497] Updated weights for policy 0, policy_version 31512 (0.0030) [2024-03-29 15:36:13,840][00126] Fps is (10 sec: 40959.7, 60 sec: 40686.8, 300 sec: 41487.6). Total num frames: 516292608. Throughput: 0: 41171.8. Samples: 398538960. Policy #0 lag: (min: 1.0, avg: 19.6, max: 43.0) [2024-03-29 15:36:13,841][00126] Avg episode reward: [(0, '0.507')] [2024-03-29 15:36:17,025][00497] Updated weights for policy 0, policy_version 31522 (0.0030) [2024-03-29 15:36:18,839][00126] Fps is (10 sec: 45874.9, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 516538368. Throughput: 0: 41755.1. Samples: 398668040. Policy #0 lag: (min: 1.0, avg: 19.6, max: 43.0) [2024-03-29 15:36:18,840][00126] Avg episode reward: [(0, '0.448')] [2024-03-29 15:36:20,240][00476] Signal inference workers to stop experience collection... (14250 times) [2024-03-29 15:36:20,300][00497] InferenceWorker_p0-w0: stopping experience collection (14250 times) [2024-03-29 15:36:20,334][00476] Signal inference workers to resume experience collection... (14250 times) [2024-03-29 15:36:20,337][00497] InferenceWorker_p0-w0: resuming experience collection (14250 times) [2024-03-29 15:36:21,127][00497] Updated weights for policy 0, policy_version 31532 (0.0031) [2024-03-29 15:36:23,839][00126] Fps is (10 sec: 42599.7, 60 sec: 41233.3, 300 sec: 41598.7). Total num frames: 516718592. Throughput: 0: 41518.7. Samples: 398925620. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 15:36:23,840][00126] Avg episode reward: [(0, '0.471')] [2024-03-29 15:36:24,924][00497] Updated weights for policy 0, policy_version 31542 (0.0021) [2024-03-29 15:36:28,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 516931584. Throughput: 0: 41329.0. Samples: 399173760. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 15:36:28,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 15:36:29,127][00497] Updated weights for policy 0, policy_version 31552 (0.0023) [2024-03-29 15:36:32,551][00497] Updated weights for policy 0, policy_version 31562 (0.0024) [2024-03-29 15:36:33,839][00126] Fps is (10 sec: 45874.3, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 517177344. Throughput: 0: 42049.1. Samples: 399305920. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 15:36:33,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 15:36:36,706][00497] Updated weights for policy 0, policy_version 31572 (0.0028) [2024-03-29 15:36:38,841][00126] Fps is (10 sec: 42591.9, 60 sec: 41505.1, 300 sec: 41598.5). Total num frames: 517357568. Throughput: 0: 41929.7. Samples: 399556840. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 15:36:38,843][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 15:36:40,378][00497] Updated weights for policy 0, policy_version 31582 (0.0023) [2024-03-29 15:36:43,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 41543.1). Total num frames: 517570560. Throughput: 0: 42102.6. Samples: 399810040. 
Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 15:36:43,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 15:36:44,500][00497] Updated weights for policy 0, policy_version 31592 (0.0018) [2024-03-29 15:36:47,785][00497] Updated weights for policy 0, policy_version 31602 (0.0029) [2024-03-29 15:36:48,839][00126] Fps is (10 sec: 45882.3, 60 sec: 42598.3, 300 sec: 41709.8). Total num frames: 517816320. Throughput: 0: 42534.0. Samples: 399945840. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 15:36:48,840][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 15:36:52,031][00497] Updated weights for policy 0, policy_version 31612 (0.0025) [2024-03-29 15:36:53,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 517996544. Throughput: 0: 42195.1. Samples: 400190080. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 15:36:53,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 15:36:55,832][00476] Signal inference workers to stop experience collection... (14300 times) [2024-03-29 15:36:55,833][00476] Signal inference workers to resume experience collection... (14300 times) [2024-03-29 15:36:55,833][00497] Updated weights for policy 0, policy_version 31622 (0.0024) [2024-03-29 15:36:55,881][00497] InferenceWorker_p0-w0: stopping experience collection (14300 times) [2024-03-29 15:36:55,881][00497] InferenceWorker_p0-w0: resuming experience collection (14300 times) [2024-03-29 15:36:58,839][00126] Fps is (10 sec: 37682.9, 60 sec: 42325.4, 300 sec: 41543.2). Total num frames: 518193152. Throughput: 0: 42065.5. Samples: 400431900. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 15:36:58,840][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 15:37:00,293][00497] Updated weights for policy 0, policy_version 31632 (0.0035) [2024-03-29 15:37:03,693][00497] Updated weights for policy 0, policy_version 31642 (0.0027) [2024-03-29 15:37:03,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.5, 300 sec: 41654.2). Total num frames: 518422528. Throughput: 0: 42256.1. Samples: 400569560. Policy #0 lag: (min: 2.0, avg: 20.5, max: 42.0) [2024-03-29 15:37:03,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 15:37:04,187][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000031644_518455296.pth... [2024-03-29 15:37:04,491][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000031032_508428288.pth [2024-03-29 15:37:08,025][00497] Updated weights for policy 0, policy_version 31652 (0.0017) [2024-03-29 15:37:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.2, 300 sec: 41487.6). Total num frames: 518602752. Throughput: 0: 41881.2. Samples: 400810280. Policy #0 lag: (min: 2.0, avg: 20.5, max: 42.0) [2024-03-29 15:37:08,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 15:37:11,867][00497] Updated weights for policy 0, policy_version 31662 (0.0023) [2024-03-29 15:37:13,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42325.5, 300 sec: 41654.2). Total num frames: 518832128. Throughput: 0: 41754.6. Samples: 401052720. Policy #0 lag: (min: 2.0, avg: 20.5, max: 42.0) [2024-03-29 15:37:13,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 15:37:15,930][00497] Updated weights for policy 0, policy_version 31672 (0.0027) [2024-03-29 15:37:18,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 519045120. Throughput: 0: 41957.6. Samples: 401194000. 
Policy #0 lag: (min: 2.0, avg: 20.5, max: 42.0) [2024-03-29 15:37:18,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 15:37:19,182][00497] Updated weights for policy 0, policy_version 31682 (0.0019) [2024-03-29 15:37:23,485][00497] Updated weights for policy 0, policy_version 31692 (0.0035) [2024-03-29 15:37:23,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.2, 300 sec: 41543.2). Total num frames: 519241728. Throughput: 0: 42030.8. Samples: 401448160. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 15:37:23,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 15:37:27,459][00497] Updated weights for policy 0, policy_version 31702 (0.0018) [2024-03-29 15:37:27,946][00476] Signal inference workers to stop experience collection... (14350 times) [2024-03-29 15:37:28,018][00497] InferenceWorker_p0-w0: stopping experience collection (14350 times) [2024-03-29 15:37:28,023][00476] Signal inference workers to resume experience collection... (14350 times) [2024-03-29 15:37:28,043][00497] InferenceWorker_p0-w0: resuming experience collection (14350 times) [2024-03-29 15:37:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 519471104. Throughput: 0: 41687.3. Samples: 401685960. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 15:37:28,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 15:37:31,534][00497] Updated weights for policy 0, policy_version 31712 (0.0019) [2024-03-29 15:37:33,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 519667712. Throughput: 0: 41664.8. Samples: 401820760. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 15:37:33,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 15:37:35,051][00497] Updated weights for policy 0, policy_version 31722 (0.0025) [2024-03-29 15:37:38,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42053.4, 300 sec: 41598.7). Total num frames: 519880704. Throughput: 0: 41548.1. Samples: 402059740. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 15:37:38,840][00126] Avg episode reward: [(0, '0.472')] [2024-03-29 15:37:39,112][00497] Updated weights for policy 0, policy_version 31732 (0.0024) [2024-03-29 15:37:43,396][00497] Updated weights for policy 0, policy_version 31742 (0.0021) [2024-03-29 15:37:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 520077312. Throughput: 0: 41726.3. Samples: 402309580. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 15:37:43,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 15:37:47,480][00497] Updated weights for policy 0, policy_version 31752 (0.0023) [2024-03-29 15:37:48,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.1, 300 sec: 41654.3). Total num frames: 520290304. Throughput: 0: 41428.1. Samples: 402433820. Policy #0 lag: (min: 1.0, avg: 22.0, max: 43.0) [2024-03-29 15:37:48,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 15:37:50,993][00497] Updated weights for policy 0, policy_version 31762 (0.0027) [2024-03-29 15:37:53,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 520486912. Throughput: 0: 41431.1. Samples: 402674680. Policy #0 lag: (min: 1.0, avg: 22.0, max: 43.0) [2024-03-29 15:37:53,841][00126] Avg episode reward: [(0, '0.533')] [2024-03-29 15:37:54,859][00497] Updated weights for policy 0, policy_version 31772 (0.0023) [2024-03-29 15:37:58,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41779.2, 300 sec: 41598.7). 
Total num frames: 520699904. Throughput: 0: 42294.2. Samples: 402955960. Policy #0 lag: (min: 1.0, avg: 22.0, max: 43.0) [2024-03-29 15:37:58,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 15:37:58,931][00497] Updated weights for policy 0, policy_version 31782 (0.0034) [2024-03-29 15:38:00,012][00476] Signal inference workers to stop experience collection... (14400 times) [2024-03-29 15:38:00,066][00497] InferenceWorker_p0-w0: stopping experience collection (14400 times) [2024-03-29 15:38:00,096][00476] Signal inference workers to resume experience collection... (14400 times) [2024-03-29 15:38:00,100][00497] InferenceWorker_p0-w0: resuming experience collection (14400 times) [2024-03-29 15:38:02,794][00497] Updated weights for policy 0, policy_version 31792 (0.0021) [2024-03-29 15:38:03,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 520912896. Throughput: 0: 41619.9. Samples: 403066900. Policy #0 lag: (min: 1.0, avg: 22.0, max: 43.0) [2024-03-29 15:38:03,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 15:38:06,383][00497] Updated weights for policy 0, policy_version 31802 (0.0026) [2024-03-29 15:38:08,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 521142272. Throughput: 0: 41544.9. Samples: 403317680. Policy #0 lag: (min: 0.0, avg: 20.6, max: 44.0) [2024-03-29 15:38:08,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 15:38:10,290][00497] Updated weights for policy 0, policy_version 31812 (0.0024) [2024-03-29 15:38:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 521338880. Throughput: 0: 42075.5. Samples: 403579360. Policy #0 lag: (min: 0.0, avg: 20.6, max: 44.0) [2024-03-29 15:38:13,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 15:38:14,353][00497] Updated weights for policy 0, policy_version 31822 (0.0024) [2024-03-29 15:38:18,423][00497] Updated weights for policy 0, policy_version 31832 (0.0024) [2024-03-29 15:38:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 521551872. Throughput: 0: 41613.0. Samples: 403693340. Policy #0 lag: (min: 0.0, avg: 20.6, max: 44.0) [2024-03-29 15:38:18,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 15:38:21,879][00497] Updated weights for policy 0, policy_version 31842 (0.0025) [2024-03-29 15:38:23,840][00126] Fps is (10 sec: 42597.4, 60 sec: 42052.1, 300 sec: 41654.2). Total num frames: 521764864. Throughput: 0: 41995.2. Samples: 403949540. Policy #0 lag: (min: 0.0, avg: 20.6, max: 44.0) [2024-03-29 15:38:23,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 15:38:26,082][00497] Updated weights for policy 0, policy_version 31852 (0.0019) [2024-03-29 15:38:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 521961472. Throughput: 0: 42296.0. Samples: 404212900. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:38:28,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 15:38:29,917][00497] Updated weights for policy 0, policy_version 31862 (0.0021) [2024-03-29 15:38:31,505][00476] Signal inference workers to stop experience collection... (14450 times) [2024-03-29 15:38:31,549][00497] InferenceWorker_p0-w0: stopping experience collection (14450 times) [2024-03-29 15:38:31,679][00476] Signal inference workers to resume experience collection... 
(14450 times) [2024-03-29 15:38:31,679][00497] InferenceWorker_p0-w0: resuming experience collection (14450 times) [2024-03-29 15:38:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 41543.1). Total num frames: 522174464. Throughput: 0: 42036.6. Samples: 404325480. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:38:33,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 15:38:34,118][00497] Updated weights for policy 0, policy_version 31872 (0.0028) [2024-03-29 15:38:37,638][00497] Updated weights for policy 0, policy_version 31882 (0.0020) [2024-03-29 15:38:38,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 522403840. Throughput: 0: 42353.5. Samples: 404580580. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:38:38,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 15:38:41,971][00497] Updated weights for policy 0, policy_version 31892 (0.0019) [2024-03-29 15:38:43,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 522567680. Throughput: 0: 41534.7. Samples: 404825020. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:38:43,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 15:38:46,124][00497] Updated weights for policy 0, policy_version 31902 (0.0027) [2024-03-29 15:38:48,839][00126] Fps is (10 sec: 40959.2, 60 sec: 42052.1, 300 sec: 41654.2). Total num frames: 522813440. Throughput: 0: 41847.5. Samples: 404950040. Policy #0 lag: (min: 1.0, avg: 21.6, max: 44.0) [2024-03-29 15:38:48,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 15:38:50,008][00497] Updated weights for policy 0, policy_version 31912 (0.0019) [2024-03-29 15:38:53,485][00497] Updated weights for policy 0, policy_version 31922 (0.0027) [2024-03-29 15:38:53,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42052.2, 300 sec: 41598.7). Total num frames: 523010048. Throughput: 0: 41795.9. Samples: 405198500. Policy #0 lag: (min: 1.0, avg: 21.6, max: 44.0) [2024-03-29 15:38:53,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 15:38:57,597][00497] Updated weights for policy 0, policy_version 31932 (0.0018) [2024-03-29 15:38:58,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.3, 300 sec: 41543.2). Total num frames: 523223040. Throughput: 0: 41758.7. Samples: 405458500. Policy #0 lag: (min: 1.0, avg: 21.6, max: 44.0) [2024-03-29 15:38:58,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 15:39:01,453][00497] Updated weights for policy 0, policy_version 31942 (0.0023) [2024-03-29 15:39:03,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.2, 300 sec: 41709.8). Total num frames: 523452416. Throughput: 0: 42175.0. Samples: 405591220. Policy #0 lag: (min: 1.0, avg: 21.6, max: 44.0) [2024-03-29 15:39:03,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 15:39:04,044][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000031950_523468800.pth... [2024-03-29 15:39:04,356][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000031337_513425408.pth [2024-03-29 15:39:05,405][00497] Updated weights for policy 0, policy_version 31952 (0.0024) [2024-03-29 15:39:06,026][00476] Signal inference workers to stop experience collection... (14500 times) [2024-03-29 15:39:06,027][00476] Signal inference workers to resume experience collection... 
(14500 times) [2024-03-29 15:39:06,072][00497] InferenceWorker_p0-w0: stopping experience collection (14500 times) [2024-03-29 15:39:06,073][00497] InferenceWorker_p0-w0: resuming experience collection (14500 times) [2024-03-29 15:39:08,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 523649024. Throughput: 0: 41921.1. Samples: 405835980. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 15:39:08,840][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 15:39:08,883][00497] Updated weights for policy 0, policy_version 31962 (0.0020) [2024-03-29 15:39:13,205][00497] Updated weights for policy 0, policy_version 31972 (0.0022) [2024-03-29 15:39:13,839][00126] Fps is (10 sec: 37684.0, 60 sec: 41506.2, 300 sec: 41487.6). Total num frames: 523829248. Throughput: 0: 41566.2. Samples: 406083380. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 15:39:13,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 15:39:17,136][00497] Updated weights for policy 0, policy_version 31982 (0.0024) [2024-03-29 15:39:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 524075008. Throughput: 0: 41975.7. Samples: 406214380. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 15:39:18,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 15:39:21,192][00497] Updated weights for policy 0, policy_version 31992 (0.0019) [2024-03-29 15:39:23,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 524271616. Throughput: 0: 41977.2. Samples: 406469560. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 15:39:23,842][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 15:39:24,671][00497] Updated weights for policy 0, policy_version 32002 (0.0018) [2024-03-29 15:39:28,761][00497] Updated weights for policy 0, policy_version 32012 (0.0021) [2024-03-29 15:39:28,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 524484608. Throughput: 0: 42085.3. Samples: 406718860. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 15:39:28,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 15:39:32,523][00497] Updated weights for policy 0, policy_version 32022 (0.0027) [2024-03-29 15:39:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 524697600. Throughput: 0: 42178.7. Samples: 406848080. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 15:39:33,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 15:39:36,785][00497] Updated weights for policy 0, policy_version 32032 (0.0023) [2024-03-29 15:39:38,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 524910592. Throughput: 0: 42181.9. Samples: 407096680. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 15:39:38,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 15:39:40,340][00476] Signal inference workers to stop experience collection... (14550 times) [2024-03-29 15:39:40,421][00476] Signal inference workers to resume experience collection... (14550 times) [2024-03-29 15:39:40,423][00497] InferenceWorker_p0-w0: stopping experience collection (14550 times) [2024-03-29 15:39:40,426][00497] Updated weights for policy 0, policy_version 32042 (0.0024) [2024-03-29 15:39:40,448][00497] InferenceWorker_p0-w0: resuming experience collection (14550 times) [2024-03-29 15:39:43,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.3, 300 sec: 41709.8). 
Total num frames: 525107200. Throughput: 0: 41891.5. Samples: 407343620. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 15:39:43,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 15:39:44,330][00497] Updated weights for policy 0, policy_version 32052 (0.0019) [2024-03-29 15:39:48,314][00497] Updated weights for policy 0, policy_version 32062 (0.0021) [2024-03-29 15:39:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 525320192. Throughput: 0: 41739.7. Samples: 407469500. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 15:39:48,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 15:39:52,503][00497] Updated weights for policy 0, policy_version 32072 (0.0020) [2024-03-29 15:39:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.4, 300 sec: 41709.8). Total num frames: 525533184. Throughput: 0: 42058.2. Samples: 407728600. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 15:39:53,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 15:39:56,050][00497] Updated weights for policy 0, policy_version 32082 (0.0027) [2024-03-29 15:39:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 525746176. Throughput: 0: 42026.1. Samples: 407974560. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 15:39:58,842][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 15:39:59,912][00497] Updated weights for policy 0, policy_version 32092 (0.0022) [2024-03-29 15:40:03,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.3, 300 sec: 41765.3). Total num frames: 525942784. Throughput: 0: 42071.7. Samples: 408107600. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 15:40:03,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 15:40:03,898][00497] Updated weights for policy 0, policy_version 32102 (0.0021) [2024-03-29 15:40:07,966][00497] Updated weights for policy 0, policy_version 32112 (0.0020) [2024-03-29 15:40:08,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 526155776. Throughput: 0: 41628.5. Samples: 408342840. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 15:40:08,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 15:40:11,772][00497] Updated weights for policy 0, policy_version 32122 (0.0019) [2024-03-29 15:40:12,292][00476] Signal inference workers to stop experience collection... (14600 times) [2024-03-29 15:40:12,366][00497] InferenceWorker_p0-w0: stopping experience collection (14600 times) [2024-03-29 15:40:12,381][00476] Signal inference workers to resume experience collection... (14600 times) [2024-03-29 15:40:12,395][00497] InferenceWorker_p0-w0: resuming experience collection (14600 times) [2024-03-29 15:40:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 526368768. Throughput: 0: 41661.4. Samples: 408593620. Policy #0 lag: (min: 2.0, avg: 20.8, max: 42.0) [2024-03-29 15:40:13,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 15:40:15,537][00497] Updated weights for policy 0, policy_version 32132 (0.0019) [2024-03-29 15:40:18,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 526581760. Throughput: 0: 41974.8. Samples: 408736940. 
Policy #0 lag: (min: 2.0, avg: 20.8, max: 42.0) [2024-03-29 15:40:18,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 15:40:19,361][00497] Updated weights for policy 0, policy_version 32142 (0.0020) [2024-03-29 15:40:23,256][00497] Updated weights for policy 0, policy_version 32152 (0.0027) [2024-03-29 15:40:23,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.3, 300 sec: 41820.8). Total num frames: 526794752. Throughput: 0: 41930.2. Samples: 408983540. Policy #0 lag: (min: 2.0, avg: 20.8, max: 42.0) [2024-03-29 15:40:23,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 15:40:27,126][00497] Updated weights for policy 0, policy_version 32162 (0.0027) [2024-03-29 15:40:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 527007744. Throughput: 0: 41935.1. Samples: 409230700. Policy #0 lag: (min: 2.0, avg: 20.8, max: 42.0) [2024-03-29 15:40:28,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 15:40:31,017][00497] Updated weights for policy 0, policy_version 32172 (0.0023) [2024-03-29 15:40:33,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 527220736. Throughput: 0: 42251.5. Samples: 409370820. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:40:33,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 15:40:34,971][00497] Updated weights for policy 0, policy_version 32182 (0.0028) [2024-03-29 15:40:38,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 527417344. Throughput: 0: 41955.9. Samples: 409616620. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:40:38,841][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 15:40:38,898][00497] Updated weights for policy 0, policy_version 32192 (0.0020) [2024-03-29 15:40:42,795][00497] Updated weights for policy 0, policy_version 32202 (0.0029) [2024-03-29 15:40:43,344][00476] Signal inference workers to stop experience collection... (14650 times) [2024-03-29 15:40:43,365][00497] InferenceWorker_p0-w0: stopping experience collection (14650 times) [2024-03-29 15:40:43,531][00476] Signal inference workers to resume experience collection... (14650 times) [2024-03-29 15:40:43,532][00497] InferenceWorker_p0-w0: resuming experience collection (14650 times) [2024-03-29 15:40:43,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 527646720. Throughput: 0: 42044.9. Samples: 409866580. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:40:43,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 15:40:46,661][00497] Updated weights for policy 0, policy_version 32212 (0.0030) [2024-03-29 15:40:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 527843328. Throughput: 0: 41852.0. Samples: 409990940. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:40:48,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 15:40:50,545][00497] Updated weights for policy 0, policy_version 32222 (0.0019) [2024-03-29 15:40:53,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 528056320. Throughput: 0: 42416.8. Samples: 410251600. 
Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 15:40:53,841][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 15:40:54,444][00497] Updated weights for policy 0, policy_version 32232 (0.0026) [2024-03-29 15:40:58,331][00497] Updated weights for policy 0, policy_version 32242 (0.0034) [2024-03-29 15:40:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 528269312. Throughput: 0: 42478.6. Samples: 410505160. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 15:40:58,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 15:41:02,298][00497] Updated weights for policy 0, policy_version 32252 (0.0019) [2024-03-29 15:41:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.2, 300 sec: 42043.0). Total num frames: 528482304. Throughput: 0: 42006.5. Samples: 410627240. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 15:41:03,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 15:41:03,878][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000032257_528498688.pth... [2024-03-29 15:41:04,186][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000031644_518455296.pth [2024-03-29 15:41:06,166][00497] Updated weights for policy 0, policy_version 32262 (0.0028) [2024-03-29 15:41:08,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 42043.1). Total num frames: 528695296. Throughput: 0: 42056.1. Samples: 410876060. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 15:41:08,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 15:41:09,849][00497] Updated weights for policy 0, policy_version 32272 (0.0025) [2024-03-29 15:41:13,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 528891904. Throughput: 0: 42429.3. Samples: 411140020. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 15:41:13,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 15:41:13,865][00497] Updated weights for policy 0, policy_version 32282 (0.0028) [2024-03-29 15:41:14,137][00476] Signal inference workers to stop experience collection... (14700 times) [2024-03-29 15:41:14,173][00497] InferenceWorker_p0-w0: stopping experience collection (14700 times) [2024-03-29 15:41:14,362][00476] Signal inference workers to resume experience collection... (14700 times) [2024-03-29 15:41:14,363][00497] InferenceWorker_p0-w0: resuming experience collection (14700 times) [2024-03-29 15:41:17,729][00497] Updated weights for policy 0, policy_version 32292 (0.0020) [2024-03-29 15:41:18,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 529104896. Throughput: 0: 42011.6. Samples: 411261340. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 15:41:18,840][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 15:41:21,808][00497] Updated weights for policy 0, policy_version 32302 (0.0020) [2024-03-29 15:41:23,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 529334272. Throughput: 0: 42053.0. Samples: 411509000. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 15:41:23,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 15:41:25,633][00497] Updated weights for policy 0, policy_version 32312 (0.0020) [2024-03-29 15:41:28,843][00126] Fps is (10 sec: 42584.2, 60 sec: 42049.9, 300 sec: 41875.9). Total num frames: 529530880. Throughput: 0: 42093.3. Samples: 411760920. 
Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 15:41:28,844][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 15:41:29,568][00497] Updated weights for policy 0, policy_version 32322 (0.0021) [2024-03-29 15:41:33,398][00497] Updated weights for policy 0, policy_version 32332 (0.0019) [2024-03-29 15:41:33,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 41987.7). Total num frames: 529743872. Throughput: 0: 41984.8. Samples: 411880260. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 15:41:33,840][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 15:41:37,487][00497] Updated weights for policy 0, policy_version 32342 (0.0032) [2024-03-29 15:41:38,839][00126] Fps is (10 sec: 42612.3, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 529956864. Throughput: 0: 42163.6. Samples: 412148960. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 15:41:38,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 15:41:41,229][00497] Updated weights for policy 0, policy_version 32352 (0.0027) [2024-03-29 15:41:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 530153472. Throughput: 0: 41865.3. Samples: 412389100. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 15:41:43,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 15:41:45,288][00497] Updated weights for policy 0, policy_version 32362 (0.0022) [2024-03-29 15:41:48,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 530350080. Throughput: 0: 41834.8. Samples: 412509800. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 15:41:48,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 15:41:49,198][00497] Updated weights for policy 0, policy_version 32372 (0.0030) [2024-03-29 15:41:50,729][00476] Signal inference workers to stop experience collection... (14750 times) [2024-03-29 15:41:50,804][00476] Signal inference workers to resume experience collection... (14750 times) [2024-03-29 15:41:50,807][00497] InferenceWorker_p0-w0: stopping experience collection (14750 times) [2024-03-29 15:41:50,832][00497] InferenceWorker_p0-w0: resuming experience collection (14750 times) [2024-03-29 15:41:53,107][00497] Updated weights for policy 0, policy_version 32382 (0.0017) [2024-03-29 15:41:53,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.4, 300 sec: 41987.5). Total num frames: 530579456. Throughput: 0: 42070.3. Samples: 412769220. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 15:41:53,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 15:41:57,011][00497] Updated weights for policy 0, policy_version 32392 (0.0026) [2024-03-29 15:41:58,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 530792448. Throughput: 0: 41714.6. Samples: 413017180. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 15:41:58,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 15:42:00,978][00497] Updated weights for policy 0, policy_version 32402 (0.0022) [2024-03-29 15:42:03,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 531005440. Throughput: 0: 41855.5. Samples: 413144840. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 15:42:03,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 15:42:04,785][00497] Updated weights for policy 0, policy_version 32412 (0.0029) [2024-03-29 15:42:08,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.0, 300 sec: 41876.4). 
Total num frames: 531185664. Throughput: 0: 41775.9. Samples: 413388920. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 15:42:08,840][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 15:42:08,875][00497] Updated weights for policy 0, policy_version 32422 (0.0022) [2024-03-29 15:42:12,570][00497] Updated weights for policy 0, policy_version 32432 (0.0021) [2024-03-29 15:42:13,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 531398656. Throughput: 0: 41907.0. Samples: 413646600. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 15:42:13,840][00126] Avg episode reward: [(0, '0.506')] [2024-03-29 15:42:16,501][00497] Updated weights for policy 0, policy_version 32442 (0.0021) [2024-03-29 15:42:18,839][00126] Fps is (10 sec: 44237.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 531628032. Throughput: 0: 42141.9. Samples: 413776640. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 15:42:18,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 15:42:20,256][00497] Updated weights for policy 0, policy_version 32452 (0.0023) [2024-03-29 15:42:23,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 531824640. Throughput: 0: 41817.3. Samples: 414030740. Policy #0 lag: (min: 1.0, avg: 22.5, max: 43.0) [2024-03-29 15:42:23,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 15:42:24,227][00497] Updated weights for policy 0, policy_version 32462 (0.0018) [2024-03-29 15:42:25,504][00476] Signal inference workers to stop experience collection... (14800 times) [2024-03-29 15:42:25,507][00476] Signal inference workers to resume experience collection... (14800 times) [2024-03-29 15:42:25,549][00497] InferenceWorker_p0-w0: stopping experience collection (14800 times) [2024-03-29 15:42:25,553][00497] InferenceWorker_p0-w0: resuming experience collection (14800 times) [2024-03-29 15:42:27,855][00497] Updated weights for policy 0, policy_version 32472 (0.0020) [2024-03-29 15:42:28,839][00126] Fps is (10 sec: 40959.0, 60 sec: 41781.4, 300 sec: 41931.9). Total num frames: 532037632. Throughput: 0: 42068.3. Samples: 414282180. Policy #0 lag: (min: 1.0, avg: 22.5, max: 43.0) [2024-03-29 15:42:28,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 15:42:32,134][00497] Updated weights for policy 0, policy_version 32482 (0.0020) [2024-03-29 15:42:33,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.2, 300 sec: 41987.4). Total num frames: 532267008. Throughput: 0: 42242.5. Samples: 414410720. Policy #0 lag: (min: 1.0, avg: 22.5, max: 43.0) [2024-03-29 15:42:33,840][00126] Avg episode reward: [(0, '0.446')] [2024-03-29 15:42:35,681][00497] Updated weights for policy 0, policy_version 32492 (0.0023) [2024-03-29 15:42:38,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 532463616. Throughput: 0: 42016.8. Samples: 414659980. Policy #0 lag: (min: 1.0, avg: 22.5, max: 43.0) [2024-03-29 15:42:38,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 15:42:39,596][00497] Updated weights for policy 0, policy_version 32502 (0.0019) [2024-03-29 15:42:43,512][00497] Updated weights for policy 0, policy_version 32512 (0.0021) [2024-03-29 15:42:43,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 532676608. Throughput: 0: 42142.3. Samples: 414913580. 
Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 15:42:43,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 15:42:47,597][00497] Updated weights for policy 0, policy_version 32522 (0.0027) [2024-03-29 15:42:48,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.2, 300 sec: 42043.0). Total num frames: 532889600. Throughput: 0: 42113.7. Samples: 415039960. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 15:42:48,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 15:42:51,373][00497] Updated weights for policy 0, policy_version 32532 (0.0024) [2024-03-29 15:42:53,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.1, 300 sec: 42043.0). Total num frames: 533102592. Throughput: 0: 42265.3. Samples: 415290860. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 15:42:53,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 15:42:55,135][00497] Updated weights for policy 0, policy_version 32542 (0.0023) [2024-03-29 15:42:58,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 533315584. Throughput: 0: 42020.1. Samples: 415537500. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 15:42:58,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 15:42:59,361][00497] Updated weights for policy 0, policy_version 32552 (0.0022) [2024-03-29 15:43:00,597][00476] Signal inference workers to stop experience collection... (14850 times) [2024-03-29 15:43:00,647][00497] InferenceWorker_p0-w0: stopping experience collection (14850 times) [2024-03-29 15:43:00,760][00476] Signal inference workers to resume experience collection... (14850 times) [2024-03-29 15:43:00,761][00497] InferenceWorker_p0-w0: resuming experience collection (14850 times) [2024-03-29 15:43:03,150][00497] Updated weights for policy 0, policy_version 32562 (0.0023) [2024-03-29 15:43:03,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 533512192. Throughput: 0: 42121.3. Samples: 415672100. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 15:43:03,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 15:43:04,143][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000032565_533544960.pth... [2024-03-29 15:43:04,458][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000031950_523468800.pth [2024-03-29 15:43:07,008][00497] Updated weights for policy 0, policy_version 32572 (0.0019) [2024-03-29 15:43:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 533741568. Throughput: 0: 42020.8. Samples: 415921680. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 15:43:08,840][00126] Avg episode reward: [(0, '0.501')] [2024-03-29 15:43:10,687][00497] Updated weights for policy 0, policy_version 32582 (0.0023) [2024-03-29 15:43:13,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 533954560. Throughput: 0: 42005.8. Samples: 416172440. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 15:43:13,841][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 15:43:14,604][00497] Updated weights for policy 0, policy_version 32592 (0.0031) [2024-03-29 15:43:18,710][00497] Updated weights for policy 0, policy_version 32602 (0.0021) [2024-03-29 15:43:18,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 534151168. Throughput: 0: 42186.8. Samples: 416309120. 
Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 15:43:18,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 15:43:22,325][00497] Updated weights for policy 0, policy_version 32612 (0.0023) [2024-03-29 15:43:23,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 534364160. Throughput: 0: 42105.2. Samples: 416554720. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 15:43:23,840][00126] Avg episode reward: [(0, '0.459')] [2024-03-29 15:43:26,261][00497] Updated weights for policy 0, policy_version 32622 (0.0023) [2024-03-29 15:43:28,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 534577152. Throughput: 0: 42014.2. Samples: 416804220. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 15:43:28,840][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 15:43:29,929][00497] Updated weights for policy 0, policy_version 32632 (0.0024) [2024-03-29 15:43:33,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 534773760. Throughput: 0: 42320.0. Samples: 416944360. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 15:43:33,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 15:43:34,355][00497] Updated weights for policy 0, policy_version 32642 (0.0017) [2024-03-29 15:43:37,617][00497] Updated weights for policy 0, policy_version 32652 (0.0024) [2024-03-29 15:43:38,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 534986752. Throughput: 0: 42180.5. Samples: 417188980. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 15:43:38,840][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 15:43:38,871][00476] Signal inference workers to stop experience collection... (14900 times) [2024-03-29 15:43:38,946][00476] Signal inference workers to resume experience collection... (14900 times) [2024-03-29 15:43:38,948][00497] InferenceWorker_p0-w0: stopping experience collection (14900 times) [2024-03-29 15:43:38,972][00497] InferenceWorker_p0-w0: resuming experience collection (14900 times) [2024-03-29 15:43:41,671][00497] Updated weights for policy 0, policy_version 32662 (0.0020) [2024-03-29 15:43:43,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 535232512. Throughput: 0: 42420.9. Samples: 417446440. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 15:43:43,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 15:43:45,350][00497] Updated weights for policy 0, policy_version 32672 (0.0032) [2024-03-29 15:43:48,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 535429120. Throughput: 0: 42213.6. Samples: 417571720. Policy #0 lag: (min: 0.0, avg: 21.9, max: 43.0) [2024-03-29 15:43:48,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 15:43:49,691][00497] Updated weights for policy 0, policy_version 32682 (0.0023) [2024-03-29 15:43:53,147][00497] Updated weights for policy 0, policy_version 32692 (0.0036) [2024-03-29 15:43:53,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 535625728. Throughput: 0: 42249.8. Samples: 417822920. Policy #0 lag: (min: 0.0, avg: 21.9, max: 43.0) [2024-03-29 15:43:53,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 15:43:56,936][00497] Updated weights for policy 0, policy_version 32702 (0.0024) [2024-03-29 15:43:58,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42598.5, 300 sec: 42098.6). 
Total num frames: 535871488. Throughput: 0: 42546.3. Samples: 418087020. Policy #0 lag: (min: 0.0, avg: 21.9, max: 43.0) [2024-03-29 15:43:58,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 15:44:00,905][00497] Updated weights for policy 0, policy_version 32712 (0.0024) [2024-03-29 15:44:03,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 536068096. Throughput: 0: 42233.3. Samples: 418209620. Policy #0 lag: (min: 0.0, avg: 21.9, max: 43.0) [2024-03-29 15:44:03,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 15:44:05,022][00497] Updated weights for policy 0, policy_version 32722 (0.0023) [2024-03-29 15:44:08,737][00497] Updated weights for policy 0, policy_version 32732 (0.0039) [2024-03-29 15:44:08,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 536281088. Throughput: 0: 42326.7. Samples: 418459420. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 15:44:08,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 15:44:12,616][00497] Updated weights for policy 0, policy_version 32742 (0.0021) [2024-03-29 15:44:13,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 536477696. Throughput: 0: 42422.8. Samples: 418713240. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 15:44:13,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 15:44:16,077][00476] Signal inference workers to stop experience collection... (14950 times) [2024-03-29 15:44:16,154][00476] Signal inference workers to resume experience collection... (14950 times) [2024-03-29 15:44:16,158][00497] InferenceWorker_p0-w0: stopping experience collection (14950 times) [2024-03-29 15:44:16,179][00497] InferenceWorker_p0-w0: resuming experience collection (14950 times) [2024-03-29 15:44:16,467][00497] Updated weights for policy 0, policy_version 32752 (0.0018) [2024-03-29 15:44:18,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 536690688. Throughput: 0: 42011.5. Samples: 418834880. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 15:44:18,840][00126] Avg episode reward: [(0, '0.457')] [2024-03-29 15:44:20,755][00497] Updated weights for policy 0, policy_version 32762 (0.0026) [2024-03-29 15:44:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.5, 300 sec: 42098.5). Total num frames: 536903680. Throughput: 0: 42243.6. Samples: 419089940. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 15:44:23,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 15:44:24,256][00497] Updated weights for policy 0, policy_version 32772 (0.0023) [2024-03-29 15:44:28,128][00497] Updated weights for policy 0, policy_version 32782 (0.0024) [2024-03-29 15:44:28,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 537116672. Throughput: 0: 42176.1. Samples: 419344360. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 15:44:28,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 15:44:31,967][00497] Updated weights for policy 0, policy_version 32792 (0.0022) [2024-03-29 15:44:33,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 537313280. Throughput: 0: 42152.1. Samples: 419468560. 
Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 15:44:33,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 15:44:36,441][00497] Updated weights for policy 0, policy_version 32802 (0.0020) [2024-03-29 15:44:38,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 537542656. Throughput: 0: 42426.3. Samples: 419732100. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 15:44:38,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 15:44:39,852][00497] Updated weights for policy 0, policy_version 32812 (0.0024) [2024-03-29 15:44:43,754][00497] Updated weights for policy 0, policy_version 32822 (0.0021) [2024-03-29 15:44:43,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 537755648. Throughput: 0: 41832.8. Samples: 419969500. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 15:44:43,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 15:44:47,678][00497] Updated weights for policy 0, policy_version 32832 (0.0026) [2024-03-29 15:44:48,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 537952256. Throughput: 0: 41907.6. Samples: 420095460. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 15:44:48,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 15:44:51,861][00497] Updated weights for policy 0, policy_version 32842 (0.0018) [2024-03-29 15:44:53,760][00476] Signal inference workers to stop experience collection... (15000 times) [2024-03-29 15:44:53,824][00497] InferenceWorker_p0-w0: stopping experience collection (15000 times) [2024-03-29 15:44:53,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 538181632. Throughput: 0: 42429.9. Samples: 420368760. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 15:44:53,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 15:44:53,951][00476] Signal inference workers to resume experience collection... (15000 times) [2024-03-29 15:44:53,952][00497] InferenceWorker_p0-w0: resuming experience collection (15000 times) [2024-03-29 15:44:55,351][00497] Updated weights for policy 0, policy_version 32852 (0.0019) [2024-03-29 15:44:58,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 538394624. Throughput: 0: 42117.4. Samples: 420608520. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 15:44:58,841][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 15:44:59,147][00497] Updated weights for policy 0, policy_version 32862 (0.0025) [2024-03-29 15:45:03,290][00497] Updated weights for policy 0, policy_version 32872 (0.0020) [2024-03-29 15:45:03,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 538591232. Throughput: 0: 42047.6. Samples: 420727020. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 15:45:03,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 15:45:03,866][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000032873_538591232.pth... [2024-03-29 15:45:04,192][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000032257_528498688.pth [2024-03-29 15:45:07,523][00497] Updated weights for policy 0, policy_version 32882 (0.0034) [2024-03-29 15:45:08,839][00126] Fps is (10 sec: 40959.2, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 538804224. Throughput: 0: 42259.0. Samples: 420991600. 
Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 15:45:08,840][00126] Avg episode reward: [(0, '0.505')] [2024-03-29 15:45:11,059][00497] Updated weights for policy 0, policy_version 32892 (0.0024) [2024-03-29 15:45:13,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 539017216. Throughput: 0: 42052.0. Samples: 421236700. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 15:45:13,840][00126] Avg episode reward: [(0, '0.454')] [2024-03-29 15:45:15,100][00497] Updated weights for policy 0, policy_version 32902 (0.0022) [2024-03-29 15:45:18,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 539213824. Throughput: 0: 41943.1. Samples: 421356000. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 15:45:18,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 15:45:18,964][00497] Updated weights for policy 0, policy_version 32912 (0.0023) [2024-03-29 15:45:23,304][00497] Updated weights for policy 0, policy_version 32922 (0.0022) [2024-03-29 15:45:23,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 539394048. Throughput: 0: 41716.1. Samples: 421609320. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 15:45:23,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 15:45:25,438][00476] Signal inference workers to stop experience collection... (15050 times) [2024-03-29 15:45:25,508][00497] InferenceWorker_p0-w0: stopping experience collection (15050 times) [2024-03-29 15:45:25,521][00476] Signal inference workers to resume experience collection... (15050 times) [2024-03-29 15:45:25,536][00497] InferenceWorker_p0-w0: resuming experience collection (15050 times) [2024-03-29 15:45:26,728][00497] Updated weights for policy 0, policy_version 32932 (0.0029) [2024-03-29 15:45:28,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 539623424. Throughput: 0: 42192.5. Samples: 421868160. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 15:45:28,841][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 15:45:30,756][00497] Updated weights for policy 0, policy_version 32942 (0.0019) [2024-03-29 15:45:33,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 539836416. Throughput: 0: 41834.5. Samples: 421978020. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 15:45:33,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 15:45:34,591][00497] Updated weights for policy 0, policy_version 32952 (0.0029) [2024-03-29 15:45:38,788][00497] Updated weights for policy 0, policy_version 32962 (0.0026) [2024-03-29 15:45:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 540049408. Throughput: 0: 41578.1. Samples: 422239780. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 15:45:38,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 15:45:42,171][00497] Updated weights for policy 0, policy_version 32972 (0.0029) [2024-03-29 15:45:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 540246016. Throughput: 0: 41729.2. Samples: 422486340. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 15:45:43,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 15:45:46,264][00497] Updated weights for policy 0, policy_version 32982 (0.0028) [2024-03-29 15:45:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 42043.0). 
Total num frames: 540459008. Throughput: 0: 41813.0. Samples: 422608600. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 15:45:48,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 15:45:50,322][00497] Updated weights for policy 0, policy_version 32992 (0.0020) [2024-03-29 15:45:53,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 540655616. Throughput: 0: 41636.1. Samples: 422865220. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 15:45:53,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 15:45:54,544][00497] Updated weights for policy 0, policy_version 33002 (0.0017) [2024-03-29 15:45:57,552][00497] Updated weights for policy 0, policy_version 33012 (0.0032) [2024-03-29 15:45:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.0, 300 sec: 42043.0). Total num frames: 540884992. Throughput: 0: 41705.2. Samples: 423113440. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 15:45:58,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 15:46:00,165][00476] Signal inference workers to stop experience collection... (15100 times) [2024-03-29 15:46:00,165][00476] Signal inference workers to resume experience collection... (15100 times) [2024-03-29 15:46:00,203][00497] InferenceWorker_p0-w0: stopping experience collection (15100 times) [2024-03-29 15:46:00,204][00497] InferenceWorker_p0-w0: resuming experience collection (15100 times) [2024-03-29 15:46:01,832][00497] Updated weights for policy 0, policy_version 33022 (0.0030) [2024-03-29 15:46:03,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 541097984. Throughput: 0: 42081.2. Samples: 423249660. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 15:46:03,841][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 15:46:05,707][00497] Updated weights for policy 0, policy_version 33032 (0.0028) [2024-03-29 15:46:08,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.3, 300 sec: 42098.5). Total num frames: 541310976. Throughput: 0: 42138.7. Samples: 423505560. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 15:46:08,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 15:46:09,916][00497] Updated weights for policy 0, policy_version 33042 (0.0017) [2024-03-29 15:46:13,253][00497] Updated weights for policy 0, policy_version 33052 (0.0027) [2024-03-29 15:46:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.1, 300 sec: 42098.5). Total num frames: 541523968. Throughput: 0: 41822.6. Samples: 423750180. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 15:46:13,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 15:46:17,538][00497] Updated weights for policy 0, policy_version 33062 (0.0030) [2024-03-29 15:46:18,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 541736960. Throughput: 0: 42291.2. Samples: 423881120. Policy #0 lag: (min: 0.0, avg: 19.8, max: 42.0) [2024-03-29 15:46:18,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 15:46:21,272][00497] Updated weights for policy 0, policy_version 33072 (0.0021) [2024-03-29 15:46:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 42043.5). Total num frames: 541933568. Throughput: 0: 41999.5. Samples: 424129760. 
Policy #0 lag: (min: 0.0, avg: 19.8, max: 42.0) [2024-03-29 15:46:23,840][00126] Avg episode reward: [(0, '0.456')] [2024-03-29 15:46:25,699][00497] Updated weights for policy 0, policy_version 33082 (0.0023) [2024-03-29 15:46:28,699][00497] Updated weights for policy 0, policy_version 33092 (0.0028) [2024-03-29 15:46:28,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 542179328. Throughput: 0: 41958.6. Samples: 424374480. Policy #0 lag: (min: 0.0, avg: 19.8, max: 42.0) [2024-03-29 15:46:28,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 15:46:33,205][00497] Updated weights for policy 0, policy_version 33102 (0.0019) [2024-03-29 15:46:33,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 542359552. Throughput: 0: 42347.9. Samples: 424514260. Policy #0 lag: (min: 0.0, avg: 19.8, max: 42.0) [2024-03-29 15:46:33,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 15:46:33,961][00476] Signal inference workers to stop experience collection... (15150 times) [2024-03-29 15:46:33,992][00497] InferenceWorker_p0-w0: stopping experience collection (15150 times) [2024-03-29 15:46:34,155][00476] Signal inference workers to resume experience collection... (15150 times) [2024-03-29 15:46:34,156][00497] InferenceWorker_p0-w0: resuming experience collection (15150 times) [2024-03-29 15:46:36,950][00497] Updated weights for policy 0, policy_version 33112 (0.0026) [2024-03-29 15:46:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 542572544. Throughput: 0: 41940.4. Samples: 424752540. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 15:46:38,840][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 15:46:41,495][00497] Updated weights for policy 0, policy_version 33122 (0.0023) [2024-03-29 15:46:43,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 542785536. Throughput: 0: 42109.0. Samples: 425008340. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 15:46:43,840][00126] Avg episode reward: [(0, '0.483')] [2024-03-29 15:46:44,538][00497] Updated weights for policy 0, policy_version 33132 (0.0025) [2024-03-29 15:46:48,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 542982144. Throughput: 0: 41857.0. Samples: 425133220. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 15:46:48,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 15:46:48,921][00497] Updated weights for policy 0, policy_version 33142 (0.0024) [2024-03-29 15:46:52,902][00497] Updated weights for policy 0, policy_version 33152 (0.0027) [2024-03-29 15:46:53,839][00126] Fps is (10 sec: 39321.1, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 543178752. Throughput: 0: 41553.7. Samples: 425375480. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 15:46:53,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 15:46:57,034][00497] Updated weights for policy 0, policy_version 33162 (0.0025) [2024-03-29 15:46:58,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 543408128. Throughput: 0: 42000.5. Samples: 425640200. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 15:46:58,840][00126] Avg episode reward: [(0, '0.472')] [2024-03-29 15:47:00,426][00497] Updated weights for policy 0, policy_version 33172 (0.0019) [2024-03-29 15:47:03,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 42098.5). 
Total num frames: 543604736. Throughput: 0: 41484.7. Samples: 425747940. Policy #0 lag: (min: 2.0, avg: 21.7, max: 43.0) [2024-03-29 15:47:03,841][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 15:47:04,035][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000033180_543621120.pth... [2024-03-29 15:47:04,336][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000032565_533544960.pth [2024-03-29 15:47:04,898][00497] Updated weights for policy 0, policy_version 33182 (0.0026) [2024-03-29 15:47:05,912][00476] Signal inference workers to stop experience collection... (15200 times) [2024-03-29 15:47:05,992][00497] InferenceWorker_p0-w0: stopping experience collection (15200 times) [2024-03-29 15:47:05,993][00476] Signal inference workers to resume experience collection... (15200 times) [2024-03-29 15:47:06,019][00497] InferenceWorker_p0-w0: resuming experience collection (15200 times) [2024-03-29 15:47:08,438][00497] Updated weights for policy 0, policy_version 33192 (0.0022) [2024-03-29 15:47:08,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 543817728. Throughput: 0: 41744.5. Samples: 426008260. Policy #0 lag: (min: 2.0, avg: 21.7, max: 43.0) [2024-03-29 15:47:08,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 15:47:12,785][00497] Updated weights for policy 0, policy_version 33202 (0.0025) [2024-03-29 15:47:13,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 544014336. Throughput: 0: 41966.8. Samples: 426262980. Policy #0 lag: (min: 2.0, avg: 21.7, max: 43.0) [2024-03-29 15:47:13,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 15:47:16,176][00497] Updated weights for policy 0, policy_version 33212 (0.0033) [2024-03-29 15:47:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 544227328. Throughput: 0: 41266.3. Samples: 426371240. Policy #0 lag: (min: 2.0, avg: 21.7, max: 43.0) [2024-03-29 15:47:18,840][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 15:47:20,641][00497] Updated weights for policy 0, policy_version 33222 (0.0022) [2024-03-29 15:47:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 544440320. Throughput: 0: 41562.8. Samples: 426622860. Policy #0 lag: (min: 1.0, avg: 21.7, max: 43.0) [2024-03-29 15:47:23,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 15:47:24,248][00497] Updated weights for policy 0, policy_version 33232 (0.0030) [2024-03-29 15:47:28,543][00497] Updated weights for policy 0, policy_version 33242 (0.0023) [2024-03-29 15:47:28,839][00126] Fps is (10 sec: 40960.2, 60 sec: 40960.1, 300 sec: 41932.0). Total num frames: 544636928. Throughput: 0: 41628.0. Samples: 426881600. Policy #0 lag: (min: 1.0, avg: 21.7, max: 43.0) [2024-03-29 15:47:28,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 15:47:31,961][00497] Updated weights for policy 0, policy_version 33252 (0.0017) [2024-03-29 15:47:33,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.2, 300 sec: 41931.9). Total num frames: 544833536. Throughput: 0: 41200.4. Samples: 426987240. Policy #0 lag: (min: 1.0, avg: 21.7, max: 43.0) [2024-03-29 15:47:33,840][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 15:47:36,017][00476] Signal inference workers to stop experience collection... 
(15250 times) [2024-03-29 15:47:36,095][00497] InferenceWorker_p0-w0: stopping experience collection (15250 times) [2024-03-29 15:47:36,195][00476] Signal inference workers to resume experience collection... (15250 times) [2024-03-29 15:47:36,196][00497] InferenceWorker_p0-w0: resuming experience collection (15250 times) [2024-03-29 15:47:36,508][00497] Updated weights for policy 0, policy_version 33262 (0.0019) [2024-03-29 15:47:38,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 545062912. Throughput: 0: 41682.7. Samples: 427251200. Policy #0 lag: (min: 1.0, avg: 21.7, max: 43.0) [2024-03-29 15:47:38,840][00126] Avg episode reward: [(0, '0.489')] [2024-03-29 15:47:40,152][00497] Updated weights for policy 0, policy_version 33272 (0.0018) [2024-03-29 15:47:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 40960.0, 300 sec: 41876.4). Total num frames: 545243136. Throughput: 0: 41186.3. Samples: 427493580. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:47:43,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 15:47:44,722][00497] Updated weights for policy 0, policy_version 33282 (0.0027) [2024-03-29 15:47:48,157][00497] Updated weights for policy 0, policy_version 33292 (0.0025) [2024-03-29 15:47:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 545472512. Throughput: 0: 41563.7. Samples: 427618300. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:47:48,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 15:47:52,497][00497] Updated weights for policy 0, policy_version 33302 (0.0021) [2024-03-29 15:47:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 545669120. Throughput: 0: 41096.5. Samples: 427857600. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:47:53,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 15:47:56,306][00497] Updated weights for policy 0, policy_version 33312 (0.0019) [2024-03-29 15:47:58,839][00126] Fps is (10 sec: 39321.8, 60 sec: 40960.1, 300 sec: 41876.4). Total num frames: 545865728. Throughput: 0: 41061.4. Samples: 428110740. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:47:58,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 15:48:00,283][00497] Updated weights for policy 0, policy_version 33322 (0.0028) [2024-03-29 15:48:03,591][00497] Updated weights for policy 0, policy_version 33332 (0.0023) [2024-03-29 15:48:03,839][00126] Fps is (10 sec: 44236.1, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 546111488. Throughput: 0: 41754.6. Samples: 428250200. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 15:48:03,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 15:48:07,274][00476] Signal inference workers to stop experience collection... (15300 times) [2024-03-29 15:48:07,305][00497] InferenceWorker_p0-w0: stopping experience collection (15300 times) [2024-03-29 15:48:07,476][00476] Signal inference workers to resume experience collection... (15300 times) [2024-03-29 15:48:07,476][00497] InferenceWorker_p0-w0: resuming experience collection (15300 times) [2024-03-29 15:48:07,774][00497] Updated weights for policy 0, policy_version 33342 (0.0027) [2024-03-29 15:48:08,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 546308096. Throughput: 0: 41535.9. Samples: 428491980. 
Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 15:48:08,841][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 15:48:11,766][00497] Updated weights for policy 0, policy_version 33352 (0.0018) [2024-03-29 15:48:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 546521088. Throughput: 0: 41449.7. Samples: 428746840. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 15:48:13,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 15:48:15,856][00497] Updated weights for policy 0, policy_version 33362 (0.0027) [2024-03-29 15:48:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.2, 300 sec: 41932.0). Total num frames: 546734080. Throughput: 0: 41956.9. Samples: 428875300. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 15:48:18,840][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 15:48:19,325][00497] Updated weights for policy 0, policy_version 33372 (0.0026) [2024-03-29 15:48:23,430][00497] Updated weights for policy 0, policy_version 33382 (0.0027) [2024-03-29 15:48:23,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 546930688. Throughput: 0: 41590.3. Samples: 429122760. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 15:48:23,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 15:48:27,364][00497] Updated weights for policy 0, policy_version 33392 (0.0025) [2024-03-29 15:48:28,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 547143680. Throughput: 0: 41696.9. Samples: 429369940. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:48:28,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 15:48:31,452][00497] Updated weights for policy 0, policy_version 33402 (0.0020) [2024-03-29 15:48:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 547356672. Throughput: 0: 42065.3. Samples: 429511240. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:48:33,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 15:48:35,214][00497] Updated weights for policy 0, policy_version 33412 (0.0026) [2024-03-29 15:48:37,392][00476] Signal inference workers to stop experience collection... (15350 times) [2024-03-29 15:48:37,423][00497] InferenceWorker_p0-w0: stopping experience collection (15350 times) [2024-03-29 15:48:37,591][00476] Signal inference workers to resume experience collection... (15350 times) [2024-03-29 15:48:37,592][00497] InferenceWorker_p0-w0: resuming experience collection (15350 times) [2024-03-29 15:48:38,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 547569664. Throughput: 0: 42164.8. Samples: 429755020. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:48:38,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 15:48:39,303][00497] Updated weights for policy 0, policy_version 33422 (0.0030) [2024-03-29 15:48:43,423][00497] Updated weights for policy 0, policy_version 33432 (0.0022) [2024-03-29 15:48:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 547766272. Throughput: 0: 41959.6. Samples: 429998920. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:48:43,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 15:48:47,327][00497] Updated weights for policy 0, policy_version 33442 (0.0032) [2024-03-29 15:48:48,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41820.9). 
Total num frames: 547962880. Throughput: 0: 41615.6. Samples: 430122900. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:48:48,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 15:48:51,031][00497] Updated weights for policy 0, policy_version 33452 (0.0023) [2024-03-29 15:48:53,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 548192256. Throughput: 0: 42065.8. Samples: 430384940. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 15:48:53,842][00126] Avg episode reward: [(0, '0.459')] [2024-03-29 15:48:55,003][00497] Updated weights for policy 0, policy_version 33462 (0.0026) [2024-03-29 15:48:58,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 548388864. Throughput: 0: 41806.8. Samples: 430628140. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 15:48:58,840][00126] Avg episode reward: [(0, '0.475')] [2024-03-29 15:48:59,091][00497] Updated weights for policy 0, policy_version 33472 (0.0018) [2024-03-29 15:49:02,708][00497] Updated weights for policy 0, policy_version 33482 (0.0019) [2024-03-29 15:49:03,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 548601856. Throughput: 0: 41792.9. Samples: 430755980. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 15:49:03,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 15:49:04,054][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000033485_548618240.pth... [2024-03-29 15:49:04,363][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000032873_538591232.pth [2024-03-29 15:49:06,783][00497] Updated weights for policy 0, policy_version 33492 (0.0019) [2024-03-29 15:49:08,088][00476] Signal inference workers to stop experience collection... (15400 times) [2024-03-29 15:49:08,123][00497] InferenceWorker_p0-w0: stopping experience collection (15400 times) [2024-03-29 15:49:08,316][00476] Signal inference workers to resume experience collection... (15400 times) [2024-03-29 15:49:08,316][00497] InferenceWorker_p0-w0: resuming experience collection (15400 times) [2024-03-29 15:49:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 548798464. Throughput: 0: 41755.6. Samples: 431001760. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 15:49:08,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 15:49:10,696][00497] Updated weights for policy 0, policy_version 33502 (0.0020) [2024-03-29 15:49:13,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 549011456. Throughput: 0: 41778.5. Samples: 431249980. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 15:49:13,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 15:49:14,682][00497] Updated weights for policy 0, policy_version 33512 (0.0022) [2024-03-29 15:49:18,418][00497] Updated weights for policy 0, policy_version 33522 (0.0019) [2024-03-29 15:49:18,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 549240832. Throughput: 0: 41436.0. Samples: 431375860. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 15:49:18,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 15:49:22,195][00497] Updated weights for policy 0, policy_version 33532 (0.0021) [2024-03-29 15:49:23,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 549437440. 
Throughput: 0: 41791.6. Samples: 431635640. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 15:49:23,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 15:49:26,065][00497] Updated weights for policy 0, policy_version 33542 (0.0024) [2024-03-29 15:49:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 549650432. Throughput: 0: 41943.1. Samples: 431886360. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 15:49:28,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 15:49:29,960][00497] Updated weights for policy 0, policy_version 33552 (0.0026) [2024-03-29 15:49:33,812][00497] Updated weights for policy 0, policy_version 33562 (0.0021) [2024-03-29 15:49:33,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 549879808. Throughput: 0: 42070.8. Samples: 432016080. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 15:49:33,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 15:49:37,627][00497] Updated weights for policy 0, policy_version 33572 (0.0018) [2024-03-29 15:49:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 550060032. Throughput: 0: 41900.0. Samples: 432270440. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 15:49:38,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 15:49:39,701][00476] Signal inference workers to stop experience collection... (15450 times) [2024-03-29 15:49:39,735][00497] InferenceWorker_p0-w0: stopping experience collection (15450 times) [2024-03-29 15:49:39,924][00476] Signal inference workers to resume experience collection... (15450 times) [2024-03-29 15:49:39,925][00497] InferenceWorker_p0-w0: resuming experience collection (15450 times) [2024-03-29 15:49:41,627][00497] Updated weights for policy 0, policy_version 33582 (0.0018) [2024-03-29 15:49:43,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 550289408. Throughput: 0: 42179.6. Samples: 432526220. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 15:49:43,840][00126] Avg episode reward: [(0, '0.454')] [2024-03-29 15:49:45,548][00497] Updated weights for policy 0, policy_version 33592 (0.0023) [2024-03-29 15:49:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 550486016. Throughput: 0: 41972.4. Samples: 432644740. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 15:49:48,841][00126] Avg episode reward: [(0, '0.279')] [2024-03-29 15:49:49,606][00497] Updated weights for policy 0, policy_version 33602 (0.0025) [2024-03-29 15:49:53,411][00497] Updated weights for policy 0, policy_version 33612 (0.0023) [2024-03-29 15:49:53,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 550699008. Throughput: 0: 42147.6. Samples: 432898400. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 15:49:53,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 15:49:57,278][00497] Updated weights for policy 0, policy_version 33622 (0.0024) [2024-03-29 15:49:58,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 550912000. Throughput: 0: 42334.4. Samples: 433155020. 
Policy #0 lag: (min: 0.0, avg: 19.9, max: 43.0) [2024-03-29 15:49:58,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 15:50:01,330][00497] Updated weights for policy 0, policy_version 33632 (0.0028) [2024-03-29 15:50:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 551124992. Throughput: 0: 42280.1. Samples: 433278460. Policy #0 lag: (min: 0.0, avg: 19.9, max: 43.0) [2024-03-29 15:50:03,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 15:50:05,140][00497] Updated weights for policy 0, policy_version 33642 (0.0023) [2024-03-29 15:50:08,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 551337984. Throughput: 0: 42083.1. Samples: 433529380. Policy #0 lag: (min: 0.0, avg: 19.9, max: 43.0) [2024-03-29 15:50:08,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 15:50:08,911][00497] Updated weights for policy 0, policy_version 33652 (0.0019) [2024-03-29 15:50:12,238][00476] Signal inference workers to stop experience collection... (15500 times) [2024-03-29 15:50:12,239][00476] Signal inference workers to resume experience collection... (15500 times) [2024-03-29 15:50:12,283][00497] InferenceWorker_p0-w0: stopping experience collection (15500 times) [2024-03-29 15:50:12,283][00497] InferenceWorker_p0-w0: resuming experience collection (15500 times) [2024-03-29 15:50:12,800][00497] Updated weights for policy 0, policy_version 33662 (0.0018) [2024-03-29 15:50:13,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 551534592. Throughput: 0: 41843.5. Samples: 433769320. Policy #0 lag: (min: 0.0, avg: 19.9, max: 43.0) [2024-03-29 15:50:13,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 15:50:16,959][00497] Updated weights for policy 0, policy_version 33672 (0.0021) [2024-03-29 15:50:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 551763968. Throughput: 0: 42047.5. Samples: 433908220. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 15:50:18,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 15:50:20,745][00497] Updated weights for policy 0, policy_version 33682 (0.0023) [2024-03-29 15:50:23,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 551976960. Throughput: 0: 41877.8. Samples: 434154940. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 15:50:23,840][00126] Avg episode reward: [(0, '0.454')] [2024-03-29 15:50:24,444][00497] Updated weights for policy 0, policy_version 33692 (0.0022) [2024-03-29 15:50:28,318][00497] Updated weights for policy 0, policy_version 33702 (0.0019) [2024-03-29 15:50:28,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.2, 300 sec: 41820.9). Total num frames: 552173568. Throughput: 0: 41695.0. Samples: 434402500. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 15:50:28,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 15:50:32,425][00497] Updated weights for policy 0, policy_version 33712 (0.0028) [2024-03-29 15:50:33,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 552402944. Throughput: 0: 42067.6. Samples: 434537780. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 15:50:33,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 15:50:36,218][00497] Updated weights for policy 0, policy_version 33722 (0.0032) [2024-03-29 15:50:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 41876.4). 
Total num frames: 552599552. Throughput: 0: 42155.4. Samples: 434795400. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:50:38,841][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 15:50:39,982][00497] Updated weights for policy 0, policy_version 33732 (0.0020) [2024-03-29 15:50:43,658][00497] Updated weights for policy 0, policy_version 33742 (0.0032) [2024-03-29 15:50:43,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 552828928. Throughput: 0: 42053.7. Samples: 435047440. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:50:43,840][00126] Avg episode reward: [(0, '0.515')] [2024-03-29 15:50:44,959][00476] Signal inference workers to stop experience collection... (15550 times) [2024-03-29 15:50:45,002][00497] InferenceWorker_p0-w0: stopping experience collection (15550 times) [2024-03-29 15:50:45,040][00476] Signal inference workers to resume experience collection... (15550 times) [2024-03-29 15:50:45,046][00497] InferenceWorker_p0-w0: resuming experience collection (15550 times) [2024-03-29 15:50:48,001][00497] Updated weights for policy 0, policy_version 33752 (0.0023) [2024-03-29 15:50:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 553025536. Throughput: 0: 41876.9. Samples: 435162920. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:50:48,840][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 15:50:51,840][00497] Updated weights for policy 0, policy_version 33762 (0.0020) [2024-03-29 15:50:53,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 553238528. Throughput: 0: 42127.1. Samples: 435425100. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:50:53,840][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 15:50:55,523][00497] Updated weights for policy 0, policy_version 33772 (0.0024) [2024-03-29 15:50:58,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.3, 300 sec: 41931.9). Total num frames: 553467904. Throughput: 0: 42543.6. Samples: 435683780. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:50:58,840][00126] Avg episode reward: [(0, '0.457')] [2024-03-29 15:50:59,097][00497] Updated weights for policy 0, policy_version 33782 (0.0025) [2024-03-29 15:51:03,620][00497] Updated weights for policy 0, policy_version 33792 (0.0035) [2024-03-29 15:51:03,840][00126] Fps is (10 sec: 40958.7, 60 sec: 42052.1, 300 sec: 41820.8). Total num frames: 553648128. Throughput: 0: 42041.0. Samples: 435800080. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:51:03,842][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 15:51:04,172][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000033794_553680896.pth... [2024-03-29 15:51:04,510][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000033180_543621120.pth [2024-03-29 15:51:07,491][00497] Updated weights for policy 0, policy_version 33802 (0.0024) [2024-03-29 15:51:08,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 553877504. Throughput: 0: 42221.3. Samples: 436054900. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:51:08,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 15:51:11,412][00497] Updated weights for policy 0, policy_version 33812 (0.0027) [2024-03-29 15:51:13,839][00126] Fps is (10 sec: 40960.8, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 554057728. 
Throughput: 0: 42439.6. Samples: 436312280. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:51:13,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 15:51:14,951][00497] Updated weights for policy 0, policy_version 33822 (0.0025) [2024-03-29 15:51:16,776][00476] Signal inference workers to stop experience collection... (15600 times) [2024-03-29 15:51:16,802][00497] InferenceWorker_p0-w0: stopping experience collection (15600 times) [2024-03-29 15:51:16,963][00476] Signal inference workers to resume experience collection... (15600 times) [2024-03-29 15:51:16,964][00497] InferenceWorker_p0-w0: resuming experience collection (15600 times) [2024-03-29 15:51:18,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 554270720. Throughput: 0: 41985.3. Samples: 436427120. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:51:18,840][00126] Avg episode reward: [(0, '0.500')] [2024-03-29 15:51:19,339][00497] Updated weights for policy 0, policy_version 33832 (0.0027) [2024-03-29 15:51:23,258][00497] Updated weights for policy 0, policy_version 33842 (0.0020) [2024-03-29 15:51:23,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 554500096. Throughput: 0: 41932.1. Samples: 436682340. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:51:23,840][00126] Avg episode reward: [(0, '0.481')] [2024-03-29 15:51:26,966][00497] Updated weights for policy 0, policy_version 33852 (0.0023) [2024-03-29 15:51:28,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 554680320. Throughput: 0: 42136.3. Samples: 436943580. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:51:28,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 15:51:30,627][00497] Updated weights for policy 0, policy_version 33862 (0.0035) [2024-03-29 15:51:33,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.0, 300 sec: 41765.3). Total num frames: 554893312. Throughput: 0: 41814.1. Samples: 437044560. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:51:33,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 15:51:34,964][00497] Updated weights for policy 0, policy_version 33872 (0.0017) [2024-03-29 15:51:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 555106304. Throughput: 0: 41786.1. Samples: 437305480. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:51:38,841][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 15:51:38,917][00497] Updated weights for policy 0, policy_version 33882 (0.0029) [2024-03-29 15:51:42,631][00497] Updated weights for policy 0, policy_version 33892 (0.0025) [2024-03-29 15:51:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 555302912. Throughput: 0: 41595.1. Samples: 437555560. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 15:51:43,840][00126] Avg episode reward: [(0, '0.457')] [2024-03-29 15:51:46,467][00497] Updated weights for policy 0, policy_version 33902 (0.0020) [2024-03-29 15:51:47,634][00476] Signal inference workers to stop experience collection... (15650 times) [2024-03-29 15:51:47,703][00497] InferenceWorker_p0-w0: stopping experience collection (15650 times) [2024-03-29 15:51:47,705][00476] Signal inference workers to resume experience collection... 
(15650 times) [2024-03-29 15:51:47,728][00497] InferenceWorker_p0-w0: resuming experience collection (15650 times) [2024-03-29 15:51:48,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 555532288. Throughput: 0: 41546.9. Samples: 437669680. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 15:51:48,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 15:51:50,639][00497] Updated weights for policy 0, policy_version 33912 (0.0023) [2024-03-29 15:51:53,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 555728896. Throughput: 0: 41592.1. Samples: 437926540. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 15:51:53,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 15:51:54,508][00497] Updated weights for policy 0, policy_version 33922 (0.0023) [2024-03-29 15:51:58,503][00497] Updated weights for policy 0, policy_version 33932 (0.0018) [2024-03-29 15:51:58,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41233.0, 300 sec: 41820.9). Total num frames: 555941888. Throughput: 0: 41658.2. Samples: 438186900. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 15:51:58,841][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 15:52:02,103][00497] Updated weights for policy 0, policy_version 33942 (0.0035) [2024-03-29 15:52:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.4, 300 sec: 41820.9). Total num frames: 556154880. Throughput: 0: 41743.5. Samples: 438305580. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 15:52:03,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 15:52:06,344][00497] Updated weights for policy 0, policy_version 33952 (0.0031) [2024-03-29 15:52:08,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 556367872. Throughput: 0: 41662.3. Samples: 438557140. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 15:52:08,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 15:52:09,987][00497] Updated weights for policy 0, policy_version 33962 (0.0018) [2024-03-29 15:52:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 556580864. Throughput: 0: 41753.0. Samples: 438822460. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 15:52:13,840][00126] Avg episode reward: [(0, '0.496')] [2024-03-29 15:52:14,003][00497] Updated weights for policy 0, policy_version 33972 (0.0026) [2024-03-29 15:52:17,418][00497] Updated weights for policy 0, policy_version 33982 (0.0019) [2024-03-29 15:52:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 556777472. Throughput: 0: 42361.0. Samples: 438950800. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 15:52:18,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 15:52:21,316][00476] Signal inference workers to stop experience collection... (15700 times) [2024-03-29 15:52:21,354][00497] InferenceWorker_p0-w0: stopping experience collection (15700 times) [2024-03-29 15:52:21,537][00476] Signal inference workers to resume experience collection... (15700 times) [2024-03-29 15:52:21,538][00497] InferenceWorker_p0-w0: resuming experience collection (15700 times) [2024-03-29 15:52:21,793][00497] Updated weights for policy 0, policy_version 33992 (0.0024) [2024-03-29 15:52:23,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 557006848. Throughput: 0: 41756.1. Samples: 439184500. 
Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 15:52:23,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 15:52:25,736][00497] Updated weights for policy 0, policy_version 34002 (0.0027) [2024-03-29 15:52:28,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 557203456. Throughput: 0: 41965.3. Samples: 439444000. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 15:52:28,840][00126] Avg episode reward: [(0, '0.443')] [2024-03-29 15:52:29,616][00497] Updated weights for policy 0, policy_version 34012 (0.0022) [2024-03-29 15:52:33,238][00497] Updated weights for policy 0, policy_version 34022 (0.0024) [2024-03-29 15:52:33,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 557416448. Throughput: 0: 42482.2. Samples: 439581380. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 15:52:33,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 15:52:37,412][00497] Updated weights for policy 0, policy_version 34032 (0.0023) [2024-03-29 15:52:38,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 557645824. Throughput: 0: 41970.1. Samples: 439815200. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 15:52:38,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 15:52:41,482][00497] Updated weights for policy 0, policy_version 34042 (0.0028) [2024-03-29 15:52:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.4, 300 sec: 41876.4). Total num frames: 557826048. Throughput: 0: 41717.5. Samples: 440064180. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 15:52:43,840][00126] Avg episode reward: [(0, '0.575')] [2024-03-29 15:52:45,352][00497] Updated weights for policy 0, policy_version 34052 (0.0028) [2024-03-29 15:52:48,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 558055424. Throughput: 0: 41901.3. Samples: 440191140. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 15:52:48,842][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 15:52:48,947][00497] Updated weights for policy 0, policy_version 34062 (0.0021) [2024-03-29 15:52:53,140][00476] Signal inference workers to stop experience collection... (15750 times) [2024-03-29 15:52:53,183][00497] InferenceWorker_p0-w0: stopping experience collection (15750 times) [2024-03-29 15:52:53,221][00476] Signal inference workers to resume experience collection... (15750 times) [2024-03-29 15:52:53,223][00497] InferenceWorker_p0-w0: resuming experience collection (15750 times) [2024-03-29 15:52:53,230][00497] Updated weights for policy 0, policy_version 34072 (0.0027) [2024-03-29 15:52:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 558252032. Throughput: 0: 41813.3. Samples: 440438740. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 15:52:53,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 15:52:56,966][00497] Updated weights for policy 0, policy_version 34082 (0.0022) [2024-03-29 15:52:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 558465024. Throughput: 0: 41679.5. Samples: 440698040. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 15:52:58,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 15:53:01,114][00497] Updated weights for policy 0, policy_version 34092 (0.0026) [2024-03-29 15:53:03,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 41876.4). 
Total num frames: 558661632. Throughput: 0: 41527.2. Samples: 440819520. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 15:53:03,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 15:53:04,122][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000034100_558694400.pth... [2024-03-29 15:53:04,435][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000033485_548618240.pth [2024-03-29 15:53:05,006][00497] Updated weights for policy 0, policy_version 34102 (0.0020) [2024-03-29 15:53:08,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 558874624. Throughput: 0: 41759.1. Samples: 441063660. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 15:53:08,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 15:53:09,079][00497] Updated weights for policy 0, policy_version 34112 (0.0032) [2024-03-29 15:53:12,799][00497] Updated weights for policy 0, policy_version 34122 (0.0020) [2024-03-29 15:53:13,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 559071232. Throughput: 0: 41463.2. Samples: 441309840. Policy #0 lag: (min: 0.0, avg: 21.4, max: 44.0) [2024-03-29 15:53:13,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 15:53:16,924][00497] Updated weights for policy 0, policy_version 34132 (0.0026) [2024-03-29 15:53:18,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 559284224. Throughput: 0: 41462.7. Samples: 441447200. Policy #0 lag: (min: 0.0, avg: 21.4, max: 44.0) [2024-03-29 15:53:18,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 15:53:20,448][00497] Updated weights for policy 0, policy_version 34142 (0.0024) [2024-03-29 15:53:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 559497216. Throughput: 0: 41738.4. Samples: 441693420. Policy #0 lag: (min: 0.0, avg: 21.4, max: 44.0) [2024-03-29 15:53:23,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 15:53:24,703][00497] Updated weights for policy 0, policy_version 34152 (0.0018) [2024-03-29 15:53:25,087][00476] Signal inference workers to stop experience collection... (15800 times) [2024-03-29 15:53:25,147][00497] InferenceWorker_p0-w0: stopping experience collection (15800 times) [2024-03-29 15:53:25,252][00476] Signal inference workers to resume experience collection... (15800 times) [2024-03-29 15:53:25,252][00497] InferenceWorker_p0-w0: resuming experience collection (15800 times) [2024-03-29 15:53:28,415][00497] Updated weights for policy 0, policy_version 34162 (0.0019) [2024-03-29 15:53:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 559710208. Throughput: 0: 41799.6. Samples: 441945160. Policy #0 lag: (min: 0.0, avg: 21.4, max: 44.0) [2024-03-29 15:53:28,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 15:53:32,375][00497] Updated weights for policy 0, policy_version 34172 (0.0020) [2024-03-29 15:53:33,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 559923200. Throughput: 0: 41989.7. Samples: 442080680. Policy #0 lag: (min: 0.0, avg: 21.4, max: 44.0) [2024-03-29 15:53:33,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 15:53:36,156][00497] Updated weights for policy 0, policy_version 34182 (0.0033) [2024-03-29 15:53:38,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 560136192. 
Throughput: 0: 41733.7. Samples: 442316760. Policy #0 lag: (min: 2.0, avg: 20.7, max: 41.0) [2024-03-29 15:53:38,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 15:53:40,143][00497] Updated weights for policy 0, policy_version 34192 (0.0018) [2024-03-29 15:53:43,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 560349184. Throughput: 0: 41739.2. Samples: 442576300. Policy #0 lag: (min: 2.0, avg: 20.7, max: 41.0) [2024-03-29 15:53:43,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 15:53:44,108][00497] Updated weights for policy 0, policy_version 34202 (0.0028) [2024-03-29 15:53:48,245][00497] Updated weights for policy 0, policy_version 34212 (0.0026) [2024-03-29 15:53:48,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 560529408. Throughput: 0: 41791.9. Samples: 442700160. Policy #0 lag: (min: 2.0, avg: 20.7, max: 41.0) [2024-03-29 15:53:48,840][00126] Avg episode reward: [(0, '0.416')] [2024-03-29 15:53:51,897][00497] Updated weights for policy 0, policy_version 34222 (0.0023) [2024-03-29 15:53:53,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 560758784. Throughput: 0: 41643.5. Samples: 442937620. Policy #0 lag: (min: 2.0, avg: 20.7, max: 41.0) [2024-03-29 15:53:53,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 15:53:55,936][00497] Updated weights for policy 0, policy_version 34232 (0.0032) [2024-03-29 15:53:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 560971776. Throughput: 0: 41968.5. Samples: 443198420. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 15:53:58,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 15:53:59,711][00497] Updated weights for policy 0, policy_version 34242 (0.0018) [2024-03-29 15:54:03,053][00476] Signal inference workers to stop experience collection... (15850 times) [2024-03-29 15:54:03,097][00497] InferenceWorker_p0-w0: stopping experience collection (15850 times) [2024-03-29 15:54:03,277][00476] Signal inference workers to resume experience collection... (15850 times) [2024-03-29 15:54:03,277][00497] InferenceWorker_p0-w0: resuming experience collection (15850 times) [2024-03-29 15:54:03,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.1, 300 sec: 41987.5). Total num frames: 561184768. Throughput: 0: 41815.4. Samples: 443328900. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 15:54:03,840][00126] Avg episode reward: [(0, '0.476')] [2024-03-29 15:54:03,850][00497] Updated weights for policy 0, policy_version 34252 (0.0023) [2024-03-29 15:54:07,380][00497] Updated weights for policy 0, policy_version 34262 (0.0021) [2024-03-29 15:54:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41932.0). Total num frames: 561381376. Throughput: 0: 41868.4. Samples: 443577500. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 15:54:08,840][00126] Avg episode reward: [(0, '0.481')] [2024-03-29 15:54:11,609][00497] Updated weights for policy 0, policy_version 34272 (0.0018) [2024-03-29 15:54:13,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 561610752. Throughput: 0: 41964.4. Samples: 443833560. 
Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 15:54:13,840][00126] Avg episode reward: [(0, '0.456')] [2024-03-29 15:54:15,003][00497] Updated weights for policy 0, policy_version 34282 (0.0022) [2024-03-29 15:54:18,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 561807360. Throughput: 0: 41922.4. Samples: 443967180. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 15:54:18,840][00126] Avg episode reward: [(0, '0.502')] [2024-03-29 15:54:19,286][00497] Updated weights for policy 0, policy_version 34292 (0.0025) [2024-03-29 15:54:22,744][00497] Updated weights for policy 0, policy_version 34302 (0.0023) [2024-03-29 15:54:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 562020352. Throughput: 0: 42239.2. Samples: 444217520. Policy #0 lag: (min: 2.0, avg: 21.6, max: 42.0) [2024-03-29 15:54:23,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 15:54:26,978][00497] Updated weights for policy 0, policy_version 34312 (0.0024) [2024-03-29 15:54:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 562249728. Throughput: 0: 42045.8. Samples: 444468360. Policy #0 lag: (min: 2.0, avg: 21.6, max: 42.0) [2024-03-29 15:54:28,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 15:54:30,527][00497] Updated weights for policy 0, policy_version 34322 (0.0023) [2024-03-29 15:54:33,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 562446336. Throughput: 0: 42244.4. Samples: 444601160. Policy #0 lag: (min: 2.0, avg: 21.6, max: 42.0) [2024-03-29 15:54:33,842][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 15:54:34,556][00497] Updated weights for policy 0, policy_version 34332 (0.0025) [2024-03-29 15:54:38,109][00497] Updated weights for policy 0, policy_version 34342 (0.0037) [2024-03-29 15:54:38,710][00476] Signal inference workers to stop experience collection... (15900 times) [2024-03-29 15:54:38,764][00497] InferenceWorker_p0-w0: stopping experience collection (15900 times) [2024-03-29 15:54:38,802][00476] Signal inference workers to resume experience collection... (15900 times) [2024-03-29 15:54:38,804][00497] InferenceWorker_p0-w0: resuming experience collection (15900 times) [2024-03-29 15:54:38,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 562675712. Throughput: 0: 42846.2. Samples: 444865700. Policy #0 lag: (min: 2.0, avg: 21.6, max: 42.0) [2024-03-29 15:54:38,840][00126] Avg episode reward: [(0, '0.479')] [2024-03-29 15:54:42,149][00497] Updated weights for policy 0, policy_version 34352 (0.0023) [2024-03-29 15:54:43,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 562888704. Throughput: 0: 42528.4. Samples: 445112200. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 15:54:43,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 15:54:45,829][00497] Updated weights for policy 0, policy_version 34362 (0.0023) [2024-03-29 15:54:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 563085312. Throughput: 0: 42635.2. Samples: 445247480. 
Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 15:54:48,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 15:54:49,865][00497] Updated weights for policy 0, policy_version 34372 (0.0024) [2024-03-29 15:54:53,484][00497] Updated weights for policy 0, policy_version 34382 (0.0026) [2024-03-29 15:54:53,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 563314688. Throughput: 0: 42745.7. Samples: 445501060. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 15:54:53,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 15:54:57,525][00497] Updated weights for policy 0, policy_version 34392 (0.0023) [2024-03-29 15:54:58,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 563527680. Throughput: 0: 42721.8. Samples: 445756040. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 15:54:58,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 15:55:01,148][00497] Updated weights for policy 0, policy_version 34402 (0.0024) [2024-03-29 15:55:03,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 563724288. Throughput: 0: 42436.0. Samples: 445876800. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 15:55:03,840][00126] Avg episode reward: [(0, '0.503')] [2024-03-29 15:55:03,858][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000034407_563724288.pth... [2024-03-29 15:55:04,179][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000033794_553680896.pth [2024-03-29 15:55:05,528][00497] Updated weights for policy 0, policy_version 34412 (0.0023) [2024-03-29 15:55:08,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 563937280. Throughput: 0: 42435.6. Samples: 446127120. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 15:55:08,841][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 15:55:09,150][00497] Updated weights for policy 0, policy_version 34422 (0.0018) [2024-03-29 15:55:11,887][00476] Signal inference workers to stop experience collection... (15950 times) [2024-03-29 15:55:11,921][00497] InferenceWorker_p0-w0: stopping experience collection (15950 times) [2024-03-29 15:55:12,107][00476] Signal inference workers to resume experience collection... (15950 times) [2024-03-29 15:55:12,107][00497] InferenceWorker_p0-w0: resuming experience collection (15950 times) [2024-03-29 15:55:13,335][00497] Updated weights for policy 0, policy_version 34432 (0.0023) [2024-03-29 15:55:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 564150272. Throughput: 0: 42491.5. Samples: 446380480. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 15:55:13,840][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 15:55:16,846][00497] Updated weights for policy 0, policy_version 34442 (0.0036) [2024-03-29 15:55:18,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 564346880. Throughput: 0: 42290.3. Samples: 446504220. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 15:55:18,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 15:55:21,395][00497] Updated weights for policy 0, policy_version 34452 (0.0033) [2024-03-29 15:55:23,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 564543488. Throughput: 0: 41769.0. Samples: 446745300. 
Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 15:55:23,840][00126] Avg episode reward: [(0, '0.500')] [2024-03-29 15:55:25,023][00497] Updated weights for policy 0, policy_version 34462 (0.0027) [2024-03-29 15:55:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 564756480. Throughput: 0: 41774.7. Samples: 446992060. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 15:55:28,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 15:55:29,463][00497] Updated weights for policy 0, policy_version 34472 (0.0028) [2024-03-29 15:55:32,853][00497] Updated weights for policy 0, policy_version 34482 (0.0029) [2024-03-29 15:55:33,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 564969472. Throughput: 0: 41585.7. Samples: 447118840. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 15:55:33,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 15:55:37,312][00497] Updated weights for policy 0, policy_version 34492 (0.0018) [2024-03-29 15:55:38,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 565182464. Throughput: 0: 41693.4. Samples: 447377260. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 15:55:38,842][00126] Avg episode reward: [(0, '0.443')] [2024-03-29 15:55:40,868][00497] Updated weights for policy 0, policy_version 34502 (0.0020) [2024-03-29 15:55:43,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 565379072. Throughput: 0: 41244.0. Samples: 447612020. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 15:55:43,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 15:55:44,866][00476] Signal inference workers to stop experience collection... (16000 times) [2024-03-29 15:55:44,930][00497] InferenceWorker_p0-w0: stopping experience collection (16000 times) [2024-03-29 15:55:44,945][00476] Signal inference workers to resume experience collection... (16000 times) [2024-03-29 15:55:44,964][00497] InferenceWorker_p0-w0: resuming experience collection (16000 times) [2024-03-29 15:55:44,966][00497] Updated weights for policy 0, policy_version 34512 (0.0027) [2024-03-29 15:55:48,498][00497] Updated weights for policy 0, policy_version 34522 (0.0020) [2024-03-29 15:55:48,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 565608448. Throughput: 0: 41551.9. Samples: 447746640. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 15:55:48,840][00126] Avg episode reward: [(0, '0.655')] [2024-03-29 15:55:48,840][00476] Saving new best policy, reward=0.655! [2024-03-29 15:55:52,885][00497] Updated weights for policy 0, policy_version 34532 (0.0029) [2024-03-29 15:55:53,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 565805056. Throughput: 0: 41924.8. Samples: 448013740. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 15:55:53,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 15:55:56,407][00497] Updated weights for policy 0, policy_version 34542 (0.0020) [2024-03-29 15:55:58,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 566034432. Throughput: 0: 41527.5. Samples: 448249220. 
Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 15:55:58,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 15:56:00,416][00497] Updated weights for policy 0, policy_version 34552 (0.0033) [2024-03-29 15:56:03,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 566247424. Throughput: 0: 41734.2. Samples: 448382260. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 15:56:03,840][00126] Avg episode reward: [(0, '0.479')] [2024-03-29 15:56:03,900][00497] Updated weights for policy 0, policy_version 34562 (0.0026) [2024-03-29 15:56:08,316][00497] Updated weights for policy 0, policy_version 34572 (0.0023) [2024-03-29 15:56:08,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 566444032. Throughput: 0: 42336.4. Samples: 448650440. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 15:56:08,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 15:56:11,912][00497] Updated weights for policy 0, policy_version 34582 (0.0020) [2024-03-29 15:56:13,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 566673408. Throughput: 0: 42022.5. Samples: 448883080. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 15:56:13,841][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 15:56:16,152][00497] Updated weights for policy 0, policy_version 34592 (0.0026) [2024-03-29 15:56:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 41932.0). Total num frames: 566870016. Throughput: 0: 42191.7. Samples: 449017460. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 15:56:18,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 15:56:19,517][00476] Signal inference workers to stop experience collection... (16050 times) [2024-03-29 15:56:19,563][00497] InferenceWorker_p0-w0: stopping experience collection (16050 times) [2024-03-29 15:56:19,598][00476] Signal inference workers to resume experience collection... (16050 times) [2024-03-29 15:56:19,601][00497] InferenceWorker_p0-w0: resuming experience collection (16050 times) [2024-03-29 15:56:19,604][00497] Updated weights for policy 0, policy_version 34602 (0.0025) [2024-03-29 15:56:23,839][00126] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 567066624. Throughput: 0: 41951.1. Samples: 449265060. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 15:56:23,840][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 15:56:24,078][00497] Updated weights for policy 0, policy_version 34612 (0.0032) [2024-03-29 15:56:27,411][00497] Updated weights for policy 0, policy_version 34622 (0.0020) [2024-03-29 15:56:28,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 567296000. Throughput: 0: 42127.1. Samples: 449507740. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 15:56:28,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 15:56:31,831][00497] Updated weights for policy 0, policy_version 34632 (0.0031) [2024-03-29 15:56:33,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 567492608. Throughput: 0: 42068.1. Samples: 449639700. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 15:56:33,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 15:56:35,328][00497] Updated weights for policy 0, policy_version 34642 (0.0022) [2024-03-29 15:56:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41987.5). 
Total num frames: 567689216. Throughput: 0: 41703.2. Samples: 449890380. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 15:56:38,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 15:56:39,611][00497] Updated weights for policy 0, policy_version 34652 (0.0023) [2024-03-29 15:56:43,098][00497] Updated weights for policy 0, policy_version 34662 (0.0024) [2024-03-29 15:56:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 567918592. Throughput: 0: 41918.3. Samples: 450135540. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 15:56:43,840][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 15:56:47,257][00497] Updated weights for policy 0, policy_version 34672 (0.0017) [2024-03-29 15:56:48,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 568131584. Throughput: 0: 41965.6. Samples: 450270720. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 15:56:48,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 15:56:50,840][00497] Updated weights for policy 0, policy_version 34682 (0.0019) [2024-03-29 15:56:53,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.4, 300 sec: 41987.5). Total num frames: 568328192. Throughput: 0: 41772.0. Samples: 450530180. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 15:56:53,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 15:56:55,351][00497] Updated weights for policy 0, policy_version 34692 (0.0024) [2024-03-29 15:56:56,200][00476] Signal inference workers to stop experience collection... (16100 times) [2024-03-29 15:56:56,245][00497] InferenceWorker_p0-w0: stopping experience collection (16100 times) [2024-03-29 15:56:56,421][00476] Signal inference workers to resume experience collection... (16100 times) [2024-03-29 15:56:56,422][00497] InferenceWorker_p0-w0: resuming experience collection (16100 times) [2024-03-29 15:56:58,723][00497] Updated weights for policy 0, policy_version 34702 (0.0025) [2024-03-29 15:56:58,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 568557568. Throughput: 0: 41975.2. Samples: 450771960. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 15:56:58,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 15:57:02,789][00497] Updated weights for policy 0, policy_version 34712 (0.0019) [2024-03-29 15:57:03,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 568754176. Throughput: 0: 41806.5. Samples: 450898760. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 15:57:03,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 15:57:03,858][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000034715_568770560.pth... [2024-03-29 15:57:04,183][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000034100_558694400.pth [2024-03-29 15:57:06,379][00497] Updated weights for policy 0, policy_version 34722 (0.0027) [2024-03-29 15:57:08,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 568967168. Throughput: 0: 42020.4. Samples: 451155980. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 15:57:08,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 15:57:10,853][00497] Updated weights for policy 0, policy_version 34732 (0.0017) [2024-03-29 15:57:13,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 569196544. 
Throughput: 0: 42330.3. Samples: 451412600. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 15:57:13,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 15:57:14,068][00497] Updated weights for policy 0, policy_version 34742 (0.0030) [2024-03-29 15:57:18,286][00497] Updated weights for policy 0, policy_version 34752 (0.0027) [2024-03-29 15:57:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 569393152. Throughput: 0: 42046.3. Samples: 451531780. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 15:57:18,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 15:57:21,779][00497] Updated weights for policy 0, policy_version 34762 (0.0027) [2024-03-29 15:57:23,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42325.2, 300 sec: 42043.0). Total num frames: 569606144. Throughput: 0: 42091.0. Samples: 451784480. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 15:57:23,840][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 15:57:26,286][00497] Updated weights for policy 0, policy_version 34772 (0.0027) [2024-03-29 15:57:28,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 569819136. Throughput: 0: 42477.7. Samples: 452047040. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 15:57:28,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 15:57:29,545][00497] Updated weights for policy 0, policy_version 34782 (0.0027) [2024-03-29 15:57:30,449][00476] Signal inference workers to stop experience collection... (16150 times) [2024-03-29 15:57:30,489][00497] InferenceWorker_p0-w0: stopping experience collection (16150 times) [2024-03-29 15:57:30,667][00476] Signal inference workers to resume experience collection... (16150 times) [2024-03-29 15:57:30,667][00497] InferenceWorker_p0-w0: resuming experience collection (16150 times) [2024-03-29 15:57:33,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.3, 300 sec: 41932.0). Total num frames: 570015744. Throughput: 0: 41891.7. Samples: 452155840. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 15:57:33,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 15:57:33,937][00497] Updated weights for policy 0, policy_version 34792 (0.0034) [2024-03-29 15:57:37,568][00497] Updated weights for policy 0, policy_version 34802 (0.0024) [2024-03-29 15:57:38,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 570228736. Throughput: 0: 41956.4. Samples: 452418220. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 15:57:38,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 15:57:42,081][00497] Updated weights for policy 0, policy_version 34812 (0.0022) [2024-03-29 15:57:43,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 570441728. Throughput: 0: 42239.0. Samples: 452672720. Policy #0 lag: (min: 1.0, avg: 20.1, max: 44.0) [2024-03-29 15:57:43,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 15:57:45,507][00497] Updated weights for policy 0, policy_version 34822 (0.0027) [2024-03-29 15:57:48,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 570638336. Throughput: 0: 42158.3. Samples: 452795880. 
Policy #0 lag: (min: 1.0, avg: 20.1, max: 44.0) [2024-03-29 15:57:48,840][00126] Avg episode reward: [(0, '0.533')] [2024-03-29 15:57:49,467][00497] Updated weights for policy 0, policy_version 34832 (0.0020) [2024-03-29 15:57:53,176][00497] Updated weights for policy 0, policy_version 34842 (0.0020) [2024-03-29 15:57:53,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 570867712. Throughput: 0: 42175.1. Samples: 453053860. Policy #0 lag: (min: 1.0, avg: 20.1, max: 44.0) [2024-03-29 15:57:53,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 15:57:57,398][00497] Updated weights for policy 0, policy_version 34852 (0.0022) [2024-03-29 15:57:58,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 571080704. Throughput: 0: 42405.2. Samples: 453320840. Policy #0 lag: (min: 1.0, avg: 20.1, max: 44.0) [2024-03-29 15:57:58,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 15:58:01,107][00497] Updated weights for policy 0, policy_version 34862 (0.0021) [2024-03-29 15:58:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 571277312. Throughput: 0: 42355.6. Samples: 453437780. Policy #0 lag: (min: 1.0, avg: 20.1, max: 44.0) [2024-03-29 15:58:03,840][00126] Avg episode reward: [(0, '0.541')] [2024-03-29 15:58:04,854][00497] Updated weights for policy 0, policy_version 34872 (0.0025) [2024-03-29 15:58:04,879][00476] Signal inference workers to stop experience collection... (16200 times) [2024-03-29 15:58:04,918][00497] InferenceWorker_p0-w0: stopping experience collection (16200 times) [2024-03-29 15:58:05,107][00476] Signal inference workers to resume experience collection... (16200 times) [2024-03-29 15:58:05,107][00497] InferenceWorker_p0-w0: resuming experience collection (16200 times) [2024-03-29 15:58:08,687][00497] Updated weights for policy 0, policy_version 34882 (0.0022) [2024-03-29 15:58:08,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 571506688. Throughput: 0: 42220.6. Samples: 453684400. Policy #0 lag: (min: 1.0, avg: 21.9, max: 43.0) [2024-03-29 15:58:08,840][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 15:58:13,022][00497] Updated weights for policy 0, policy_version 34892 (0.0021) [2024-03-29 15:58:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 571703296. Throughput: 0: 42243.6. Samples: 453948000. Policy #0 lag: (min: 1.0, avg: 21.9, max: 43.0) [2024-03-29 15:58:13,840][00126] Avg episode reward: [(0, '0.502')] [2024-03-29 15:58:16,589][00497] Updated weights for policy 0, policy_version 34902 (0.0020) [2024-03-29 15:58:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 571932672. Throughput: 0: 42395.1. Samples: 454063620. Policy #0 lag: (min: 1.0, avg: 21.9, max: 43.0) [2024-03-29 15:58:18,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 15:58:20,484][00497] Updated weights for policy 0, policy_version 34912 (0.0019) [2024-03-29 15:58:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.4, 300 sec: 42098.5). Total num frames: 572129280. Throughput: 0: 42284.5. Samples: 454321020. 
Policy #0 lag: (min: 1.0, avg: 21.9, max: 43.0) [2024-03-29 15:58:23,843][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 15:58:24,142][00497] Updated weights for policy 0, policy_version 34922 (0.0023) [2024-03-29 15:58:28,388][00497] Updated weights for policy 0, policy_version 34932 (0.0021) [2024-03-29 15:58:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 572342272. Throughput: 0: 42619.7. Samples: 454590600. Policy #0 lag: (min: 1.0, avg: 18.5, max: 41.0) [2024-03-29 15:58:28,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 15:58:31,800][00497] Updated weights for policy 0, policy_version 34942 (0.0023) [2024-03-29 15:58:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 572571648. Throughput: 0: 42381.3. Samples: 454703040. Policy #0 lag: (min: 1.0, avg: 18.5, max: 41.0) [2024-03-29 15:58:33,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 15:58:35,926][00497] Updated weights for policy 0, policy_version 34952 (0.0021) [2024-03-29 15:58:38,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 572768256. Throughput: 0: 42099.1. Samples: 454948320. Policy #0 lag: (min: 1.0, avg: 18.5, max: 41.0) [2024-03-29 15:58:38,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 15:58:39,877][00497] Updated weights for policy 0, policy_version 34962 (0.0018) [2024-03-29 15:58:43,190][00476] Signal inference workers to stop experience collection... (16250 times) [2024-03-29 15:58:43,234][00497] InferenceWorker_p0-w0: stopping experience collection (16250 times) [2024-03-29 15:58:43,403][00476] Signal inference workers to resume experience collection... (16250 times) [2024-03-29 15:58:43,403][00497] InferenceWorker_p0-w0: resuming experience collection (16250 times) [2024-03-29 15:58:43,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.3, 300 sec: 42098.6). Total num frames: 572948480. Throughput: 0: 42119.7. Samples: 455216220. Policy #0 lag: (min: 1.0, avg: 18.5, max: 41.0) [2024-03-29 15:58:43,840][00126] Avg episode reward: [(0, '0.472')] [2024-03-29 15:58:44,217][00497] Updated weights for policy 0, policy_version 34972 (0.0022) [2024-03-29 15:58:47,678][00497] Updated weights for policy 0, policy_version 34982 (0.0024) [2024-03-29 15:58:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 573194240. Throughput: 0: 41968.5. Samples: 455326360. Policy #0 lag: (min: 1.0, avg: 18.5, max: 41.0) [2024-03-29 15:58:48,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 15:58:51,479][00497] Updated weights for policy 0, policy_version 34992 (0.0023) [2024-03-29 15:58:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 573390848. Throughput: 0: 42206.7. Samples: 455583700. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 15:58:53,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 15:58:55,312][00497] Updated weights for policy 0, policy_version 35002 (0.0017) [2024-03-29 15:58:58,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 573587456. Throughput: 0: 42337.8. Samples: 455853200. 
Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 15:58:58,841][00126] Avg episode reward: [(0, '0.459')] [2024-03-29 15:58:59,603][00497] Updated weights for policy 0, policy_version 35012 (0.0027) [2024-03-29 15:59:02,858][00497] Updated weights for policy 0, policy_version 35022 (0.0019) [2024-03-29 15:59:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 573833216. Throughput: 0: 42412.4. Samples: 455972180. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 15:59:03,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 15:59:04,127][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000035026_573865984.pth... [2024-03-29 15:59:04,433][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000034407_563724288.pth [2024-03-29 15:59:06,732][00497] Updated weights for policy 0, policy_version 35032 (0.0017) [2024-03-29 15:59:08,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 574029824. Throughput: 0: 42214.1. Samples: 456220660. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 15:59:08,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 15:59:10,690][00497] Updated weights for policy 0, policy_version 35042 (0.0022) [2024-03-29 15:59:13,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 574226432. Throughput: 0: 42024.5. Samples: 456481700. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:59:13,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 15:59:15,064][00497] Updated weights for policy 0, policy_version 35052 (0.0028) [2024-03-29 15:59:17,437][00476] Signal inference workers to stop experience collection... (16300 times) [2024-03-29 15:59:17,470][00497] InferenceWorker_p0-w0: stopping experience collection (16300 times) [2024-03-29 15:59:17,652][00476] Signal inference workers to resume experience collection... (16300 times) [2024-03-29 15:59:17,652][00497] InferenceWorker_p0-w0: resuming experience collection (16300 times) [2024-03-29 15:59:18,264][00497] Updated weights for policy 0, policy_version 35062 (0.0024) [2024-03-29 15:59:18,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 574455808. Throughput: 0: 42354.7. Samples: 456609000. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:59:18,840][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 15:59:22,382][00497] Updated weights for policy 0, policy_version 35072 (0.0025) [2024-03-29 15:59:23,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 574668800. Throughput: 0: 42362.2. Samples: 456854620. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:59:23,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 15:59:26,102][00497] Updated weights for policy 0, policy_version 35082 (0.0023) [2024-03-29 15:59:28,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 574865408. Throughput: 0: 42153.3. Samples: 457113120. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:59:28,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 15:59:30,574][00497] Updated weights for policy 0, policy_version 35092 (0.0018) [2024-03-29 15:59:33,773][00497] Updated weights for policy 0, policy_version 35102 (0.0026) [2024-03-29 15:59:33,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42154.1). 
Total num frames: 575111168. Throughput: 0: 42731.0. Samples: 457249260. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 15:59:33,840][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 15:59:37,490][00497] Updated weights for policy 0, policy_version 35112 (0.0025) [2024-03-29 15:59:38,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42598.3, 300 sec: 42154.1). Total num frames: 575324160. Throughput: 0: 42624.8. Samples: 457501820. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 15:59:38,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 15:59:41,347][00497] Updated weights for policy 0, policy_version 35122 (0.0027) [2024-03-29 15:59:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42871.4, 300 sec: 42154.1). Total num frames: 575520768. Throughput: 0: 42530.5. Samples: 457767080. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 15:59:43,841][00126] Avg episode reward: [(0, '0.457')] [2024-03-29 15:59:45,885][00497] Updated weights for policy 0, policy_version 35132 (0.0027) [2024-03-29 15:59:48,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42598.3, 300 sec: 42154.1). Total num frames: 575750144. Throughput: 0: 42900.4. Samples: 457902700. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 15:59:48,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 15:59:49,179][00497] Updated weights for policy 0, policy_version 35142 (0.0025) [2024-03-29 15:59:49,461][00476] Signal inference workers to stop experience collection... (16350 times) [2024-03-29 15:59:49,479][00497] InferenceWorker_p0-w0: stopping experience collection (16350 times) [2024-03-29 15:59:49,680][00476] Signal inference workers to resume experience collection... (16350 times) [2024-03-29 15:59:49,681][00497] InferenceWorker_p0-w0: resuming experience collection (16350 times) [2024-03-29 15:59:53,055][00497] Updated weights for policy 0, policy_version 35152 (0.0017) [2024-03-29 15:59:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 575946752. Throughput: 0: 42509.3. Samples: 458133580. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 15:59:53,840][00126] Avg episode reward: [(0, '0.475')] [2024-03-29 15:59:57,042][00497] Updated weights for policy 0, policy_version 35162 (0.0030) [2024-03-29 15:59:58,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42871.4, 300 sec: 42154.1). Total num frames: 576159744. Throughput: 0: 42537.7. Samples: 458395900. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 15:59:58,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 16:00:01,379][00497] Updated weights for policy 0, policy_version 35172 (0.0017) [2024-03-29 16:00:03,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 576372736. Throughput: 0: 42890.1. Samples: 458539060. Policy #0 lag: (min: 0.0, avg: 19.2, max: 40.0) [2024-03-29 16:00:03,840][00126] Avg episode reward: [(0, '0.459')] [2024-03-29 16:00:04,570][00497] Updated weights for policy 0, policy_version 35182 (0.0029) [2024-03-29 16:00:08,563][00497] Updated weights for policy 0, policy_version 35192 (0.0020) [2024-03-29 16:00:08,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 576585728. Throughput: 0: 42643.7. Samples: 458773580. 
Policy #0 lag: (min: 0.0, avg: 19.2, max: 40.0) [2024-03-29 16:00:08,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 16:00:12,371][00497] Updated weights for policy 0, policy_version 35202 (0.0021) [2024-03-29 16:00:13,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42598.3, 300 sec: 42154.1). Total num frames: 576782336. Throughput: 0: 42521.7. Samples: 459026600. Policy #0 lag: (min: 0.0, avg: 19.2, max: 40.0) [2024-03-29 16:00:13,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 16:00:16,791][00497] Updated weights for policy 0, policy_version 35212 (0.0024) [2024-03-29 16:00:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42598.3, 300 sec: 42265.2). Total num frames: 577011712. Throughput: 0: 42607.6. Samples: 459166600. Policy #0 lag: (min: 0.0, avg: 19.2, max: 40.0) [2024-03-29 16:00:18,841][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 16:00:19,933][00497] Updated weights for policy 0, policy_version 35222 (0.0021) [2024-03-29 16:00:23,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.4, 300 sec: 42265.1). Total num frames: 577224704. Throughput: 0: 42177.3. Samples: 459399800. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 16:00:23,840][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 16:00:24,191][00497] Updated weights for policy 0, policy_version 35232 (0.0019) [2024-03-29 16:00:28,092][00497] Updated weights for policy 0, policy_version 35242 (0.0016) [2024-03-29 16:00:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 577421312. Throughput: 0: 41948.6. Samples: 459654760. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 16:00:28,840][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 16:00:29,718][00476] Signal inference workers to stop experience collection... (16400 times) [2024-03-29 16:00:29,784][00497] InferenceWorker_p0-w0: stopping experience collection (16400 times) [2024-03-29 16:00:29,793][00476] Signal inference workers to resume experience collection... (16400 times) [2024-03-29 16:00:29,809][00497] InferenceWorker_p0-w0: resuming experience collection (16400 times) [2024-03-29 16:00:32,385][00497] Updated weights for policy 0, policy_version 35252 (0.0023) [2024-03-29 16:00:33,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 577634304. Throughput: 0: 42077.8. Samples: 459796200. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 16:00:33,840][00126] Avg episode reward: [(0, '0.501')] [2024-03-29 16:00:35,585][00497] Updated weights for policy 0, policy_version 35262 (0.0030) [2024-03-29 16:00:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 577863680. Throughput: 0: 42137.8. Samples: 460029780. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 16:00:38,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 16:00:39,395][00497] Updated weights for policy 0, policy_version 35272 (0.0017) [2024-03-29 16:00:43,684][00497] Updated weights for policy 0, policy_version 35282 (0.0019) [2024-03-29 16:00:43,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 578060288. Throughput: 0: 41996.0. Samples: 460285720. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 16:00:43,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 16:00:47,995][00497] Updated weights for policy 0, policy_version 35292 (0.0023) [2024-03-29 16:00:48,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.3, 300 sec: 42265.2). 
Total num frames: 578273280. Throughput: 0: 41781.9. Samples: 460419240. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:00:48,840][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 16:00:51,247][00497] Updated weights for policy 0, policy_version 35302 (0.0022) [2024-03-29 16:00:53,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 578469888. Throughput: 0: 41954.1. Samples: 460661520. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:00:53,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 16:00:55,000][00497] Updated weights for policy 0, policy_version 35312 (0.0028) [2024-03-29 16:00:58,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 578682880. Throughput: 0: 42046.7. Samples: 460918700. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:00:58,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 16:00:59,254][00497] Updated weights for policy 0, policy_version 35322 (0.0023) [2024-03-29 16:01:03,374][00476] Signal inference workers to stop experience collection... (16450 times) [2024-03-29 16:01:03,451][00497] InferenceWorker_p0-w0: stopping experience collection (16450 times) [2024-03-29 16:01:03,464][00476] Signal inference workers to resume experience collection... (16450 times) [2024-03-29 16:01:03,481][00497] InferenceWorker_p0-w0: resuming experience collection (16450 times) [2024-03-29 16:01:03,731][00497] Updated weights for policy 0, policy_version 35332 (0.0021) [2024-03-29 16:01:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 578879488. Throughput: 0: 41814.2. Samples: 461048240. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:01:03,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 16:01:04,030][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000035333_578895872.pth... [2024-03-29 16:01:04,330][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000034715_568770560.pth [2024-03-29 16:01:07,033][00497] Updated weights for policy 0, policy_version 35342 (0.0029) [2024-03-29 16:01:08,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 579125248. Throughput: 0: 42120.1. Samples: 461295200. Policy #0 lag: (min: 1.0, avg: 23.0, max: 41.0) [2024-03-29 16:01:08,840][00126] Avg episode reward: [(0, '0.483')] [2024-03-29 16:01:10,698][00497] Updated weights for policy 0, policy_version 35352 (0.0020) [2024-03-29 16:01:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 579305472. Throughput: 0: 42043.9. Samples: 461546740. Policy #0 lag: (min: 1.0, avg: 23.0, max: 41.0) [2024-03-29 16:01:13,842][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 16:01:14,957][00497] Updated weights for policy 0, policy_version 35362 (0.0021) [2024-03-29 16:01:18,839][00126] Fps is (10 sec: 37683.0, 60 sec: 41506.2, 300 sec: 42154.1). Total num frames: 579502080. Throughput: 0: 41638.7. Samples: 461669940. Policy #0 lag: (min: 1.0, avg: 23.0, max: 41.0) [2024-03-29 16:01:18,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 16:01:19,394][00497] Updated weights for policy 0, policy_version 35372 (0.0031) [2024-03-29 16:01:22,697][00497] Updated weights for policy 0, policy_version 35382 (0.0021) [2024-03-29 16:01:23,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 579747840. 
Throughput: 0: 41899.1. Samples: 461915240. Policy #0 lag: (min: 1.0, avg: 23.0, max: 41.0) [2024-03-29 16:01:23,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 16:01:26,531][00497] Updated weights for policy 0, policy_version 35392 (0.0022) [2024-03-29 16:01:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 579928064. Throughput: 0: 41877.5. Samples: 462170200. Policy #0 lag: (min: 1.0, avg: 23.0, max: 41.0) [2024-03-29 16:01:28,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 16:01:30,688][00497] Updated weights for policy 0, policy_version 35402 (0.0020) [2024-03-29 16:01:33,530][00476] Signal inference workers to stop experience collection... (16500 times) [2024-03-29 16:01:33,593][00497] InferenceWorker_p0-w0: stopping experience collection (16500 times) [2024-03-29 16:01:33,603][00476] Signal inference workers to resume experience collection... (16500 times) [2024-03-29 16:01:33,622][00497] InferenceWorker_p0-w0: resuming experience collection (16500 times) [2024-03-29 16:01:33,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 580124672. Throughput: 0: 41682.6. Samples: 462294960. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 16:01:33,840][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 16:01:35,036][00497] Updated weights for policy 0, policy_version 35412 (0.0020) [2024-03-29 16:01:38,199][00497] Updated weights for policy 0, policy_version 35422 (0.0020) [2024-03-29 16:01:38,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42052.2, 300 sec: 42265.1). Total num frames: 580386816. Throughput: 0: 42014.2. Samples: 462552160. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 16:01:38,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 16:01:41,923][00497] Updated weights for policy 0, policy_version 35432 (0.0023) [2024-03-29 16:01:43,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 580567040. Throughput: 0: 41719.5. Samples: 462796080. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 16:01:43,842][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 16:01:46,466][00497] Updated weights for policy 0, policy_version 35442 (0.0025) [2024-03-29 16:01:48,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.0, 300 sec: 42154.1). Total num frames: 580763648. Throughput: 0: 41558.7. Samples: 462918380. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 16:01:48,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 16:01:50,629][00497] Updated weights for policy 0, policy_version 35452 (0.0018) [2024-03-29 16:01:53,745][00497] Updated weights for policy 0, policy_version 35462 (0.0019) [2024-03-29 16:01:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 581009408. Throughput: 0: 41967.4. Samples: 463183740. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 16:01:53,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 16:01:57,568][00497] Updated weights for policy 0, policy_version 35472 (0.0023) [2024-03-29 16:01:58,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41779.3, 300 sec: 42154.1). Total num frames: 581189632. Throughput: 0: 41798.8. Samples: 463427680. 
Policy #0 lag: (min: 1.0, avg: 22.5, max: 42.0) [2024-03-29 16:01:58,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 16:02:01,892][00497] Updated weights for policy 0, policy_version 35482 (0.0026) [2024-03-29 16:02:03,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 581402624. Throughput: 0: 42098.1. Samples: 463564360. Policy #0 lag: (min: 1.0, avg: 22.5, max: 42.0) [2024-03-29 16:02:03,841][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 16:02:06,033][00497] Updated weights for policy 0, policy_version 35492 (0.0017) [2024-03-29 16:02:07,814][00476] Signal inference workers to stop experience collection... (16550 times) [2024-03-29 16:02:07,852][00497] InferenceWorker_p0-w0: stopping experience collection (16550 times) [2024-03-29 16:02:08,031][00476] Signal inference workers to resume experience collection... (16550 times) [2024-03-29 16:02:08,032][00497] InferenceWorker_p0-w0: resuming experience collection (16550 times) [2024-03-29 16:02:08,839][00126] Fps is (10 sec: 44236.1, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 581632000. Throughput: 0: 42419.1. Samples: 463824100. Policy #0 lag: (min: 1.0, avg: 22.5, max: 42.0) [2024-03-29 16:02:08,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 16:02:09,318][00497] Updated weights for policy 0, policy_version 35502 (0.0018) [2024-03-29 16:02:13,245][00497] Updated weights for policy 0, policy_version 35512 (0.0026) [2024-03-29 16:02:13,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 581828608. Throughput: 0: 41644.4. Samples: 464044200. Policy #0 lag: (min: 1.0, avg: 22.5, max: 42.0) [2024-03-29 16:02:13,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 16:02:17,572][00497] Updated weights for policy 0, policy_version 35522 (0.0025) [2024-03-29 16:02:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42098.6). Total num frames: 582025216. Throughput: 0: 41897.7. Samples: 464180360. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 16:02:18,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 16:02:21,869][00497] Updated weights for policy 0, policy_version 35532 (0.0033) [2024-03-29 16:02:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.2, 300 sec: 42098.6). Total num frames: 582238208. Throughput: 0: 42007.3. Samples: 464442480. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 16:02:23,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 16:02:25,065][00497] Updated weights for policy 0, policy_version 35542 (0.0030) [2024-03-29 16:02:28,803][00497] Updated weights for policy 0, policy_version 35552 (0.0022) [2024-03-29 16:02:28,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 582483968. Throughput: 0: 42122.8. Samples: 464691600. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 16:02:28,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 16:02:33,190][00497] Updated weights for policy 0, policy_version 35562 (0.0022) [2024-03-29 16:02:33,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 582664192. Throughput: 0: 42036.1. Samples: 464810000. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 16:02:33,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 16:02:37,671][00497] Updated weights for policy 0, policy_version 35572 (0.0022) [2024-03-29 16:02:38,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.2, 300 sec: 42098.6). 
Total num frames: 582860800. Throughput: 0: 41965.5. Samples: 465072180. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 16:02:38,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 16:02:39,832][00476] Signal inference workers to stop experience collection... (16600 times) [2024-03-29 16:02:39,872][00497] InferenceWorker_p0-w0: stopping experience collection (16600 times) [2024-03-29 16:02:40,060][00476] Signal inference workers to resume experience collection... (16600 times) [2024-03-29 16:02:40,061][00497] InferenceWorker_p0-w0: resuming experience collection (16600 times) [2024-03-29 16:02:40,669][00497] Updated weights for policy 0, policy_version 35582 (0.0030) [2024-03-29 16:02:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.4, 300 sec: 42209.6). Total num frames: 583090176. Throughput: 0: 41927.1. Samples: 465314400. Policy #0 lag: (min: 2.0, avg: 22.0, max: 43.0) [2024-03-29 16:02:43,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 16:02:44,447][00497] Updated weights for policy 0, policy_version 35592 (0.0024) [2024-03-29 16:02:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 583286784. Throughput: 0: 41566.4. Samples: 465434840. Policy #0 lag: (min: 2.0, avg: 22.0, max: 43.0) [2024-03-29 16:02:48,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 16:02:49,533][00497] Updated weights for policy 0, policy_version 35603 (0.0019) [2024-03-29 16:02:53,725][00497] Updated weights for policy 0, policy_version 35613 (0.0016) [2024-03-29 16:02:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.2, 300 sec: 42043.0). Total num frames: 583483392. Throughput: 0: 41559.2. Samples: 465694260. Policy #0 lag: (min: 2.0, avg: 22.0, max: 43.0) [2024-03-29 16:02:53,840][00126] Avg episode reward: [(0, '0.479')] [2024-03-29 16:02:56,920][00497] Updated weights for policy 0, policy_version 35623 (0.0020) [2024-03-29 16:02:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 583712768. Throughput: 0: 41844.9. Samples: 465927220. Policy #0 lag: (min: 2.0, avg: 22.0, max: 43.0) [2024-03-29 16:02:58,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 16:03:01,186][00497] Updated weights for policy 0, policy_version 35633 (0.0024) [2024-03-29 16:03:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 583909376. Throughput: 0: 41789.0. Samples: 466060860. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:03:03,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 16:03:03,864][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000035640_583925760.pth... [2024-03-29 16:03:04,194][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000035026_573865984.pth [2024-03-29 16:03:05,018][00497] Updated weights for policy 0, policy_version 35643 (0.0025) [2024-03-29 16:03:08,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41233.1, 300 sec: 42043.0). Total num frames: 584105984. Throughput: 0: 41682.1. Samples: 466318180. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:03:08,842][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 16:03:09,490][00497] Updated weights for policy 0, policy_version 35653 (0.0024) [2024-03-29 16:03:12,495][00476] Signal inference workers to stop experience collection... 
(16650 times) [2024-03-29 16:03:12,524][00497] InferenceWorker_p0-w0: stopping experience collection (16650 times) [2024-03-29 16:03:12,671][00476] Signal inference workers to resume experience collection... (16650 times) [2024-03-29 16:03:12,672][00497] InferenceWorker_p0-w0: resuming experience collection (16650 times) [2024-03-29 16:03:12,679][00497] Updated weights for policy 0, policy_version 35663 (0.0033) [2024-03-29 16:03:13,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 584351744. Throughput: 0: 41275.5. Samples: 466549000. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:03:13,840][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 16:03:16,903][00497] Updated weights for policy 0, policy_version 35673 (0.0018) [2024-03-29 16:03:18,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 584548352. Throughput: 0: 41582.6. Samples: 466681220. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:03:18,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 16:03:20,780][00497] Updated weights for policy 0, policy_version 35683 (0.0021) [2024-03-29 16:03:23,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 584744960. Throughput: 0: 41680.3. Samples: 466947800. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:03:23,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 16:03:25,115][00497] Updated weights for policy 0, policy_version 35693 (0.0034) [2024-03-29 16:03:28,152][00497] Updated weights for policy 0, policy_version 35703 (0.0019) [2024-03-29 16:03:28,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 584990720. Throughput: 0: 41693.3. Samples: 467190600. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:03:28,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 16:03:32,379][00497] Updated weights for policy 0, policy_version 35713 (0.0022) [2024-03-29 16:03:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 585170944. Throughput: 0: 41822.1. Samples: 467316840. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:03:33,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 16:03:36,300][00497] Updated weights for policy 0, policy_version 35723 (0.0025) [2024-03-29 16:03:38,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 585367552. Throughput: 0: 41577.4. Samples: 467565240. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:03:38,841][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 16:03:40,765][00497] Updated weights for policy 0, policy_version 35733 (0.0022) [2024-03-29 16:03:43,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 585596928. Throughput: 0: 42036.4. Samples: 467818860. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:03:43,840][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 16:03:44,014][00497] Updated weights for policy 0, policy_version 35743 (0.0035) [2024-03-29 16:03:44,939][00476] Signal inference workers to stop experience collection... (16700 times) [2024-03-29 16:03:44,995][00497] InferenceWorker_p0-w0: stopping experience collection (16700 times) [2024-03-29 16:03:45,104][00476] Signal inference workers to resume experience collection... 
(16700 times) [2024-03-29 16:03:45,104][00497] InferenceWorker_p0-w0: resuming experience collection (16700 times) [2024-03-29 16:03:48,220][00497] Updated weights for policy 0, policy_version 35753 (0.0022) [2024-03-29 16:03:48,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 585809920. Throughput: 0: 41727.1. Samples: 467938580. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 16:03:48,840][00126] Avg episode reward: [(0, '0.534')] [2024-03-29 16:03:52,403][00497] Updated weights for policy 0, policy_version 35763 (0.0024) [2024-03-29 16:03:53,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 585990144. Throughput: 0: 41526.7. Samples: 468186880. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 16:03:53,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 16:03:56,673][00497] Updated weights for policy 0, policy_version 35773 (0.0023) [2024-03-29 16:03:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 586219520. Throughput: 0: 42252.9. Samples: 468450380. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 16:03:58,840][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 16:03:59,676][00497] Updated weights for policy 0, policy_version 35783 (0.0022) [2024-03-29 16:04:03,676][00497] Updated weights for policy 0, policy_version 35793 (0.0020) [2024-03-29 16:04:03,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 586432512. Throughput: 0: 41683.9. Samples: 468557000. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 16:04:03,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 16:04:07,793][00497] Updated weights for policy 0, policy_version 35803 (0.0019) [2024-03-29 16:04:08,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 586629120. Throughput: 0: 41765.8. Samples: 468827260. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 16:04:08,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 16:04:12,176][00497] Updated weights for policy 0, policy_version 35813 (0.0017) [2024-03-29 16:04:13,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.0, 300 sec: 41931.9). Total num frames: 586825728. Throughput: 0: 42087.0. Samples: 469084520. Policy #0 lag: (min: 0.0, avg: 19.2, max: 43.0) [2024-03-29 16:04:13,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 16:04:15,259][00497] Updated weights for policy 0, policy_version 35823 (0.0023) [2024-03-29 16:04:18,544][00476] Signal inference workers to stop experience collection... (16750 times) [2024-03-29 16:04:18,546][00476] Signal inference workers to resume experience collection... (16750 times) [2024-03-29 16:04:18,591][00497] InferenceWorker_p0-w0: stopping experience collection (16750 times) [2024-03-29 16:04:18,591][00497] InferenceWorker_p0-w0: resuming experience collection (16750 times) [2024-03-29 16:04:18,839][00126] Fps is (10 sec: 44237.6, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 587071488. Throughput: 0: 41626.8. Samples: 469190040. Policy #0 lag: (min: 0.0, avg: 19.2, max: 43.0) [2024-03-29 16:04:18,840][00126] Avg episode reward: [(0, '0.541')] [2024-03-29 16:04:19,302][00497] Updated weights for policy 0, policy_version 35833 (0.0019) [2024-03-29 16:04:23,253][00497] Updated weights for policy 0, policy_version 35843 (0.0018) [2024-03-29 16:04:23,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42325.3, 300 sec: 42098.5). 
Total num frames: 587284480. Throughput: 0: 42205.2. Samples: 469464480. Policy #0 lag: (min: 0.0, avg: 19.2, max: 43.0) [2024-03-29 16:04:23,840][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 16:04:27,328][00497] Updated weights for policy 0, policy_version 35853 (0.0026) [2024-03-29 16:04:28,839][00126] Fps is (10 sec: 40959.2, 60 sec: 41506.0, 300 sec: 41931.9). Total num frames: 587481088. Throughput: 0: 42368.4. Samples: 469725440. Policy #0 lag: (min: 0.0, avg: 19.2, max: 43.0) [2024-03-29 16:04:28,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 16:04:30,634][00497] Updated weights for policy 0, policy_version 35863 (0.0024) [2024-03-29 16:04:33,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 587710464. Throughput: 0: 42220.9. Samples: 469838520. Policy #0 lag: (min: 0.0, avg: 19.2, max: 43.0) [2024-03-29 16:04:33,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 16:04:34,663][00497] Updated weights for policy 0, policy_version 35873 (0.0020) [2024-03-29 16:04:38,591][00497] Updated weights for policy 0, policy_version 35883 (0.0022) [2024-03-29 16:04:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 587907072. Throughput: 0: 42382.2. Samples: 470094080. Policy #0 lag: (min: 1.0, avg: 20.8, max: 40.0) [2024-03-29 16:04:38,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 16:04:42,824][00497] Updated weights for policy 0, policy_version 35893 (0.0032) [2024-03-29 16:04:43,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 588103680. Throughput: 0: 42389.2. Samples: 470357900. Policy #0 lag: (min: 1.0, avg: 20.8, max: 40.0) [2024-03-29 16:04:43,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 16:04:46,149][00497] Updated weights for policy 0, policy_version 35903 (0.0023) [2024-03-29 16:04:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 588333056. Throughput: 0: 42413.4. Samples: 470465600. Policy #0 lag: (min: 1.0, avg: 20.8, max: 40.0) [2024-03-29 16:04:48,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 16:04:50,100][00497] Updated weights for policy 0, policy_version 35913 (0.0028) [2024-03-29 16:04:53,471][00476] Signal inference workers to stop experience collection... (16800 times) [2024-03-29 16:04:53,530][00497] InferenceWorker_p0-w0: stopping experience collection (16800 times) [2024-03-29 16:04:53,560][00476] Signal inference workers to resume experience collection... (16800 times) [2024-03-29 16:04:53,564][00497] InferenceWorker_p0-w0: resuming experience collection (16800 times) [2024-03-29 16:04:53,839][00126] Fps is (10 sec: 42599.2, 60 sec: 42325.4, 300 sec: 41932.0). Total num frames: 588529664. Throughput: 0: 42036.6. Samples: 470718900. Policy #0 lag: (min: 1.0, avg: 20.8, max: 40.0) [2024-03-29 16:04:53,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 16:04:54,368][00497] Updated weights for policy 0, policy_version 35923 (0.0019) [2024-03-29 16:04:58,836][00497] Updated weights for policy 0, policy_version 35933 (0.0026) [2024-03-29 16:04:58,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 588726272. Throughput: 0: 42065.3. Samples: 470977460. 
Policy #0 lag: (min: 1.0, avg: 20.8, max: 40.0) [2024-03-29 16:04:58,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 16:05:01,906][00497] Updated weights for policy 0, policy_version 35943 (0.0025) [2024-03-29 16:05:03,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 588955648. Throughput: 0: 42400.3. Samples: 471098060. Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 16:05:03,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 16:05:03,941][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000035948_588972032.pth... [2024-03-29 16:05:04,280][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000035333_578895872.pth [2024-03-29 16:05:05,796][00497] Updated weights for policy 0, policy_version 35953 (0.0019) [2024-03-29 16:05:08,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 589152256. Throughput: 0: 41738.4. Samples: 471342700. Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 16:05:08,840][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 16:05:10,247][00497] Updated weights for policy 0, policy_version 35963 (0.0020) [2024-03-29 16:05:13,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 589348864. Throughput: 0: 41701.9. Samples: 471602020. Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 16:05:13,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 16:05:14,538][00497] Updated weights for policy 0, policy_version 35973 (0.0036) [2024-03-29 16:05:17,707][00497] Updated weights for policy 0, policy_version 35983 (0.0025) [2024-03-29 16:05:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 589578240. Throughput: 0: 41964.0. Samples: 471726900. Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 16:05:18,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 16:05:21,517][00497] Updated weights for policy 0, policy_version 35993 (0.0027) [2024-03-29 16:05:23,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 589791232. Throughput: 0: 41675.6. Samples: 471969480. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 16:05:23,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 16:05:25,739][00497] Updated weights for policy 0, policy_version 36003 (0.0017) [2024-03-29 16:05:28,441][00476] Signal inference workers to stop experience collection... (16850 times) [2024-03-29 16:05:28,520][00497] InferenceWorker_p0-w0: stopping experience collection (16850 times) [2024-03-29 16:05:28,531][00476] Signal inference workers to resume experience collection... (16850 times) [2024-03-29 16:05:28,554][00497] InferenceWorker_p0-w0: resuming experience collection (16850 times) [2024-03-29 16:05:28,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 590004224. Throughput: 0: 41953.1. Samples: 472245780. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 16:05:28,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 16:05:29,692][00497] Updated weights for policy 0, policy_version 36013 (0.0020) [2024-03-29 16:05:33,027][00497] Updated weights for policy 0, policy_version 36023 (0.0021) [2024-03-29 16:05:33,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 590233600. Throughput: 0: 42394.2. Samples: 472373340. 
Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 16:05:33,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 16:05:36,998][00497] Updated weights for policy 0, policy_version 36033 (0.0027) [2024-03-29 16:05:38,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 41932.0). Total num frames: 590430208. Throughput: 0: 42091.5. Samples: 472613020. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 16:05:38,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 16:05:41,153][00497] Updated weights for policy 0, policy_version 36043 (0.0018) [2024-03-29 16:05:43,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 590610432. Throughput: 0: 42172.1. Samples: 472875200. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 16:05:43,840][00126] Avg episode reward: [(0, '0.479')] [2024-03-29 16:05:45,146][00497] Updated weights for policy 0, policy_version 36053 (0.0022) [2024-03-29 16:05:48,336][00497] Updated weights for policy 0, policy_version 36063 (0.0028) [2024-03-29 16:05:48,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 590872576. Throughput: 0: 42530.3. Samples: 473011920. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 16:05:48,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 16:05:52,289][00497] Updated weights for policy 0, policy_version 36073 (0.0017) [2024-03-29 16:05:53,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 591069184. Throughput: 0: 42453.8. Samples: 473253120. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 16:05:53,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 16:05:56,428][00497] Updated weights for policy 0, policy_version 36083 (0.0020) [2024-03-29 16:05:58,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42325.5, 300 sec: 41987.5). Total num frames: 591265792. Throughput: 0: 42500.5. Samples: 473514540. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 16:05:58,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 16:06:00,573][00497] Updated weights for policy 0, policy_version 36093 (0.0017) [2024-03-29 16:06:02,442][00476] Signal inference workers to stop experience collection... (16900 times) [2024-03-29 16:06:02,478][00497] InferenceWorker_p0-w0: stopping experience collection (16900 times) [2024-03-29 16:06:02,663][00476] Signal inference workers to resume experience collection... (16900 times) [2024-03-29 16:06:02,664][00497] InferenceWorker_p0-w0: resuming experience collection (16900 times) [2024-03-29 16:06:03,830][00497] Updated weights for policy 0, policy_version 36103 (0.0025) [2024-03-29 16:06:03,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 591511552. Throughput: 0: 42715.1. Samples: 473649080. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 16:06:03,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 16:06:07,652][00497] Updated weights for policy 0, policy_version 36113 (0.0018) [2024-03-29 16:06:08,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 591708160. Throughput: 0: 42555.1. Samples: 473884460. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 16:06:08,841][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 16:06:12,007][00497] Updated weights for policy 0, policy_version 36123 (0.0073) [2024-03-29 16:06:13,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42598.3, 300 sec: 42043.0). 
Total num frames: 591904768. Throughput: 0: 42104.7. Samples: 474140500. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 16:06:13,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 16:06:16,047][00497] Updated weights for policy 0, policy_version 36133 (0.0020) [2024-03-29 16:06:18,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 592134144. Throughput: 0: 42295.7. Samples: 474276640. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 16:06:18,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 16:06:19,304][00497] Updated weights for policy 0, policy_version 36143 (0.0019) [2024-03-29 16:06:23,360][00497] Updated weights for policy 0, policy_version 36153 (0.0017) [2024-03-29 16:06:23,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 592347136. Throughput: 0: 42305.3. Samples: 474516760. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 16:06:23,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 16:06:27,324][00497] Updated weights for policy 0, policy_version 36163 (0.0018) [2024-03-29 16:06:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 592543744. Throughput: 0: 42376.1. Samples: 474782120. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 16:06:28,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 16:06:31,265][00497] Updated weights for policy 0, policy_version 36173 (0.0027) [2024-03-29 16:06:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 592773120. Throughput: 0: 42421.3. Samples: 474920880. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 16:06:33,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 16:06:34,522][00497] Updated weights for policy 0, policy_version 36183 (0.0032) [2024-03-29 16:06:38,247][00476] Signal inference workers to stop experience collection... (16950 times) [2024-03-29 16:06:38,247][00476] Signal inference workers to resume experience collection... (16950 times) [2024-03-29 16:06:38,296][00497] InferenceWorker_p0-w0: stopping experience collection (16950 times) [2024-03-29 16:06:38,296][00497] InferenceWorker_p0-w0: resuming experience collection (16950 times) [2024-03-29 16:06:38,584][00497] Updated weights for policy 0, policy_version 36193 (0.0022) [2024-03-29 16:06:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.3, 300 sec: 42098.6). Total num frames: 592986112. Throughput: 0: 42531.5. Samples: 475167040. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 16:06:38,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 16:06:42,788][00497] Updated weights for policy 0, policy_version 36203 (0.0031) [2024-03-29 16:06:43,839][00126] Fps is (10 sec: 42598.4, 60 sec: 43144.5, 300 sec: 42154.1). Total num frames: 593199104. Throughput: 0: 42299.1. Samples: 475418000. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 16:06:43,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 16:06:46,811][00497] Updated weights for policy 0, policy_version 36213 (0.0022) [2024-03-29 16:06:48,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 593395712. Throughput: 0: 42392.5. Samples: 475556740. 
Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 16:06:48,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 16:06:50,326][00497] Updated weights for policy 0, policy_version 36223 (0.0031) [2024-03-29 16:06:53,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 593608704. Throughput: 0: 42191.1. Samples: 475783060. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 16:06:53,840][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 16:06:54,343][00497] Updated weights for policy 0, policy_version 36233 (0.0023) [2024-03-29 16:06:58,521][00497] Updated weights for policy 0, policy_version 36243 (0.0022) [2024-03-29 16:06:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 42098.6). Total num frames: 593821696. Throughput: 0: 42307.6. Samples: 476044340. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 16:06:58,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 16:07:02,543][00497] Updated weights for policy 0, policy_version 36253 (0.0018) [2024-03-29 16:07:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 594018304. Throughput: 0: 42207.4. Samples: 476175980. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 16:07:03,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 16:07:04,107][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000036257_594034688.pth... [2024-03-29 16:07:04,452][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000035640_583925760.pth [2024-03-29 16:07:06,150][00497] Updated weights for policy 0, policy_version 36263 (0.0028) [2024-03-29 16:07:08,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 594231296. Throughput: 0: 41929.3. Samples: 476403580. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 16:07:08,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 16:07:09,950][00497] Updated weights for policy 0, policy_version 36273 (0.0026) [2024-03-29 16:07:13,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 594427904. Throughput: 0: 41920.9. Samples: 476668560. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 16:07:13,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 16:07:13,870][00476] Signal inference workers to stop experience collection... (17000 times) [2024-03-29 16:07:13,872][00476] Signal inference workers to resume experience collection... (17000 times) [2024-03-29 16:07:13,916][00497] InferenceWorker_p0-w0: stopping experience collection (17000 times) [2024-03-29 16:07:13,917][00497] InferenceWorker_p0-w0: resuming experience collection (17000 times) [2024-03-29 16:07:14,195][00497] Updated weights for policy 0, policy_version 36283 (0.0021) [2024-03-29 16:07:18,239][00497] Updated weights for policy 0, policy_version 36293 (0.0028) [2024-03-29 16:07:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 594624512. Throughput: 0: 41775.9. Samples: 476800800. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 16:07:18,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 16:07:21,665][00497] Updated weights for policy 0, policy_version 36303 (0.0025) [2024-03-29 16:07:23,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 594853888. Throughput: 0: 41533.0. Samples: 477036020. 
Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 16:07:23,840][00126] Avg episode reward: [(0, '0.505')] [2024-03-29 16:07:25,686][00497] Updated weights for policy 0, policy_version 36313 (0.0023) [2024-03-29 16:07:28,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 595066880. Throughput: 0: 41951.1. Samples: 477305800. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 16:07:28,841][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 16:07:30,158][00497] Updated weights for policy 0, policy_version 36324 (0.0023) [2024-03-29 16:07:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 595263488. Throughput: 0: 41524.0. Samples: 477425320. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 16:07:33,840][00126] Avg episode reward: [(0, '0.507')] [2024-03-29 16:07:34,442][00497] Updated weights for policy 0, policy_version 36334 (0.0034) [2024-03-29 16:07:37,541][00497] Updated weights for policy 0, policy_version 36344 (0.0024) [2024-03-29 16:07:38,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 595492864. Throughput: 0: 41922.7. Samples: 477669580. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 16:07:38,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 16:07:41,742][00497] Updated weights for policy 0, policy_version 36354 (0.0024) [2024-03-29 16:07:43,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 595689472. Throughput: 0: 42042.8. Samples: 477936260. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 16:07:43,840][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 16:07:45,644][00476] Signal inference workers to stop experience collection... (17050 times) [2024-03-29 16:07:45,644][00476] Signal inference workers to resume experience collection... (17050 times) [2024-03-29 16:07:45,665][00497] Updated weights for policy 0, policy_version 36364 (0.0023) [2024-03-29 16:07:45,685][00497] InferenceWorker_p0-w0: stopping experience collection (17050 times) [2024-03-29 16:07:45,685][00497] InferenceWorker_p0-w0: resuming experience collection (17050 times) [2024-03-29 16:07:48,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 595902464. Throughput: 0: 41851.2. Samples: 478059280. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 16:07:48,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 16:07:49,951][00497] Updated weights for policy 0, policy_version 36374 (0.0022) [2024-03-29 16:07:53,096][00497] Updated weights for policy 0, policy_version 36384 (0.0026) [2024-03-29 16:07:53,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 596148224. Throughput: 0: 42407.2. Samples: 478311900. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 16:07:53,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 16:07:57,472][00497] Updated weights for policy 0, policy_version 36394 (0.0024) [2024-03-29 16:07:58,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 596312064. Throughput: 0: 41858.6. Samples: 478552200. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 16:07:58,842][00126] Avg episode reward: [(0, '0.507')] [2024-03-29 16:08:01,548][00497] Updated weights for policy 0, policy_version 36404 (0.0028) [2024-03-29 16:08:03,839][00126] Fps is (10 sec: 36044.4, 60 sec: 41506.1, 300 sec: 42043.0). 
Total num frames: 596508672. Throughput: 0: 41627.5. Samples: 478674040. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 16:08:03,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 16:08:06,150][00497] Updated weights for policy 0, policy_version 36414 (0.0021) [2024-03-29 16:08:08,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 596754432. Throughput: 0: 41955.0. Samples: 478924000. Policy #0 lag: (min: 0.0, avg: 20.4, max: 45.0) [2024-03-29 16:08:08,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 16:08:09,127][00497] Updated weights for policy 0, policy_version 36424 (0.0033) [2024-03-29 16:08:13,660][00497] Updated weights for policy 0, policy_version 36434 (0.0025) [2024-03-29 16:08:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 596934656. Throughput: 0: 41162.2. Samples: 479158100. Policy #0 lag: (min: 0.0, avg: 20.4, max: 45.0) [2024-03-29 16:08:13,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 16:08:17,529][00497] Updated weights for policy 0, policy_version 36444 (0.0023) [2024-03-29 16:08:18,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 597131264. Throughput: 0: 41565.8. Samples: 479295780. Policy #0 lag: (min: 0.0, avg: 20.4, max: 45.0) [2024-03-29 16:08:18,841][00126] Avg episode reward: [(0, '0.534')] [2024-03-29 16:08:21,905][00497] Updated weights for policy 0, policy_version 36454 (0.0017) [2024-03-29 16:08:23,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 597360640. Throughput: 0: 41688.1. Samples: 479545540. Policy #0 lag: (min: 0.0, avg: 20.4, max: 45.0) [2024-03-29 16:08:23,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 16:08:24,465][00476] Signal inference workers to stop experience collection... (17100 times) [2024-03-29 16:08:24,507][00497] InferenceWorker_p0-w0: stopping experience collection (17100 times) [2024-03-29 16:08:24,671][00476] Signal inference workers to resume experience collection... (17100 times) [2024-03-29 16:08:24,672][00497] InferenceWorker_p0-w0: resuming experience collection (17100 times) [2024-03-29 16:08:24,916][00497] Updated weights for policy 0, policy_version 36464 (0.0026) [2024-03-29 16:08:28,841][00126] Fps is (10 sec: 44230.4, 60 sec: 41778.2, 300 sec: 42042.8). Total num frames: 597573632. Throughput: 0: 41110.2. Samples: 479786280. Policy #0 lag: (min: 0.0, avg: 20.4, max: 45.0) [2024-03-29 16:08:28,841][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 16:08:29,442][00497] Updated weights for policy 0, policy_version 36474 (0.0025) [2024-03-29 16:08:33,252][00497] Updated weights for policy 0, policy_version 36484 (0.0026) [2024-03-29 16:08:33,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 597770240. Throughput: 0: 41384.3. Samples: 479921580. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 16:08:33,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 16:08:37,642][00497] Updated weights for policy 0, policy_version 36494 (0.0018) [2024-03-29 16:08:38,839][00126] Fps is (10 sec: 39327.1, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 597966848. Throughput: 0: 41328.9. Samples: 480171700. 
Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 16:08:38,840][00126] Avg episode reward: [(0, '0.501')] [2024-03-29 16:08:40,779][00497] Updated weights for policy 0, policy_version 36504 (0.0028) [2024-03-29 16:08:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 598196224. Throughput: 0: 41293.3. Samples: 480410400. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 16:08:43,841][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 16:08:45,223][00497] Updated weights for policy 0, policy_version 36514 (0.0034) [2024-03-29 16:08:48,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 598392832. Throughput: 0: 41398.0. Samples: 480536940. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 16:08:48,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 16:08:49,136][00497] Updated weights for policy 0, policy_version 36524 (0.0027) [2024-03-29 16:08:53,423][00497] Updated weights for policy 0, policy_version 36534 (0.0020) [2024-03-29 16:08:53,839][00126] Fps is (10 sec: 39321.4, 60 sec: 40686.9, 300 sec: 41931.9). Total num frames: 598589440. Throughput: 0: 41536.0. Samples: 480793120. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 16:08:53,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 16:08:56,599][00497] Updated weights for policy 0, policy_version 36544 (0.0031) [2024-03-29 16:08:56,904][00476] Signal inference workers to stop experience collection... (17150 times) [2024-03-29 16:08:56,905][00476] Signal inference workers to resume experience collection... (17150 times) [2024-03-29 16:08:56,950][00497] InferenceWorker_p0-w0: stopping experience collection (17150 times) [2024-03-29 16:08:56,950][00497] InferenceWorker_p0-w0: resuming experience collection (17150 times) [2024-03-29 16:08:58,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 598818816. Throughput: 0: 41672.5. Samples: 481033360. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 16:08:58,840][00126] Avg episode reward: [(0, '0.457')] [2024-03-29 16:09:00,699][00497] Updated weights for policy 0, policy_version 36554 (0.0024) [2024-03-29 16:09:03,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 598999040. Throughput: 0: 41448.9. Samples: 481160980. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 16:09:03,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 16:09:04,150][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000036562_599031808.pth... [2024-03-29 16:09:04,446][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000035948_588972032.pth [2024-03-29 16:09:05,034][00497] Updated weights for policy 0, policy_version 36564 (0.0024) [2024-03-29 16:09:08,839][00126] Fps is (10 sec: 39321.2, 60 sec: 40960.0, 300 sec: 41987.5). Total num frames: 599212032. Throughput: 0: 41622.1. Samples: 481418540. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 16:09:08,840][00126] Avg episode reward: [(0, '0.612')] [2024-03-29 16:09:09,170][00497] Updated weights for policy 0, policy_version 36574 (0.0021) [2024-03-29 16:09:12,291][00497] Updated weights for policy 0, policy_version 36584 (0.0025) [2024-03-29 16:09:13,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 599441408. Throughput: 0: 41459.0. Samples: 481651880. 
Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 16:09:13,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 16:09:16,453][00497] Updated weights for policy 0, policy_version 36594 (0.0030) [2024-03-29 16:09:18,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 599621632. Throughput: 0: 41311.1. Samples: 481780580. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 16:09:18,841][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 16:09:20,604][00497] Updated weights for policy 0, policy_version 36604 (0.0021) [2024-03-29 16:09:23,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 599834624. Throughput: 0: 41627.6. Samples: 482044940. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 16:09:23,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 16:09:24,633][00497] Updated weights for policy 0, policy_version 36614 (0.0019) [2024-03-29 16:09:27,751][00476] Signal inference workers to stop experience collection... (17200 times) [2024-03-29 16:09:27,832][00497] InferenceWorker_p0-w0: stopping experience collection (17200 times) [2024-03-29 16:09:27,834][00476] Signal inference workers to resume experience collection... (17200 times) [2024-03-29 16:09:27,839][00497] Updated weights for policy 0, policy_version 36624 (0.0032) [2024-03-29 16:09:27,860][00497] InferenceWorker_p0-w0: resuming experience collection (17200 times) [2024-03-29 16:09:28,839][00126] Fps is (10 sec: 45875.6, 60 sec: 41780.2, 300 sec: 41931.9). Total num frames: 600080384. Throughput: 0: 41602.7. Samples: 482282520. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 16:09:28,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 16:09:32,037][00497] Updated weights for policy 0, policy_version 36634 (0.0027) [2024-03-29 16:09:33,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 600260608. Throughput: 0: 41670.0. Samples: 482412100. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 16:09:33,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 16:09:36,018][00497] Updated weights for policy 0, policy_version 36644 (0.0031) [2024-03-29 16:09:38,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 600473600. Throughput: 0: 41948.4. Samples: 482680800. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 16:09:38,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 16:09:39,946][00497] Updated weights for policy 0, policy_version 36654 (0.0024) [2024-03-29 16:09:43,337][00497] Updated weights for policy 0, policy_version 36664 (0.0026) [2024-03-29 16:09:43,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 600719360. Throughput: 0: 41750.2. Samples: 482912120. Policy #0 lag: (min: 2.0, avg: 21.2, max: 42.0) [2024-03-29 16:09:43,840][00126] Avg episode reward: [(0, '0.501')] [2024-03-29 16:09:47,585][00497] Updated weights for policy 0, policy_version 36674 (0.0020) [2024-03-29 16:09:48,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 600915968. Throughput: 0: 41915.1. Samples: 483047160. Policy #0 lag: (min: 2.0, avg: 21.2, max: 42.0) [2024-03-29 16:09:48,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 16:09:51,706][00497] Updated weights for policy 0, policy_version 36684 (0.0021) [2024-03-29 16:09:53,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41779.2, 300 sec: 41932.0). 
Total num frames: 601096192. Throughput: 0: 41832.1. Samples: 483300980. Policy #0 lag: (min: 2.0, avg: 21.2, max: 42.0) [2024-03-29 16:09:53,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 16:09:55,737][00497] Updated weights for policy 0, policy_version 36694 (0.0024) [2024-03-29 16:09:58,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 601341952. Throughput: 0: 42275.7. Samples: 483554280. Policy #0 lag: (min: 2.0, avg: 21.2, max: 42.0) [2024-03-29 16:09:58,840][00126] Avg episode reward: [(0, '0.457')] [2024-03-29 16:09:59,054][00497] Updated weights for policy 0, policy_version 36704 (0.0021) [2024-03-29 16:10:03,237][00497] Updated weights for policy 0, policy_version 36714 (0.0018) [2024-03-29 16:10:03,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 601538560. Throughput: 0: 42228.9. Samples: 483680880. Policy #0 lag: (min: 2.0, avg: 21.2, max: 42.0) [2024-03-29 16:10:03,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 16:10:07,399][00497] Updated weights for policy 0, policy_version 36724 (0.0021) [2024-03-29 16:10:08,657][00476] Signal inference workers to stop experience collection... (17250 times) [2024-03-29 16:10:08,693][00497] InferenceWorker_p0-w0: stopping experience collection (17250 times) [2024-03-29 16:10:08,821][00476] Signal inference workers to resume experience collection... (17250 times) [2024-03-29 16:10:08,821][00497] InferenceWorker_p0-w0: resuming experience collection (17250 times) [2024-03-29 16:10:08,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.4, 300 sec: 41987.5). Total num frames: 601735168. Throughput: 0: 41884.5. Samples: 483929740. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 16:10:08,840][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 16:10:11,614][00497] Updated weights for policy 0, policy_version 36734 (0.0019) [2024-03-29 16:10:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 601948160. Throughput: 0: 42212.9. Samples: 484182100. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 16:10:13,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 16:10:15,029][00497] Updated weights for policy 0, policy_version 36744 (0.0017) [2024-03-29 16:10:18,839][00126] Fps is (10 sec: 42597.5, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 602161152. Throughput: 0: 41680.8. Samples: 484287740. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 16:10:18,842][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 16:10:19,028][00497] Updated weights for policy 0, policy_version 36754 (0.0024) [2024-03-29 16:10:23,261][00497] Updated weights for policy 0, policy_version 36764 (0.0026) [2024-03-29 16:10:23,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 602357760. Throughput: 0: 41888.9. Samples: 484565800. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 16:10:23,840][00126] Avg episode reward: [(0, '0.533')] [2024-03-29 16:10:27,239][00497] Updated weights for policy 0, policy_version 36774 (0.0029) [2024-03-29 16:10:28,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 602570752. Throughput: 0: 42146.7. Samples: 484808720. 
Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 16:10:28,840][00126] Avg episode reward: [(0, '0.515')] [2024-03-29 16:10:30,823][00497] Updated weights for policy 0, policy_version 36784 (0.0022) [2024-03-29 16:10:33,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 602800128. Throughput: 0: 41524.9. Samples: 484915780. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 16:10:33,840][00126] Avg episode reward: [(0, '0.456')] [2024-03-29 16:10:34,805][00497] Updated weights for policy 0, policy_version 36794 (0.0023) [2024-03-29 16:10:38,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 602963968. Throughput: 0: 41683.5. Samples: 485176740. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 16:10:38,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 16:10:39,154][00497] Updated weights for policy 0, policy_version 36804 (0.0020) [2024-03-29 16:10:43,345][00497] Updated weights for policy 0, policy_version 36814 (0.0024) [2024-03-29 16:10:43,839][00126] Fps is (10 sec: 37683.2, 60 sec: 40960.0, 300 sec: 41709.8). Total num frames: 603176960. Throughput: 0: 41416.8. Samples: 485418040. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 16:10:43,840][00126] Avg episode reward: [(0, '0.515')] [2024-03-29 16:10:44,924][00476] Signal inference workers to stop experience collection... (17300 times) [2024-03-29 16:10:44,957][00497] InferenceWorker_p0-w0: stopping experience collection (17300 times) [2024-03-29 16:10:45,134][00476] Signal inference workers to resume experience collection... (17300 times) [2024-03-29 16:10:45,134][00497] InferenceWorker_p0-w0: resuming experience collection (17300 times) [2024-03-29 16:10:46,972][00497] Updated weights for policy 0, policy_version 36824 (0.0024) [2024-03-29 16:10:48,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 603389952. Throughput: 0: 40872.0. Samples: 485520120. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 16:10:48,841][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 16:10:50,923][00497] Updated weights for policy 0, policy_version 36834 (0.0028) [2024-03-29 16:10:53,839][00126] Fps is (10 sec: 37683.0, 60 sec: 40959.9, 300 sec: 41654.2). Total num frames: 603553792. Throughput: 0: 41085.6. Samples: 485778600. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 16:10:53,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 16:10:55,515][00497] Updated weights for policy 0, policy_version 36844 (0.0028) [2024-03-29 16:10:58,839][00126] Fps is (10 sec: 39321.7, 60 sec: 40686.9, 300 sec: 41598.7). Total num frames: 603783168. Throughput: 0: 40953.3. Samples: 486025000. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 16:10:58,840][00126] Avg episode reward: [(0, '0.443')] [2024-03-29 16:10:59,426][00497] Updated weights for policy 0, policy_version 36854 (0.0019) [2024-03-29 16:11:02,896][00497] Updated weights for policy 0, policy_version 36864 (0.0029) [2024-03-29 16:11:03,839][00126] Fps is (10 sec: 47513.5, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 604028928. Throughput: 0: 41418.2. Samples: 486151560. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 16:11:03,840][00126] Avg episode reward: [(0, '0.568')] [2024-03-29 16:11:04,049][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000036868_604045312.pth... 
[2024-03-29 16:11:04,374][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000036257_594034688.pth [2024-03-29 16:11:06,637][00497] Updated weights for policy 0, policy_version 36874 (0.0020) [2024-03-29 16:11:08,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 604209152. Throughput: 0: 40475.6. Samples: 486387200. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 16:11:08,841][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 16:11:11,622][00497] Updated weights for policy 0, policy_version 36884 (0.0018) [2024-03-29 16:11:13,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40960.0, 300 sec: 41598.7). Total num frames: 604405760. Throughput: 0: 40700.0. Samples: 486640220. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 16:11:13,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 16:11:15,589][00497] Updated weights for policy 0, policy_version 36894 (0.0027) [2024-03-29 16:11:18,839][00126] Fps is (10 sec: 40960.6, 60 sec: 40960.1, 300 sec: 41598.7). Total num frames: 604618752. Throughput: 0: 40945.4. Samples: 486758320. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 16:11:18,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 16:11:19,110][00497] Updated weights for policy 0, policy_version 36904 (0.0024) [2024-03-29 16:11:22,981][00497] Updated weights for policy 0, policy_version 36914 (0.0021) [2024-03-29 16:11:23,839][00126] Fps is (10 sec: 39321.4, 60 sec: 40686.9, 300 sec: 41543.1). Total num frames: 604798976. Throughput: 0: 40528.0. Samples: 487000500. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 16:11:23,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 16:11:24,259][00476] Signal inference workers to stop experience collection... (17350 times) [2024-03-29 16:11:24,292][00497] InferenceWorker_p0-w0: stopping experience collection (17350 times) [2024-03-29 16:11:24,463][00476] Signal inference workers to resume experience collection... (17350 times) [2024-03-29 16:11:24,463][00497] InferenceWorker_p0-w0: resuming experience collection (17350 times) [2024-03-29 16:11:27,750][00497] Updated weights for policy 0, policy_version 36924 (0.0022) [2024-03-29 16:11:28,839][00126] Fps is (10 sec: 39320.9, 60 sec: 40686.9, 300 sec: 41487.6). Total num frames: 605011968. Throughput: 0: 40788.4. Samples: 487253520. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 16:11:28,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 16:11:31,591][00497] Updated weights for policy 0, policy_version 36934 (0.0030) [2024-03-29 16:11:33,839][00126] Fps is (10 sec: 44237.2, 60 sec: 40686.9, 300 sec: 41543.2). Total num frames: 605241344. Throughput: 0: 41419.1. Samples: 487383980. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 16:11:33,841][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 16:11:34,900][00497] Updated weights for policy 0, policy_version 36944 (0.0024) [2024-03-29 16:11:38,730][00497] Updated weights for policy 0, policy_version 36954 (0.0029) [2024-03-29 16:11:38,839][00126] Fps is (10 sec: 44237.0, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 605454336. Throughput: 0: 40876.5. Samples: 487618040. 
Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 16:11:38,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 16:11:43,513][00497] Updated weights for policy 0, policy_version 36964 (0.0019) [2024-03-29 16:11:43,839][00126] Fps is (10 sec: 39321.8, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 605634560. Throughput: 0: 41276.0. Samples: 487882420. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 16:11:43,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 16:11:47,383][00497] Updated weights for policy 0, policy_version 36974 (0.0033) [2024-03-29 16:11:48,839][00126] Fps is (10 sec: 37683.6, 60 sec: 40687.0, 300 sec: 41432.1). Total num frames: 605831168. Throughput: 0: 41045.9. Samples: 487998620. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 16:11:48,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 16:11:50,940][00497] Updated weights for policy 0, policy_version 36984 (0.0033) [2024-03-29 16:11:53,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 41487.6). Total num frames: 606060544. Throughput: 0: 40690.8. Samples: 488218280. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 16:11:53,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 16:11:54,851][00497] Updated weights for policy 0, policy_version 36994 (0.0025) [2024-03-29 16:11:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40960.0, 300 sec: 41432.1). Total num frames: 606240768. Throughput: 0: 41179.2. Samples: 488493280. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 16:11:58,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 16:11:59,307][00476] Signal inference workers to stop experience collection... (17400 times) [2024-03-29 16:11:59,350][00497] InferenceWorker_p0-w0: stopping experience collection (17400 times) [2024-03-29 16:11:59,529][00476] Signal inference workers to resume experience collection... (17400 times) [2024-03-29 16:11:59,530][00497] InferenceWorker_p0-w0: resuming experience collection (17400 times) [2024-03-29 16:11:59,533][00497] Updated weights for policy 0, policy_version 37004 (0.0029) [2024-03-29 16:12:03,409][00497] Updated weights for policy 0, policy_version 37014 (0.0031) [2024-03-29 16:12:03,839][00126] Fps is (10 sec: 39320.9, 60 sec: 40413.9, 300 sec: 41432.1). Total num frames: 606453760. Throughput: 0: 41290.1. Samples: 488616380. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 16:12:03,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 16:12:06,906][00497] Updated weights for policy 0, policy_version 37024 (0.0021) [2024-03-29 16:12:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 40960.1, 300 sec: 41487.6). Total num frames: 606666752. Throughput: 0: 40997.9. Samples: 488845400. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 16:12:08,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 16:12:10,697][00497] Updated weights for policy 0, policy_version 37034 (0.0027) [2024-03-29 16:12:13,839][00126] Fps is (10 sec: 39321.8, 60 sec: 40686.9, 300 sec: 41432.1). Total num frames: 606846976. Throughput: 0: 41063.2. Samples: 489101360. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 16:12:13,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 16:12:15,553][00497] Updated weights for policy 0, policy_version 37044 (0.0024) [2024-03-29 16:12:18,839][00126] Fps is (10 sec: 39321.2, 60 sec: 40686.8, 300 sec: 41376.5). Total num frames: 607059968. Throughput: 0: 41049.3. Samples: 489231200. 
Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 16:12:18,840][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 16:12:19,320][00497] Updated weights for policy 0, policy_version 37054 (0.0025) [2024-03-29 16:12:22,809][00497] Updated weights for policy 0, policy_version 37064 (0.0024) [2024-03-29 16:12:23,839][00126] Fps is (10 sec: 45875.4, 60 sec: 41779.3, 300 sec: 41487.6). Total num frames: 607305728. Throughput: 0: 41148.9. Samples: 489469740. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 16:12:23,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 16:12:26,763][00497] Updated weights for policy 0, policy_version 37074 (0.0024) [2024-03-29 16:12:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 607485952. Throughput: 0: 40980.8. Samples: 489726560. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 16:12:28,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 16:12:31,363][00497] Updated weights for policy 0, policy_version 37084 (0.0022) [2024-03-29 16:12:31,397][00476] Signal inference workers to stop experience collection... (17450 times) [2024-03-29 16:12:31,415][00497] InferenceWorker_p0-w0: stopping experience collection (17450 times) [2024-03-29 16:12:31,611][00476] Signal inference workers to resume experience collection... (17450 times) [2024-03-29 16:12:31,611][00497] InferenceWorker_p0-w0: resuming experience collection (17450 times) [2024-03-29 16:12:33,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40687.0, 300 sec: 41321.0). Total num frames: 607682560. Throughput: 0: 41493.8. Samples: 489865840. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 16:12:33,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 16:12:35,087][00497] Updated weights for policy 0, policy_version 37094 (0.0022) [2024-03-29 16:12:38,712][00497] Updated weights for policy 0, policy_version 37104 (0.0034) [2024-03-29 16:12:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 40960.0, 300 sec: 41432.1). Total num frames: 607911936. Throughput: 0: 41837.2. Samples: 490100960. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 16:12:38,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 16:12:42,682][00497] Updated weights for policy 0, policy_version 37114 (0.0022) [2024-03-29 16:12:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40960.0, 300 sec: 41321.0). Total num frames: 608092160. Throughput: 0: 40976.0. Samples: 490337200. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 16:12:43,840][00126] Avg episode reward: [(0, '0.513')] [2024-03-29 16:12:47,149][00497] Updated weights for policy 0, policy_version 37124 (0.0021) [2024-03-29 16:12:48,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.1, 300 sec: 41265.5). Total num frames: 608321536. Throughput: 0: 41421.5. Samples: 490480340. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 16:12:48,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 16:12:50,781][00497] Updated weights for policy 0, policy_version 37134 (0.0026) [2024-03-29 16:12:53,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41233.0, 300 sec: 41432.1). Total num frames: 608534528. Throughput: 0: 41640.4. Samples: 490719220. 
Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 16:12:53,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 16:12:54,177][00497] Updated weights for policy 0, policy_version 37144 (0.0024) [2024-03-29 16:12:57,994][00497] Updated weights for policy 0, policy_version 37154 (0.0027) [2024-03-29 16:12:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41432.1). Total num frames: 608731136. Throughput: 0: 41530.3. Samples: 490970220. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 16:12:58,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 16:13:02,682][00497] Updated weights for policy 0, policy_version 37164 (0.0025) [2024-03-29 16:13:03,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.2, 300 sec: 41321.0). Total num frames: 608944128. Throughput: 0: 41854.3. Samples: 491114640. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 16:13:03,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 16:13:04,157][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000037169_608976896.pth... [2024-03-29 16:13:04,471][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000036562_599031808.pth [2024-03-29 16:13:06,227][00476] Signal inference workers to stop experience collection... (17500 times) [2024-03-29 16:13:06,228][00476] Signal inference workers to resume experience collection... (17500 times) [2024-03-29 16:13:06,264][00497] InferenceWorker_p0-w0: stopping experience collection (17500 times) [2024-03-29 16:13:06,265][00497] InferenceWorker_p0-w0: resuming experience collection (17500 times) [2024-03-29 16:13:06,527][00497] Updated weights for policy 0, policy_version 37174 (0.0025) [2024-03-29 16:13:08,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.1, 300 sec: 41376.6). Total num frames: 609140736. Throughput: 0: 41800.0. Samples: 491350740. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 16:13:08,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 16:13:09,989][00497] Updated weights for policy 0, policy_version 37184 (0.0034) [2024-03-29 16:13:13,840][00126] Fps is (10 sec: 42596.2, 60 sec: 42052.0, 300 sec: 41487.6). Total num frames: 609370112. Throughput: 0: 41284.1. Samples: 491584360. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 16:13:13,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 16:13:14,020][00497] Updated weights for policy 0, policy_version 37194 (0.0021) [2024-03-29 16:13:18,693][00497] Updated weights for policy 0, policy_version 37204 (0.0025) [2024-03-29 16:13:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 41321.0). Total num frames: 609550336. Throughput: 0: 41231.5. Samples: 491721260. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 16:13:18,840][00126] Avg episode reward: [(0, '0.492')] [2024-03-29 16:13:22,456][00497] Updated weights for policy 0, policy_version 37214 (0.0022) [2024-03-29 16:13:23,839][00126] Fps is (10 sec: 40961.8, 60 sec: 41233.0, 300 sec: 41376.7). Total num frames: 609779712. Throughput: 0: 41569.8. Samples: 491971600. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 16:13:23,840][00126] Avg episode reward: [(0, '0.501')] [2024-03-29 16:13:25,960][00497] Updated weights for policy 0, policy_version 37224 (0.0022) [2024-03-29 16:13:28,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41779.2, 300 sec: 41432.1). Total num frames: 609992704. Throughput: 0: 41499.0. Samples: 492204660. 
Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 16:13:28,842][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 16:13:30,027][00497] Updated weights for policy 0, policy_version 37234 (0.0018) [2024-03-29 16:13:33,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 41376.5). Total num frames: 610172928. Throughput: 0: 41422.2. Samples: 492344340. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 16:13:33,840][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 16:13:34,524][00497] Updated weights for policy 0, policy_version 37244 (0.0029) [2024-03-29 16:13:36,133][00476] Signal inference workers to stop experience collection... (17550 times) [2024-03-29 16:13:36,176][00497] InferenceWorker_p0-w0: stopping experience collection (17550 times) [2024-03-29 16:13:36,365][00476] Signal inference workers to resume experience collection... (17550 times) [2024-03-29 16:13:36,366][00497] InferenceWorker_p0-w0: resuming experience collection (17550 times) [2024-03-29 16:13:38,148][00497] Updated weights for policy 0, policy_version 37254 (0.0024) [2024-03-29 16:13:38,839][00126] Fps is (10 sec: 39322.3, 60 sec: 41233.2, 300 sec: 41321.0). Total num frames: 610385920. Throughput: 0: 41502.7. Samples: 492586840. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 16:13:38,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 16:13:41,881][00497] Updated weights for policy 0, policy_version 37264 (0.0026) [2024-03-29 16:13:43,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.1, 300 sec: 41376.5). Total num frames: 610598912. Throughput: 0: 40958.1. Samples: 492813340. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 16:13:43,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 16:13:46,058][00497] Updated weights for policy 0, policy_version 37274 (0.0020) [2024-03-29 16:13:48,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41233.0, 300 sec: 41376.5). Total num frames: 610795520. Throughput: 0: 40930.6. Samples: 492956520. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 16:13:48,841][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 16:13:50,467][00497] Updated weights for policy 0, policy_version 37284 (0.0025) [2024-03-29 16:13:53,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41233.0, 300 sec: 41321.0). Total num frames: 611008512. Throughput: 0: 41139.0. Samples: 493202000. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 16:13:53,840][00126] Avg episode reward: [(0, '0.505')] [2024-03-29 16:13:54,083][00497] Updated weights for policy 0, policy_version 37294 (0.0030) [2024-03-29 16:13:57,433][00497] Updated weights for policy 0, policy_version 37304 (0.0031) [2024-03-29 16:13:58,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.1, 300 sec: 41487.6). Total num frames: 611237888. Throughput: 0: 41331.9. Samples: 493444280. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 16:13:58,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 16:14:01,590][00497] Updated weights for policy 0, policy_version 37314 (0.0018) [2024-03-29 16:14:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41432.1). Total num frames: 611434496. Throughput: 0: 41335.0. Samples: 493581340. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 16:14:03,842][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 16:14:05,889][00497] Updated weights for policy 0, policy_version 37324 (0.0021) [2024-03-29 16:14:08,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41779.2, 300 sec: 41376.6). 
Total num frames: 611647488. Throughput: 0: 41457.4. Samples: 493837180. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 16:14:08,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 16:14:09,680][00497] Updated weights for policy 0, policy_version 37334 (0.0036) [2024-03-29 16:14:09,776][00476] Signal inference workers to stop experience collection... (17600 times) [2024-03-29 16:14:09,857][00497] InferenceWorker_p0-w0: stopping experience collection (17600 times) [2024-03-29 16:14:09,950][00476] Signal inference workers to resume experience collection... (17600 times) [2024-03-29 16:14:09,950][00497] InferenceWorker_p0-w0: resuming experience collection (17600 times) [2024-03-29 16:14:13,028][00497] Updated weights for policy 0, policy_version 37344 (0.0026) [2024-03-29 16:14:13,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.5, 300 sec: 41543.2). Total num frames: 611876864. Throughput: 0: 41629.9. Samples: 494078000. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 16:14:13,840][00126] Avg episode reward: [(0, '0.496')] [2024-03-29 16:14:17,181][00497] Updated weights for policy 0, policy_version 37354 (0.0026) [2024-03-29 16:14:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41432.1). Total num frames: 612057088. Throughput: 0: 41381.4. Samples: 494206500. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 16:14:18,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 16:14:21,563][00497] Updated weights for policy 0, policy_version 37364 (0.0022) [2024-03-29 16:14:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 41321.0). Total num frames: 612270080. Throughput: 0: 42012.4. Samples: 494477400. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 16:14:23,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 16:14:25,480][00497] Updated weights for policy 0, policy_version 37374 (0.0024) [2024-03-29 16:14:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41432.1). Total num frames: 612483072. Throughput: 0: 41981.5. Samples: 494702500. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 16:14:28,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 16:14:28,917][00497] Updated weights for policy 0, policy_version 37384 (0.0025) [2024-03-29 16:14:33,241][00497] Updated weights for policy 0, policy_version 37394 (0.0023) [2024-03-29 16:14:33,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 41321.0). Total num frames: 612663296. Throughput: 0: 41467.6. Samples: 494822560. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 16:14:33,840][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 16:14:37,554][00497] Updated weights for policy 0, policy_version 37404 (0.0024) [2024-03-29 16:14:38,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.1, 300 sec: 41265.5). Total num frames: 612892672. Throughput: 0: 42128.0. Samples: 495097760. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 16:14:38,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 16:14:41,430][00497] Updated weights for policy 0, policy_version 37414 (0.0020) [2024-03-29 16:14:43,357][00476] Signal inference workers to stop experience collection... (17650 times) [2024-03-29 16:14:43,378][00497] InferenceWorker_p0-w0: stopping experience collection (17650 times) [2024-03-29 16:14:43,579][00476] Signal inference workers to resume experience collection... 
(17650 times) [2024-03-29 16:14:43,579][00497] InferenceWorker_p0-w0: resuming experience collection (17650 times) [2024-03-29 16:14:43,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42052.4, 300 sec: 41376.6). Total num frames: 613122048. Throughput: 0: 41988.2. Samples: 495333740. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 16:14:43,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 16:14:44,760][00497] Updated weights for policy 0, policy_version 37424 (0.0024) [2024-03-29 16:14:48,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.3, 300 sec: 41376.5). Total num frames: 613302272. Throughput: 0: 41404.5. Samples: 495444540. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 16:14:48,840][00126] Avg episode reward: [(0, '0.501')] [2024-03-29 16:14:48,996][00497] Updated weights for policy 0, policy_version 37434 (0.0027) [2024-03-29 16:14:53,237][00497] Updated weights for policy 0, policy_version 37444 (0.0042) [2024-03-29 16:14:53,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41506.2, 300 sec: 41209.9). Total num frames: 613498880. Throughput: 0: 41661.3. Samples: 495711940. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 16:14:53,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 16:14:57,040][00497] Updated weights for policy 0, policy_version 37454 (0.0028) [2024-03-29 16:14:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.2, 300 sec: 41321.0). Total num frames: 613728256. Throughput: 0: 41790.7. Samples: 495958580. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 16:14:58,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 16:15:00,213][00497] Updated weights for policy 0, policy_version 37464 (0.0025) [2024-03-29 16:15:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41376.5). Total num frames: 613941248. Throughput: 0: 41441.3. Samples: 496071360. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 16:15:03,840][00126] Avg episode reward: [(0, '0.503')] [2024-03-29 16:15:04,079][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000037473_613957632.pth... [2024-03-29 16:15:04,393][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000036868_604045312.pth [2024-03-29 16:15:04,711][00497] Updated weights for policy 0, policy_version 37474 (0.0024) [2024-03-29 16:15:08,632][00497] Updated weights for policy 0, policy_version 37484 (0.0025) [2024-03-29 16:15:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41321.0). Total num frames: 614137856. Throughput: 0: 41556.0. Samples: 496347420. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 16:15:08,840][00126] Avg episode reward: [(0, '0.484')] [2024-03-29 16:15:12,757][00497] Updated weights for policy 0, policy_version 37494 (0.0023) [2024-03-29 16:15:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.1, 300 sec: 41321.0). Total num frames: 614350848. Throughput: 0: 42022.2. Samples: 496593500. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 16:15:13,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 16:15:15,473][00476] Signal inference workers to stop experience collection... (17700 times) [2024-03-29 16:15:15,546][00497] InferenceWorker_p0-w0: stopping experience collection (17700 times) [2024-03-29 16:15:15,561][00476] Signal inference workers to resume experience collection... 
(17700 times) [2024-03-29 16:15:15,579][00497] InferenceWorker_p0-w0: resuming experience collection (17700 times) [2024-03-29 16:15:15,830][00497] Updated weights for policy 0, policy_version 37504 (0.0033) [2024-03-29 16:15:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41432.1). Total num frames: 614580224. Throughput: 0: 41741.8. Samples: 496700940. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 16:15:18,840][00126] Avg episode reward: [(0, '0.464')] [2024-03-29 16:15:20,336][00497] Updated weights for policy 0, policy_version 37514 (0.0020) [2024-03-29 16:15:23,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41265.5). Total num frames: 614744064. Throughput: 0: 41376.1. Samples: 496959680. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 16:15:23,840][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 16:15:24,587][00497] Updated weights for policy 0, policy_version 37524 (0.0026) [2024-03-29 16:15:28,530][00497] Updated weights for policy 0, policy_version 37534 (0.0027) [2024-03-29 16:15:28,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41233.1, 300 sec: 41209.9). Total num frames: 614957056. Throughput: 0: 41844.0. Samples: 497216720. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 16:15:28,840][00126] Avg episode reward: [(0, '0.481')] [2024-03-29 16:15:31,595][00497] Updated weights for policy 0, policy_version 37544 (0.0025) [2024-03-29 16:15:33,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.4, 300 sec: 41487.7). Total num frames: 615202816. Throughput: 0: 41991.6. Samples: 497334160. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 16:15:33,840][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 16:15:36,381][00497] Updated weights for policy 0, policy_version 37554 (0.0023) [2024-03-29 16:15:38,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41506.2, 300 sec: 41376.5). Total num frames: 615383040. Throughput: 0: 41800.4. Samples: 497592960. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 16:15:38,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 16:15:40,360][00497] Updated weights for policy 0, policy_version 37564 (0.0022) [2024-03-29 16:15:43,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41233.0, 300 sec: 41376.6). Total num frames: 615596032. Throughput: 0: 41636.4. Samples: 497832220. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:15:43,840][00126] Avg episode reward: [(0, '0.506')] [2024-03-29 16:15:44,249][00497] Updated weights for policy 0, policy_version 37574 (0.0026) [2024-03-29 16:15:47,488][00497] Updated weights for policy 0, policy_version 37584 (0.0033) [2024-03-29 16:15:48,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 41598.7). Total num frames: 615825408. Throughput: 0: 41813.3. Samples: 497952960. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:15:48,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 16:15:50,163][00476] Signal inference workers to stop experience collection... (17750 times) [2024-03-29 16:15:50,191][00497] InferenceWorker_p0-w0: stopping experience collection (17750 times) [2024-03-29 16:15:50,349][00476] Signal inference workers to resume experience collection... (17750 times) [2024-03-29 16:15:50,350][00497] InferenceWorker_p0-w0: resuming experience collection (17750 times) [2024-03-29 16:15:52,071][00497] Updated weights for policy 0, policy_version 37594 (0.0021) [2024-03-29 16:15:53,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41432.1). 
Total num frames: 616005632. Throughput: 0: 41446.2. Samples: 498212500. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:15:53,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 16:15:56,003][00497] Updated weights for policy 0, policy_version 37604 (0.0031) [2024-03-29 16:15:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.1, 300 sec: 41376.5). Total num frames: 616235008. Throughput: 0: 41708.4. Samples: 498470380. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:15:58,841][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 16:15:59,885][00497] Updated weights for policy 0, policy_version 37614 (0.0019) [2024-03-29 16:16:02,964][00497] Updated weights for policy 0, policy_version 37624 (0.0026) [2024-03-29 16:16:03,839][00126] Fps is (10 sec: 47513.7, 60 sec: 42325.4, 300 sec: 41598.7). Total num frames: 616480768. Throughput: 0: 41908.9. Samples: 498586840. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:16:03,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 16:16:07,381][00497] Updated weights for policy 0, policy_version 37634 (0.0019) [2024-03-29 16:16:08,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 616644608. Throughput: 0: 41902.2. Samples: 498845280. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 16:16:08,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 16:16:11,560][00497] Updated weights for policy 0, policy_version 37644 (0.0019) [2024-03-29 16:16:13,839][00126] Fps is (10 sec: 37682.7, 60 sec: 41779.1, 300 sec: 41487.6). Total num frames: 616857600. Throughput: 0: 41776.7. Samples: 499096680. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 16:16:13,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 16:16:15,452][00497] Updated weights for policy 0, policy_version 37654 (0.0027) [2024-03-29 16:16:18,619][00497] Updated weights for policy 0, policy_version 37664 (0.0031) [2024-03-29 16:16:18,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 617086976. Throughput: 0: 42113.2. Samples: 499229260. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 16:16:18,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 16:16:23,120][00497] Updated weights for policy 0, policy_version 37674 (0.0028) [2024-03-29 16:16:23,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.3, 300 sec: 41543.2). Total num frames: 617267200. Throughput: 0: 41543.6. Samples: 499462420. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 16:16:23,840][00126] Avg episode reward: [(0, '0.565')] [2024-03-29 16:16:24,026][00476] Signal inference workers to stop experience collection... (17800 times) [2024-03-29 16:16:24,026][00476] Signal inference workers to resume experience collection... (17800 times) [2024-03-29 16:16:24,073][00497] InferenceWorker_p0-w0: stopping experience collection (17800 times) [2024-03-29 16:16:24,074][00497] InferenceWorker_p0-w0: resuming experience collection (17800 times) [2024-03-29 16:16:27,077][00497] Updated weights for policy 0, policy_version 37684 (0.0029) [2024-03-29 16:16:28,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42052.2, 300 sec: 41487.6). Total num frames: 617480192. Throughput: 0: 42152.9. Samples: 499729100. 
Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 16:16:28,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 16:16:31,042][00497] Updated weights for policy 0, policy_version 37694 (0.0020) [2024-03-29 16:16:33,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41779.1, 300 sec: 41543.2). Total num frames: 617709568. Throughput: 0: 42443.1. Samples: 499862900. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 16:16:33,841][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 16:16:34,209][00497] Updated weights for policy 0, policy_version 37704 (0.0022) [2024-03-29 16:16:38,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41543.1). Total num frames: 617889792. Throughput: 0: 41548.4. Samples: 500082180. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 16:16:38,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 16:16:38,881][00497] Updated weights for policy 0, policy_version 37714 (0.0022) [2024-03-29 16:16:42,926][00497] Updated weights for policy 0, policy_version 37724 (0.0023) [2024-03-29 16:16:43,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 618102784. Throughput: 0: 41857.0. Samples: 500353940. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 16:16:43,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 16:16:46,748][00497] Updated weights for policy 0, policy_version 37734 (0.0028) [2024-03-29 16:16:48,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 618332160. Throughput: 0: 42012.4. Samples: 500477400. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 16:16:48,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 16:16:50,023][00497] Updated weights for policy 0, policy_version 37744 (0.0028) [2024-03-29 16:16:53,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 618528768. Throughput: 0: 41759.1. Samples: 500724440. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 16:16:53,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 16:16:54,005][00476] Signal inference workers to stop experience collection... (17850 times) [2024-03-29 16:16:54,009][00476] Signal inference workers to resume experience collection... (17850 times) [2024-03-29 16:16:54,054][00497] InferenceWorker_p0-w0: stopping experience collection (17850 times) [2024-03-29 16:16:54,054][00497] InferenceWorker_p0-w0: resuming experience collection (17850 times) [2024-03-29 16:16:54,315][00497] Updated weights for policy 0, policy_version 37754 (0.0031) [2024-03-29 16:16:58,259][00497] Updated weights for policy 0, policy_version 37764 (0.0026) [2024-03-29 16:16:58,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 618741760. Throughput: 0: 42109.8. Samples: 500991620. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 16:16:58,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 16:17:02,162][00497] Updated weights for policy 0, policy_version 37774 (0.0023) [2024-03-29 16:17:03,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 618971136. Throughput: 0: 42003.2. Samples: 501119400. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 16:17:03,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 16:17:03,935][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000037780_618987520.pth... 
[2024-03-29 16:17:04,265][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000037169_608976896.pth [2024-03-29 16:17:05,415][00497] Updated weights for policy 0, policy_version 37784 (0.0031) [2024-03-29 16:17:08,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 619184128. Throughput: 0: 41987.1. Samples: 501351840. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 16:17:08,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 16:17:09,851][00497] Updated weights for policy 0, policy_version 37794 (0.0019) [2024-03-29 16:17:13,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 619364352. Throughput: 0: 41874.2. Samples: 501613440. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 16:17:13,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 16:17:13,907][00497] Updated weights for policy 0, policy_version 37804 (0.0018) [2024-03-29 16:17:17,742][00497] Updated weights for policy 0, policy_version 37814 (0.0019) [2024-03-29 16:17:18,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 619577344. Throughput: 0: 41819.6. Samples: 501744780. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 16:17:18,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 16:17:21,149][00497] Updated weights for policy 0, policy_version 37824 (0.0026) [2024-03-29 16:17:23,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 619806720. Throughput: 0: 41975.2. Samples: 501971060. Policy #0 lag: (min: 1.0, avg: 21.3, max: 44.0) [2024-03-29 16:17:23,840][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 16:17:25,561][00497] Updated weights for policy 0, policy_version 37834 (0.0027) [2024-03-29 16:17:28,638][00476] Signal inference workers to stop experience collection... (17900 times) [2024-03-29 16:17:28,698][00497] InferenceWorker_p0-w0: stopping experience collection (17900 times) [2024-03-29 16:17:28,733][00476] Signal inference workers to resume experience collection... (17900 times) [2024-03-29 16:17:28,735][00497] InferenceWorker_p0-w0: resuming experience collection (17900 times) [2024-03-29 16:17:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 620003328. Throughput: 0: 41922.7. Samples: 502240460. Policy #0 lag: (min: 1.0, avg: 21.3, max: 44.0) [2024-03-29 16:17:28,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 16:17:29,377][00497] Updated weights for policy 0, policy_version 37844 (0.0019) [2024-03-29 16:17:33,383][00497] Updated weights for policy 0, policy_version 37854 (0.0020) [2024-03-29 16:17:33,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 620216320. Throughput: 0: 42159.1. Samples: 502374560. Policy #0 lag: (min: 1.0, avg: 21.3, max: 44.0) [2024-03-29 16:17:33,840][00126] Avg episode reward: [(0, '0.461')] [2024-03-29 16:17:36,404][00497] Updated weights for policy 0, policy_version 37864 (0.0030) [2024-03-29 16:17:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 620445696. Throughput: 0: 42028.5. Samples: 502615720. 
Policy #0 lag: (min: 1.0, avg: 21.3, max: 44.0) [2024-03-29 16:17:38,840][00126] Avg episode reward: [(0, '0.457')] [2024-03-29 16:17:41,192][00497] Updated weights for policy 0, policy_version 37874 (0.0023) [2024-03-29 16:17:43,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.2, 300 sec: 41765.3). Total num frames: 620642304. Throughput: 0: 42041.7. Samples: 502883500. Policy #0 lag: (min: 1.0, avg: 21.3, max: 44.0) [2024-03-29 16:17:43,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 16:17:44,737][00497] Updated weights for policy 0, policy_version 37884 (0.0024) [2024-03-29 16:17:48,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 620838912. Throughput: 0: 42156.1. Samples: 503016420. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 16:17:48,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 16:17:48,907][00497] Updated weights for policy 0, policy_version 37894 (0.0023) [2024-03-29 16:17:52,220][00497] Updated weights for policy 0, policy_version 37904 (0.0026) [2024-03-29 16:17:53,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 621084672. Throughput: 0: 42201.8. Samples: 503250920. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 16:17:53,840][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 16:17:56,673][00497] Updated weights for policy 0, policy_version 37914 (0.0024) [2024-03-29 16:17:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 621264896. Throughput: 0: 42257.8. Samples: 503515040. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 16:17:58,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 16:17:59,988][00476] Signal inference workers to stop experience collection... (17950 times) [2024-03-29 16:17:59,988][00476] Signal inference workers to resume experience collection... (17950 times) [2024-03-29 16:18:00,023][00497] InferenceWorker_p0-w0: stopping experience collection (17950 times) [2024-03-29 16:18:00,029][00497] InferenceWorker_p0-w0: resuming experience collection (17950 times) [2024-03-29 16:18:00,314][00497] Updated weights for policy 0, policy_version 37924 (0.0022) [2024-03-29 16:18:03,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 621477888. Throughput: 0: 42136.0. Samples: 503640900. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 16:18:03,841][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 16:18:04,410][00497] Updated weights for policy 0, policy_version 37934 (0.0030) [2024-03-29 16:18:07,601][00497] Updated weights for policy 0, policy_version 37944 (0.0028) [2024-03-29 16:18:08,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42325.3, 300 sec: 41876.5). Total num frames: 621723648. Throughput: 0: 42454.6. Samples: 503881520. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 16:18:08,840][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 16:18:12,096][00497] Updated weights for policy 0, policy_version 37954 (0.0018) [2024-03-29 16:18:13,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.2, 300 sec: 41876.4). Total num frames: 621903872. Throughput: 0: 42235.0. Samples: 504141040. Policy #0 lag: (min: 0.0, avg: 22.3, max: 40.0) [2024-03-29 16:18:13,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 16:18:15,907][00497] Updated weights for policy 0, policy_version 37964 (0.0027) [2024-03-29 16:18:18,839][00126] Fps is (10 sec: 37683.4, 60 sec: 42052.3, 300 sec: 41765.3). 
Total num frames: 622100480. Throughput: 0: 41934.2. Samples: 504261600. Policy #0 lag: (min: 0.0, avg: 22.3, max: 40.0) [2024-03-29 16:18:18,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 16:18:20,161][00497] Updated weights for policy 0, policy_version 37974 (0.0022) [2024-03-29 16:18:23,681][00497] Updated weights for policy 0, policy_version 37985 (0.0028) [2024-03-29 16:18:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.2, 300 sec: 41876.4). Total num frames: 622346240. Throughput: 0: 42119.0. Samples: 504511080. Policy #0 lag: (min: 0.0, avg: 22.3, max: 40.0) [2024-03-29 16:18:23,840][00126] Avg episode reward: [(0, '0.501')] [2024-03-29 16:18:28,194][00497] Updated weights for policy 0, policy_version 37995 (0.0018) [2024-03-29 16:18:28,843][00126] Fps is (10 sec: 42581.7, 60 sec: 42049.5, 300 sec: 41875.8). Total num frames: 622526464. Throughput: 0: 41736.5. Samples: 504761800. Policy #0 lag: (min: 0.0, avg: 22.3, max: 40.0) [2024-03-29 16:18:28,844][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 16:18:32,064][00497] Updated weights for policy 0, policy_version 38005 (0.0017) [2024-03-29 16:18:33,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 622739456. Throughput: 0: 41715.9. Samples: 504893640. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 16:18:33,842][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 16:18:36,115][00497] Updated weights for policy 0, policy_version 38015 (0.0018) [2024-03-29 16:18:38,839][00126] Fps is (10 sec: 44253.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 622968832. Throughput: 0: 42126.6. Samples: 505146620. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 16:18:38,840][00126] Avg episode reward: [(0, '0.502')] [2024-03-29 16:18:39,238][00497] Updated weights for policy 0, policy_version 38025 (0.0025) [2024-03-29 16:18:41,881][00476] Signal inference workers to stop experience collection... (18000 times) [2024-03-29 16:18:41,882][00476] Signal inference workers to resume experience collection... (18000 times) [2024-03-29 16:18:41,918][00497] InferenceWorker_p0-w0: stopping experience collection (18000 times) [2024-03-29 16:18:41,918][00497] InferenceWorker_p0-w0: resuming experience collection (18000 times) [2024-03-29 16:18:43,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 623149056. Throughput: 0: 41569.8. Samples: 505385680. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 16:18:43,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 16:18:43,911][00497] Updated weights for policy 0, policy_version 38035 (0.0018) [2024-03-29 16:18:47,977][00497] Updated weights for policy 0, policy_version 38045 (0.0020) [2024-03-29 16:18:48,848][00126] Fps is (10 sec: 39286.0, 60 sec: 42045.8, 300 sec: 41875.1). Total num frames: 623362048. Throughput: 0: 41592.4. Samples: 505512940. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 16:18:48,849][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 16:18:51,943][00497] Updated weights for policy 0, policy_version 38055 (0.0022) [2024-03-29 16:18:53,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 623591424. Throughput: 0: 41972.9. Samples: 505770300. 
Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 16:18:53,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 16:18:55,097][00497] Updated weights for policy 0, policy_version 38065 (0.0026) [2024-03-29 16:18:58,839][00126] Fps is (10 sec: 42637.5, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 623788032. Throughput: 0: 41580.1. Samples: 506012140. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 16:18:58,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 16:18:59,612][00497] Updated weights for policy 0, policy_version 38075 (0.0021) [2024-03-29 16:19:03,525][00497] Updated weights for policy 0, policy_version 38085 (0.0030) [2024-03-29 16:19:03,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 624001024. Throughput: 0: 41904.9. Samples: 506147320. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 16:19:03,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 16:19:03,855][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000038086_624001024.pth... [2024-03-29 16:19:04,205][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000037473_613957632.pth [2024-03-29 16:19:07,488][00497] Updated weights for policy 0, policy_version 38095 (0.0032) [2024-03-29 16:19:08,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 624197632. Throughput: 0: 41875.2. Samples: 506395460. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 16:19:08,840][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 16:19:10,938][00497] Updated weights for policy 0, policy_version 38105 (0.0025) [2024-03-29 16:19:13,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 624410624. Throughput: 0: 41402.2. Samples: 506624740. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 16:19:13,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 16:19:15,256][00497] Updated weights for policy 0, policy_version 38115 (0.0018) [2024-03-29 16:19:18,849][00126] Fps is (10 sec: 40920.0, 60 sec: 41772.4, 300 sec: 41819.5). Total num frames: 624607232. Throughput: 0: 41399.5. Samples: 506757020. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 16:19:18,856][00126] Avg episode reward: [(0, '0.454')] [2024-03-29 16:19:19,525][00497] Updated weights for policy 0, policy_version 38125 (0.0025) [2024-03-29 16:19:21,516][00476] Signal inference workers to stop experience collection... (18050 times) [2024-03-29 16:19:21,581][00497] InferenceWorker_p0-w0: stopping experience collection (18050 times) [2024-03-29 16:19:21,595][00476] Signal inference workers to resume experience collection... (18050 times) [2024-03-29 16:19:21,669][00497] InferenceWorker_p0-w0: resuming experience collection (18050 times) [2024-03-29 16:19:23,396][00497] Updated weights for policy 0, policy_version 38135 (0.0017) [2024-03-29 16:19:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41233.1, 300 sec: 41820.8). Total num frames: 624820224. Throughput: 0: 41439.2. Samples: 507011380. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 16:19:23,840][00126] Avg episode reward: [(0, '0.507')] [2024-03-29 16:19:26,591][00497] Updated weights for policy 0, policy_version 38145 (0.0032) [2024-03-29 16:19:28,839][00126] Fps is (10 sec: 42640.5, 60 sec: 41782.0, 300 sec: 41931.9). Total num frames: 625033216. Throughput: 0: 41507.2. Samples: 507253500. 
Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 16:19:28,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 16:19:31,006][00497] Updated weights for policy 0, policy_version 38155 (0.0024) [2024-03-29 16:19:33,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 625229824. Throughput: 0: 41591.7. Samples: 507384180. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 16:19:33,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 16:19:35,274][00497] Updated weights for policy 0, policy_version 38165 (0.0032) [2024-03-29 16:19:38,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 625442816. Throughput: 0: 41428.9. Samples: 507634600. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 16:19:38,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 16:19:38,909][00497] Updated weights for policy 0, policy_version 38175 (0.0022) [2024-03-29 16:19:42,501][00497] Updated weights for policy 0, policy_version 38185 (0.0034) [2024-03-29 16:19:43,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 625672192. Throughput: 0: 41180.4. Samples: 507865260. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 16:19:43,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 16:19:46,993][00497] Updated weights for policy 0, policy_version 38195 (0.0025) [2024-03-29 16:19:48,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41239.3, 300 sec: 41820.8). Total num frames: 625836032. Throughput: 0: 41049.3. Samples: 507994540. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 16:19:48,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 16:19:51,128][00497] Updated weights for policy 0, policy_version 38205 (0.0017) [2024-03-29 16:19:53,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 626065408. Throughput: 0: 41349.4. Samples: 508256180. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 16:19:53,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 16:19:54,780][00497] Updated weights for policy 0, policy_version 38215 (0.0026) [2024-03-29 16:19:57,531][00476] Signal inference workers to stop experience collection... (18100 times) [2024-03-29 16:19:57,593][00497] InferenceWorker_p0-w0: stopping experience collection (18100 times) [2024-03-29 16:19:57,694][00476] Signal inference workers to resume experience collection... (18100 times) [2024-03-29 16:19:57,695][00497] InferenceWorker_p0-w0: resuming experience collection (18100 times) [2024-03-29 16:19:58,301][00497] Updated weights for policy 0, policy_version 38225 (0.0024) [2024-03-29 16:19:58,839][00126] Fps is (10 sec: 45875.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 626294784. Throughput: 0: 41673.9. Samples: 508500060. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 16:19:58,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 16:20:02,541][00497] Updated weights for policy 0, policy_version 38235 (0.0027) [2024-03-29 16:20:03,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41233.0, 300 sec: 41820.8). Total num frames: 626475008. Throughput: 0: 41655.7. Samples: 508631120. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 16:20:03,840][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 16:20:06,935][00497] Updated weights for policy 0, policy_version 38245 (0.0026) [2024-03-29 16:20:08,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 41820.8). 
Total num frames: 626688000. Throughput: 0: 41638.7. Samples: 508885120. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 16:20:08,840][00126] Avg episode reward: [(0, '0.515')] [2024-03-29 16:20:10,564][00497] Updated weights for policy 0, policy_version 38255 (0.0022) [2024-03-29 16:20:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 626900992. Throughput: 0: 41333.3. Samples: 509113500. Policy #0 lag: (min: 0.0, avg: 21.4, max: 45.0) [2024-03-29 16:20:13,840][00126] Avg episode reward: [(0, '0.483')] [2024-03-29 16:20:14,304][00497] Updated weights for policy 0, policy_version 38265 (0.0035) [2024-03-29 16:20:18,434][00497] Updated weights for policy 0, policy_version 38275 (0.0021) [2024-03-29 16:20:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41786.0, 300 sec: 41931.9). Total num frames: 627113984. Throughput: 0: 41280.4. Samples: 509241800. Policy #0 lag: (min: 0.0, avg: 21.4, max: 45.0) [2024-03-29 16:20:18,840][00126] Avg episode reward: [(0, '0.448')] [2024-03-29 16:20:22,941][00497] Updated weights for policy 0, policy_version 38285 (0.0019) [2024-03-29 16:20:23,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 627310592. Throughput: 0: 41562.7. Samples: 509504920. Policy #0 lag: (min: 0.0, avg: 21.4, max: 45.0) [2024-03-29 16:20:23,840][00126] Avg episode reward: [(0, '0.448')] [2024-03-29 16:20:26,659][00497] Updated weights for policy 0, policy_version 38295 (0.0026) [2024-03-29 16:20:28,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.0, 300 sec: 41765.3). Total num frames: 627523584. Throughput: 0: 41748.0. Samples: 509743920. Policy #0 lag: (min: 0.0, avg: 21.4, max: 45.0) [2024-03-29 16:20:28,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 16:20:29,921][00476] Signal inference workers to stop experience collection... (18150 times) [2024-03-29 16:20:29,991][00497] InferenceWorker_p0-w0: stopping experience collection (18150 times) [2024-03-29 16:20:29,996][00476] Signal inference workers to resume experience collection... (18150 times) [2024-03-29 16:20:30,016][00497] InferenceWorker_p0-w0: resuming experience collection (18150 times) [2024-03-29 16:20:30,266][00497] Updated weights for policy 0, policy_version 38305 (0.0032) [2024-03-29 16:20:33,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 627720192. Throughput: 0: 41188.5. Samples: 509848020. Policy #0 lag: (min: 0.0, avg: 21.4, max: 45.0) [2024-03-29 16:20:33,840][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 16:20:34,547][00497] Updated weights for policy 0, policy_version 38315 (0.0019) [2024-03-29 16:20:38,839][00126] Fps is (10 sec: 37683.5, 60 sec: 40960.0, 300 sec: 41709.8). Total num frames: 627900416. Throughput: 0: 41381.3. Samples: 510118340. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 16:20:38,840][00126] Avg episode reward: [(0, '0.471')] [2024-03-29 16:20:38,921][00497] Updated weights for policy 0, policy_version 38325 (0.0021) [2024-03-29 16:20:42,760][00497] Updated weights for policy 0, policy_version 38335 (0.0032) [2024-03-29 16:20:43,839][00126] Fps is (10 sec: 39321.7, 60 sec: 40687.0, 300 sec: 41654.2). Total num frames: 628113408. Throughput: 0: 41021.7. Samples: 510346040. 
Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 16:20:43,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 16:20:46,334][00497] Updated weights for policy 0, policy_version 38345 (0.0025) [2024-03-29 16:20:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 628326400. Throughput: 0: 40636.9. Samples: 510459780. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 16:20:48,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 16:20:51,051][00497] Updated weights for policy 0, policy_version 38355 (0.0022) [2024-03-29 16:20:53,839][00126] Fps is (10 sec: 37683.0, 60 sec: 40413.8, 300 sec: 41543.2). Total num frames: 628490240. Throughput: 0: 40668.4. Samples: 510715200. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 16:20:53,842][00126] Avg episode reward: [(0, '0.505')] [2024-03-29 16:20:55,400][00497] Updated weights for policy 0, policy_version 38365 (0.0026) [2024-03-29 16:20:58,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40413.8, 300 sec: 41487.6). Total num frames: 628719616. Throughput: 0: 40806.2. Samples: 510949780. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 16:20:58,840][00126] Avg episode reward: [(0, '0.533')] [2024-03-29 16:20:58,849][00497] Updated weights for policy 0, policy_version 38375 (0.0022) [2024-03-29 16:21:02,626][00497] Updated weights for policy 0, policy_version 38385 (0.0028) [2024-03-29 16:21:03,839][00126] Fps is (10 sec: 45875.1, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 628948992. Throughput: 0: 40782.2. Samples: 511077000. Policy #0 lag: (min: 2.0, avg: 21.3, max: 41.0) [2024-03-29 16:21:03,840][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 16:21:03,856][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000038388_628948992.pth... [2024-03-29 16:21:04,180][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000037780_618987520.pth [2024-03-29 16:21:05,166][00476] Signal inference workers to stop experience collection... (18200 times) [2024-03-29 16:21:05,167][00476] Signal inference workers to resume experience collection... (18200 times) [2024-03-29 16:21:05,200][00497] InferenceWorker_p0-w0: stopping experience collection (18200 times) [2024-03-29 16:21:05,201][00497] InferenceWorker_p0-w0: resuming experience collection (18200 times) [2024-03-29 16:21:07,124][00497] Updated weights for policy 0, policy_version 38395 (0.0023) [2024-03-29 16:21:08,839][00126] Fps is (10 sec: 39321.5, 60 sec: 40413.9, 300 sec: 41543.2). Total num frames: 629112832. Throughput: 0: 40143.5. Samples: 511311380. Policy #0 lag: (min: 2.0, avg: 21.3, max: 41.0) [2024-03-29 16:21:08,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 16:21:11,571][00497] Updated weights for policy 0, policy_version 38405 (0.0026) [2024-03-29 16:21:13,839][00126] Fps is (10 sec: 37683.2, 60 sec: 40413.8, 300 sec: 41487.6). Total num frames: 629325824. Throughput: 0: 40140.4. Samples: 511550240. Policy #0 lag: (min: 2.0, avg: 21.3, max: 41.0) [2024-03-29 16:21:13,840][00126] Avg episode reward: [(0, '0.501')] [2024-03-29 16:21:15,256][00497] Updated weights for policy 0, policy_version 38415 (0.0020) [2024-03-29 16:21:18,709][00497] Updated weights for policy 0, policy_version 38425 (0.0031) [2024-03-29 16:21:18,839][00126] Fps is (10 sec: 44237.0, 60 sec: 40687.0, 300 sec: 41654.2). Total num frames: 629555200. Throughput: 0: 40569.8. Samples: 511673660. 
Policy #0 lag: (min: 2.0, avg: 21.3, max: 41.0) [2024-03-29 16:21:18,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 16:21:23,709][00497] Updated weights for policy 0, policy_version 38435 (0.0021) [2024-03-29 16:21:23,839][00126] Fps is (10 sec: 39321.8, 60 sec: 40140.8, 300 sec: 41487.6). Total num frames: 629719040. Throughput: 0: 39876.4. Samples: 511912780. Policy #0 lag: (min: 2.0, avg: 21.3, max: 41.0) [2024-03-29 16:21:23,840][00126] Avg episode reward: [(0, '0.568')] [2024-03-29 16:21:27,874][00497] Updated weights for policy 0, policy_version 38445 (0.0021) [2024-03-29 16:21:28,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40140.9, 300 sec: 41432.1). Total num frames: 629932032. Throughput: 0: 40479.2. Samples: 512167600. Policy #0 lag: (min: 1.0, avg: 17.1, max: 41.0) [2024-03-29 16:21:28,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 16:21:31,459][00497] Updated weights for policy 0, policy_version 38455 (0.0018) [2024-03-29 16:21:33,839][00126] Fps is (10 sec: 44237.1, 60 sec: 40687.0, 300 sec: 41598.7). Total num frames: 630161408. Throughput: 0: 40684.9. Samples: 512290600. Policy #0 lag: (min: 1.0, avg: 17.1, max: 41.0) [2024-03-29 16:21:33,840][00126] Avg episode reward: [(0, '0.513')] [2024-03-29 16:21:34,703][00497] Updated weights for policy 0, policy_version 38465 (0.0022) [2024-03-29 16:21:38,839][00126] Fps is (10 sec: 40959.7, 60 sec: 40686.9, 300 sec: 41487.6). Total num frames: 630341632. Throughput: 0: 40373.4. Samples: 512532000. Policy #0 lag: (min: 1.0, avg: 17.1, max: 41.0) [2024-03-29 16:21:38,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 16:21:39,739][00497] Updated weights for policy 0, policy_version 38475 (0.0020) [2024-03-29 16:21:43,137][00476] Signal inference workers to stop experience collection... (18250 times) [2024-03-29 16:21:43,156][00497] InferenceWorker_p0-w0: stopping experience collection (18250 times) [2024-03-29 16:21:43,324][00476] Signal inference workers to resume experience collection... (18250 times) [2024-03-29 16:21:43,325][00497] InferenceWorker_p0-w0: resuming experience collection (18250 times) [2024-03-29 16:21:43,839][00126] Fps is (10 sec: 36044.7, 60 sec: 40140.8, 300 sec: 41321.0). Total num frames: 630521856. Throughput: 0: 41050.7. Samples: 512797060. Policy #0 lag: (min: 1.0, avg: 17.1, max: 41.0) [2024-03-29 16:21:43,848][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 16:21:43,938][00497] Updated weights for policy 0, policy_version 38485 (0.0026) [2024-03-29 16:21:47,293][00497] Updated weights for policy 0, policy_version 38495 (0.0022) [2024-03-29 16:21:48,839][00126] Fps is (10 sec: 40960.2, 60 sec: 40413.9, 300 sec: 41432.1). Total num frames: 630751232. Throughput: 0: 40389.9. Samples: 512894540. Policy #0 lag: (min: 1.0, avg: 17.1, max: 41.0) [2024-03-29 16:21:48,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 16:21:50,790][00497] Updated weights for policy 0, policy_version 38505 (0.0029) [2024-03-29 16:21:53,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 630964224. Throughput: 0: 40476.4. Samples: 513132820. Policy #0 lag: (min: 1.0, avg: 23.9, max: 43.0) [2024-03-29 16:21:53,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 16:21:55,699][00497] Updated weights for policy 0, policy_version 38515 (0.0025) [2024-03-29 16:21:58,839][00126] Fps is (10 sec: 36044.9, 60 sec: 39867.8, 300 sec: 41154.4). Total num frames: 631111680. Throughput: 0: 41130.8. Samples: 513401120. 
Policy #0 lag: (min: 1.0, avg: 23.9, max: 43.0) [2024-03-29 16:21:58,841][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 16:22:00,176][00497] Updated weights for policy 0, policy_version 38525 (0.0027) [2024-03-29 16:22:03,287][00497] Updated weights for policy 0, policy_version 38535 (0.0016) [2024-03-29 16:22:03,839][00126] Fps is (10 sec: 40960.2, 60 sec: 40413.9, 300 sec: 41321.0). Total num frames: 631373824. Throughput: 0: 40688.9. Samples: 513504660. Policy #0 lag: (min: 1.0, avg: 23.9, max: 43.0) [2024-03-29 16:22:03,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 16:22:06,845][00497] Updated weights for policy 0, policy_version 38545 (0.0022) [2024-03-29 16:22:08,839][00126] Fps is (10 sec: 47513.6, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 631586816. Throughput: 0: 40612.1. Samples: 513740320. Policy #0 lag: (min: 1.0, avg: 23.9, max: 43.0) [2024-03-29 16:22:08,840][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 16:22:11,859][00497] Updated weights for policy 0, policy_version 38555 (0.0022) [2024-03-29 16:22:13,839][00126] Fps is (10 sec: 36044.7, 60 sec: 40140.8, 300 sec: 41209.9). Total num frames: 631734272. Throughput: 0: 40800.8. Samples: 514003640. Policy #0 lag: (min: 1.0, avg: 23.9, max: 43.0) [2024-03-29 16:22:13,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 16:22:16,287][00497] Updated weights for policy 0, policy_version 38565 (0.0028) [2024-03-29 16:22:16,832][00476] Signal inference workers to stop experience collection... (18300 times) [2024-03-29 16:22:16,833][00476] Signal inference workers to resume experience collection... (18300 times) [2024-03-29 16:22:16,873][00497] InferenceWorker_p0-w0: stopping experience collection (18300 times) [2024-03-29 16:22:16,874][00497] InferenceWorker_p0-w0: resuming experience collection (18300 times) [2024-03-29 16:22:18,839][00126] Fps is (10 sec: 39321.8, 60 sec: 40413.9, 300 sec: 41265.5). Total num frames: 631980032. Throughput: 0: 41029.4. Samples: 514136920. Policy #0 lag: (min: 1.0, avg: 17.6, max: 42.0) [2024-03-29 16:22:18,840][00126] Avg episode reward: [(0, '0.484')] [2024-03-29 16:22:19,696][00497] Updated weights for policy 0, policy_version 38575 (0.0022) [2024-03-29 16:22:23,243][00497] Updated weights for policy 0, policy_version 38585 (0.0023) [2024-03-29 16:22:23,839][00126] Fps is (10 sec: 47513.8, 60 sec: 41506.2, 300 sec: 41376.5). Total num frames: 632209408. Throughput: 0: 40428.5. Samples: 514351280. Policy #0 lag: (min: 1.0, avg: 17.6, max: 42.0) [2024-03-29 16:22:23,840][00126] Avg episode reward: [(0, '0.503')] [2024-03-29 16:22:28,312][00497] Updated weights for policy 0, policy_version 38595 (0.0019) [2024-03-29 16:22:28,839][00126] Fps is (10 sec: 37683.0, 60 sec: 40413.9, 300 sec: 41154.4). Total num frames: 632356864. Throughput: 0: 40316.9. Samples: 514611320. Policy #0 lag: (min: 1.0, avg: 17.6, max: 42.0) [2024-03-29 16:22:28,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 16:22:32,470][00497] Updated weights for policy 0, policy_version 38605 (0.0034) [2024-03-29 16:22:33,839][00126] Fps is (10 sec: 36044.9, 60 sec: 40140.8, 300 sec: 41098.8). Total num frames: 632569856. Throughput: 0: 41313.3. Samples: 514753640. Policy #0 lag: (min: 1.0, avg: 17.6, max: 42.0) [2024-03-29 16:22:33,840][00126] Avg episode reward: [(0, '0.537')] [2024-03-29 16:22:35,482][00497] Updated weights for policy 0, policy_version 38615 (0.0024) [2024-03-29 16:22:38,839][00126] Fps is (10 sec: 44236.7, 60 sec: 40960.0, 300 sec: 41209.9). 
Total num frames: 632799232. Throughput: 0: 40865.4. Samples: 514971760. Policy #0 lag: (min: 1.0, avg: 17.6, max: 42.0) [2024-03-29 16:22:38,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 16:22:39,101][00497] Updated weights for policy 0, policy_version 38625 (0.0021) [2024-03-29 16:22:43,839][00126] Fps is (10 sec: 39321.3, 60 sec: 40686.9, 300 sec: 41098.8). Total num frames: 632963072. Throughput: 0: 39950.1. Samples: 515198880. Policy #0 lag: (min: 0.0, avg: 23.0, max: 40.0) [2024-03-29 16:22:43,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 16:22:44,364][00497] Updated weights for policy 0, policy_version 38635 (0.0019) [2024-03-29 16:22:48,839][00126] Fps is (10 sec: 34406.3, 60 sec: 39867.7, 300 sec: 40876.7). Total num frames: 633143296. Throughput: 0: 40807.1. Samples: 515340980. Policy #0 lag: (min: 0.0, avg: 23.0, max: 40.0) [2024-03-29 16:22:48,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 16:22:49,014][00497] Updated weights for policy 0, policy_version 38645 (0.0021) [2024-03-29 16:22:50,043][00476] Signal inference workers to stop experience collection... (18350 times) [2024-03-29 16:22:50,074][00497] InferenceWorker_p0-w0: stopping experience collection (18350 times) [2024-03-29 16:22:50,256][00476] Signal inference workers to resume experience collection... (18350 times) [2024-03-29 16:22:50,256][00497] InferenceWorker_p0-w0: resuming experience collection (18350 times) [2024-03-29 16:22:51,898][00497] Updated weights for policy 0, policy_version 38655 (0.0029) [2024-03-29 16:22:53,839][00126] Fps is (10 sec: 40960.2, 60 sec: 40140.8, 300 sec: 41043.3). Total num frames: 633372672. Throughput: 0: 40212.0. Samples: 515549860. Policy #0 lag: (min: 0.0, avg: 23.0, max: 40.0) [2024-03-29 16:22:53,840][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 16:22:55,837][00497] Updated weights for policy 0, policy_version 38665 (0.0023) [2024-03-29 16:22:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 40959.9, 300 sec: 40987.8). Total num frames: 633569280. Throughput: 0: 39810.2. Samples: 515795100. Policy #0 lag: (min: 0.0, avg: 23.0, max: 40.0) [2024-03-29 16:22:58,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 16:23:01,278][00497] Updated weights for policy 0, policy_version 38675 (0.0019) [2024-03-29 16:23:03,839][00126] Fps is (10 sec: 36044.7, 60 sec: 39321.6, 300 sec: 40710.1). Total num frames: 633733120. Throughput: 0: 40007.4. Samples: 515937260. Policy #0 lag: (min: 0.0, avg: 23.0, max: 40.0) [2024-03-29 16:23:03,840][00126] Avg episode reward: [(0, '0.515')] [2024-03-29 16:23:04,124][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000038681_633749504.pth... [2024-03-29 16:23:04,441][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000038086_624001024.pth [2024-03-29 16:23:05,511][00497] Updated weights for policy 0, policy_version 38685 (0.0020) [2024-03-29 16:23:08,839][00126] Fps is (10 sec: 39322.0, 60 sec: 39594.7, 300 sec: 40876.7). Total num frames: 633962496. Throughput: 0: 39853.4. Samples: 516144680. Policy #0 lag: (min: 2.0, avg: 17.6, max: 40.0) [2024-03-29 16:23:08,840][00126] Avg episode reward: [(0, '0.489')] [2024-03-29 16:23:08,997][00497] Updated weights for policy 0, policy_version 38695 (0.0021) [2024-03-29 16:23:12,499][00497] Updated weights for policy 0, policy_version 38705 (0.0024) [2024-03-29 16:23:13,839][00126] Fps is (10 sec: 45875.5, 60 sec: 40960.0, 300 sec: 40987.8). Total num frames: 634191872. 
Throughput: 0: 39363.6. Samples: 516382680. Policy #0 lag: (min: 2.0, avg: 17.6, max: 40.0) [2024-03-29 16:23:13,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 16:23:17,771][00497] Updated weights for policy 0, policy_version 38715 (0.0016) [2024-03-29 16:23:18,839][00126] Fps is (10 sec: 39321.4, 60 sec: 39594.6, 300 sec: 40710.1). Total num frames: 634355712. Throughput: 0: 39163.5. Samples: 516516000. Policy #0 lag: (min: 2.0, avg: 17.6, max: 40.0) [2024-03-29 16:23:18,840][00126] Avg episode reward: [(0, '0.515')] [2024-03-29 16:23:21,727][00497] Updated weights for policy 0, policy_version 38725 (0.0019) [2024-03-29 16:23:23,839][00126] Fps is (10 sec: 39321.3, 60 sec: 39594.6, 300 sec: 40877.2). Total num frames: 634585088. Throughput: 0: 39975.1. Samples: 516770640. Policy #0 lag: (min: 2.0, avg: 17.6, max: 40.0) [2024-03-29 16:23:23,840][00126] Avg episode reward: [(0, '0.492')] [2024-03-29 16:23:24,825][00497] Updated weights for policy 0, policy_version 38735 (0.0021) [2024-03-29 16:23:27,515][00476] Signal inference workers to stop experience collection... (18400 times) [2024-03-29 16:23:27,544][00497] InferenceWorker_p0-w0: stopping experience collection (18400 times) [2024-03-29 16:23:27,690][00476] Signal inference workers to resume experience collection... (18400 times) [2024-03-29 16:23:27,690][00497] InferenceWorker_p0-w0: resuming experience collection (18400 times) [2024-03-29 16:23:28,565][00497] Updated weights for policy 0, policy_version 38745 (0.0024) [2024-03-29 16:23:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 40686.9, 300 sec: 40876.7). Total num frames: 634798080. Throughput: 0: 39871.6. Samples: 516993100. Policy #0 lag: (min: 2.0, avg: 17.6, max: 40.0) [2024-03-29 16:23:28,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 16:23:33,839][00126] Fps is (10 sec: 36045.0, 60 sec: 39594.6, 300 sec: 40599.0). Total num frames: 634945536. Throughput: 0: 39548.0. Samples: 517120640. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 16:23:33,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 16:23:33,932][00497] Updated weights for policy 0, policy_version 38755 (0.0021) [2024-03-29 16:23:37,859][00497] Updated weights for policy 0, policy_version 38765 (0.0017) [2024-03-29 16:23:38,839][00126] Fps is (10 sec: 37683.3, 60 sec: 39594.7, 300 sec: 40765.6). Total num frames: 635174912. Throughput: 0: 40885.8. Samples: 517389720. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 16:23:38,840][00126] Avg episode reward: [(0, '0.446')] [2024-03-29 16:23:41,203][00497] Updated weights for policy 0, policy_version 38775 (0.0027) [2024-03-29 16:23:43,839][00126] Fps is (10 sec: 44237.1, 60 sec: 40413.9, 300 sec: 40766.9). Total num frames: 635387904. Throughput: 0: 39991.7. Samples: 517594720. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 16:23:43,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 16:23:44,778][00497] Updated weights for policy 0, policy_version 38785 (0.0029) [2024-03-29 16:23:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40413.9, 300 sec: 40599.0). Total num frames: 635568128. Throughput: 0: 39694.3. Samples: 517723500. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 16:23:48,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 16:23:50,241][00497] Updated weights for policy 0, policy_version 38795 (0.0022) [2024-03-29 16:23:53,839][00126] Fps is (10 sec: 37682.9, 60 sec: 39867.7, 300 sec: 40599.0). Total num frames: 635764736. Throughput: 0: 41127.0. 
Samples: 517995400. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 16:23:53,840][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 16:23:53,987][00497] Updated weights for policy 0, policy_version 38805 (0.0033) [2024-03-29 16:23:57,196][00497] Updated weights for policy 0, policy_version 38815 (0.0021) [2024-03-29 16:23:58,839][00126] Fps is (10 sec: 45875.0, 60 sec: 40960.0, 300 sec: 40765.6). Total num frames: 636026880. Throughput: 0: 40704.4. Samples: 518214380. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 16:23:58,840][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 16:24:00,648][00497] Updated weights for policy 0, policy_version 38825 (0.0020) [2024-03-29 16:24:03,839][00126] Fps is (10 sec: 42597.8, 60 sec: 40959.9, 300 sec: 40654.5). Total num frames: 636190720. Throughput: 0: 40431.4. Samples: 518335420. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 16:24:03,841][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 16:24:06,485][00497] Updated weights for policy 0, policy_version 38835 (0.0018) [2024-03-29 16:24:06,689][00476] Signal inference workers to stop experience collection... (18450 times) [2024-03-29 16:24:06,757][00497] InferenceWorker_p0-w0: stopping experience collection (18450 times) [2024-03-29 16:24:06,762][00476] Signal inference workers to resume experience collection... (18450 times) [2024-03-29 16:24:06,782][00497] InferenceWorker_p0-w0: resuming experience collection (18450 times) [2024-03-29 16:24:08,839][00126] Fps is (10 sec: 32768.1, 60 sec: 39867.7, 300 sec: 40487.9). Total num frames: 636354560. Throughput: 0: 40778.3. Samples: 518605660. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 16:24:08,840][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 16:24:10,064][00497] Updated weights for policy 0, policy_version 38845 (0.0039) [2024-03-29 16:24:13,407][00497] Updated weights for policy 0, policy_version 38855 (0.0036) [2024-03-29 16:24:13,839][00126] Fps is (10 sec: 42598.9, 60 sec: 40413.8, 300 sec: 40711.4). Total num frames: 636616704. Throughput: 0: 40727.5. Samples: 518825840. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 16:24:13,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 16:24:16,979][00497] Updated weights for policy 0, policy_version 38865 (0.0021) [2024-03-29 16:24:18,839][00126] Fps is (10 sec: 45875.1, 60 sec: 40960.0, 300 sec: 40654.5). Total num frames: 636813312. Throughput: 0: 40476.4. Samples: 518942080. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 16:24:18,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 16:24:22,665][00497] Updated weights for policy 0, policy_version 38875 (0.0021) [2024-03-29 16:24:23,839][00126] Fps is (10 sec: 36044.7, 60 sec: 39867.7, 300 sec: 40487.9). Total num frames: 636977152. Throughput: 0: 40219.5. Samples: 519199600. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 16:24:23,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 16:24:26,380][00497] Updated weights for policy 0, policy_version 38885 (0.0018) [2024-03-29 16:24:28,839][00126] Fps is (10 sec: 39321.9, 60 sec: 40140.8, 300 sec: 40599.0). Total num frames: 637206528. Throughput: 0: 41004.4. Samples: 519439920. 
Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 16:24:28,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 16:24:29,626][00497] Updated weights for policy 0, policy_version 38895 (0.0028) [2024-03-29 16:24:33,082][00497] Updated weights for policy 0, policy_version 38905 (0.0028) [2024-03-29 16:24:33,839][00126] Fps is (10 sec: 47513.5, 60 sec: 41779.1, 300 sec: 40710.1). Total num frames: 637452288. Throughput: 0: 40608.3. Samples: 519550880. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 16:24:33,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 16:24:38,352][00497] Updated weights for policy 0, policy_version 38915 (0.0017) [2024-03-29 16:24:38,839][00126] Fps is (10 sec: 37682.9, 60 sec: 40140.8, 300 sec: 40376.8). Total num frames: 637583360. Throughput: 0: 39865.3. Samples: 519789340. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 16:24:38,840][00126] Avg episode reward: [(0, '0.489')] [2024-03-29 16:24:42,090][00476] Signal inference workers to stop experience collection... (18500 times) [2024-03-29 16:24:42,150][00497] InferenceWorker_p0-w0: stopping experience collection (18500 times) [2024-03-29 16:24:42,162][00476] Signal inference workers to resume experience collection... (18500 times) [2024-03-29 16:24:42,179][00497] InferenceWorker_p0-w0: resuming experience collection (18500 times) [2024-03-29 16:24:42,709][00497] Updated weights for policy 0, policy_version 38925 (0.0019) [2024-03-29 16:24:43,839][00126] Fps is (10 sec: 32768.0, 60 sec: 39867.6, 300 sec: 40487.9). Total num frames: 637779968. Throughput: 0: 40671.0. Samples: 520044580. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 16:24:43,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 16:24:46,164][00497] Updated weights for policy 0, policy_version 38935 (0.0024) [2024-03-29 16:24:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 40960.0, 300 sec: 40543.5). Total num frames: 638025728. Throughput: 0: 40208.1. Samples: 520144780. Policy #0 lag: (min: 1.0, avg: 22.1, max: 44.0) [2024-03-29 16:24:48,840][00126] Avg episode reward: [(0, '0.489')] [2024-03-29 16:24:49,764][00497] Updated weights for policy 0, policy_version 38945 (0.0035) [2024-03-29 16:24:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 40686.9, 300 sec: 40376.8). Total num frames: 638205952. Throughput: 0: 39722.6. Samples: 520393180. Policy #0 lag: (min: 1.0, avg: 22.1, max: 44.0) [2024-03-29 16:24:53,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 16:24:54,990][00497] Updated weights for policy 0, policy_version 38955 (0.0017) [2024-03-29 16:24:58,839][00126] Fps is (10 sec: 36044.7, 60 sec: 39321.6, 300 sec: 40376.8). Total num frames: 638386176. Throughput: 0: 40553.8. Samples: 520650760. Policy #0 lag: (min: 1.0, avg: 22.1, max: 44.0) [2024-03-29 16:24:58,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 16:24:59,058][00497] Updated weights for policy 0, policy_version 38965 (0.0023) [2024-03-29 16:25:02,109][00497] Updated weights for policy 0, policy_version 38975 (0.0025) [2024-03-29 16:25:03,839][00126] Fps is (10 sec: 44236.9, 60 sec: 40960.1, 300 sec: 40543.5). Total num frames: 638648320. Throughput: 0: 40637.8. Samples: 520770780. Policy #0 lag: (min: 1.0, avg: 22.1, max: 44.0) [2024-03-29 16:25:03,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 16:25:03,901][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000038981_638664704.pth... 
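The checkpoint entries above follow a rotation pattern: each "Saving .../checkpoint_<version>_<frames>.pth..." message is paired shortly afterwards with a "Removing ..." message for an older file, so only the newest few checkpoints remain on disk. A minimal Python sketch of such a keep-last-N rotation follows; the function name, keep_count, and helper regex are illustrative assumptions, not the trainer's actual implementation.

    # Illustrative sketch only: save a checkpoint named like the log's
    # "checkpoint_{version:09d}_{frames}.pth" and prune older files so that
    # only the newest `keep_count` remain in the checkpoint directory.
    import os
    import re
    from typing import Dict

    import torch

    CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

    def save_and_rotate_checkpoint(
        checkpoint_dir: str,
        policy_version: int,
        env_frames: int,
        state: Dict,
        keep_count: int = 2,  # assumed value for illustration
    ) -> str:
        """Save a checkpoint, then delete older ones, keeping the newest keep_count."""
        os.makedirs(checkpoint_dir, exist_ok=True)
        name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
        path = os.path.join(checkpoint_dir, name)
        torch.save(state, path)  # corresponds to the "Saving ... .pth..." entries

        # Collect existing checkpoints, sort by policy version, drop all but the newest.
        existing = []
        for fname in os.listdir(checkpoint_dir):
            m = CKPT_RE.search(fname)
            if m:
                existing.append((int(m.group(1)), os.path.join(checkpoint_dir, fname)))
        existing.sort()
        for _, old_path in existing[:-keep_count]:
            os.remove(old_path)  # corresponds to the "Removing ... .pth" entries
        return path

With keep_count=2, each new save would be followed by deleting one older file, which would match the save-then-remove pairing visible in the log if two checkpoints are retained at any time.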
[2024-03-29 16:25:04,208][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000038388_628948992.pth [2024-03-29 16:25:05,794][00497] Updated weights for policy 0, policy_version 38985 (0.0025) [2024-03-29 16:25:08,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41233.0, 300 sec: 40432.4). Total num frames: 638828544. Throughput: 0: 40233.8. Samples: 521010120. Policy #0 lag: (min: 1.0, avg: 22.1, max: 44.0) [2024-03-29 16:25:08,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 16:25:11,014][00497] Updated weights for policy 0, policy_version 38995 (0.0018) [2024-03-29 16:25:13,839][00126] Fps is (10 sec: 36044.6, 60 sec: 39867.7, 300 sec: 40321.3). Total num frames: 639008768. Throughput: 0: 40745.2. Samples: 521273460. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 16:25:13,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 16:25:14,827][00497] Updated weights for policy 0, policy_version 39005 (0.0027) [2024-03-29 16:25:16,574][00476] Signal inference workers to stop experience collection... (18550 times) [2024-03-29 16:25:16,596][00497] InferenceWorker_p0-w0: stopping experience collection (18550 times) [2024-03-29 16:25:16,792][00476] Signal inference workers to resume experience collection... (18550 times) [2024-03-29 16:25:16,793][00497] InferenceWorker_p0-w0: resuming experience collection (18550 times) [2024-03-29 16:25:18,144][00497] Updated weights for policy 0, policy_version 39015 (0.0027) [2024-03-29 16:25:18,839][00126] Fps is (10 sec: 42598.5, 60 sec: 40686.9, 300 sec: 40487.9). Total num frames: 639254528. Throughput: 0: 40947.2. Samples: 521393500. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 16:25:18,842][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 16:25:21,764][00497] Updated weights for policy 0, policy_version 39025 (0.0021) [2024-03-29 16:25:23,839][00126] Fps is (10 sec: 44237.0, 60 sec: 41233.1, 300 sec: 40432.4). Total num frames: 639451136. Throughput: 0: 40745.3. Samples: 521622880. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 16:25:23,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 16:25:26,866][00497] Updated weights for policy 0, policy_version 39035 (0.0029) [2024-03-29 16:25:28,839][00126] Fps is (10 sec: 36044.7, 60 sec: 40140.7, 300 sec: 40321.3). Total num frames: 639614976. Throughput: 0: 41228.9. Samples: 521899880. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 16:25:28,840][00126] Avg episode reward: [(0, '0.555')] [2024-03-29 16:25:30,595][00497] Updated weights for policy 0, policy_version 39045 (0.0023) [2024-03-29 16:25:33,788][00497] Updated weights for policy 0, policy_version 39055 (0.0030) [2024-03-29 16:25:33,839][00126] Fps is (10 sec: 42598.5, 60 sec: 40413.9, 300 sec: 40599.0). Total num frames: 639877120. Throughput: 0: 41627.1. Samples: 522018000. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 16:25:33,840][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 16:25:37,208][00497] Updated weights for policy 0, policy_version 39065 (0.0028) [2024-03-29 16:25:38,839][00126] Fps is (10 sec: 45875.2, 60 sec: 41506.1, 300 sec: 40543.5). Total num frames: 640073728. Throughput: 0: 41389.3. Samples: 522255700. 
Policy #0 lag: (min: 1.0, avg: 23.2, max: 41.0) [2024-03-29 16:25:38,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 16:25:42,332][00497] Updated weights for policy 0, policy_version 39075 (0.0028) [2024-03-29 16:25:43,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.1, 300 sec: 40432.4). Total num frames: 640253952. Throughput: 0: 41731.1. Samples: 522528660. Policy #0 lag: (min: 1.0, avg: 23.2, max: 41.0) [2024-03-29 16:25:43,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 16:25:46,266][00497] Updated weights for policy 0, policy_version 39085 (0.0027) [2024-03-29 16:25:47,945][00476] Signal inference workers to stop experience collection... (18600 times) [2024-03-29 16:25:48,022][00497] InferenceWorker_p0-w0: stopping experience collection (18600 times) [2024-03-29 16:25:48,110][00476] Signal inference workers to resume experience collection... (18600 times) [2024-03-29 16:25:48,110][00497] InferenceWorker_p0-w0: resuming experience collection (18600 times) [2024-03-29 16:25:48,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41233.2, 300 sec: 40710.1). Total num frames: 640499712. Throughput: 0: 41785.9. Samples: 522651140. Policy #0 lag: (min: 1.0, avg: 23.2, max: 41.0) [2024-03-29 16:25:48,840][00126] Avg episode reward: [(0, '0.575')] [2024-03-29 16:25:49,306][00497] Updated weights for policy 0, policy_version 39095 (0.0030) [2024-03-29 16:25:52,870][00497] Updated weights for policy 0, policy_version 39105 (0.0027) [2024-03-29 16:25:53,839][00126] Fps is (10 sec: 47513.8, 60 sec: 42052.3, 300 sec: 40710.1). Total num frames: 640729088. Throughput: 0: 41589.9. Samples: 522881660. Policy #0 lag: (min: 1.0, avg: 23.2, max: 41.0) [2024-03-29 16:25:53,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 16:25:57,890][00497] Updated weights for policy 0, policy_version 39115 (0.0035) [2024-03-29 16:25:58,839][00126] Fps is (10 sec: 37682.3, 60 sec: 41506.1, 300 sec: 40432.4). Total num frames: 640876544. Throughput: 0: 41705.8. Samples: 523150220. Policy #0 lag: (min: 1.0, avg: 23.2, max: 41.0) [2024-03-29 16:25:58,840][00126] Avg episode reward: [(0, '0.461')] [2024-03-29 16:26:01,945][00497] Updated weights for policy 0, policy_version 39125 (0.0027) [2024-03-29 16:26:03,839][00126] Fps is (10 sec: 37683.1, 60 sec: 40960.0, 300 sec: 40654.5). Total num frames: 641105920. Throughput: 0: 41913.4. Samples: 523279600. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 16:26:03,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 16:26:05,069][00497] Updated weights for policy 0, policy_version 39135 (0.0023) [2024-03-29 16:26:08,680][00497] Updated weights for policy 0, policy_version 39145 (0.0027) [2024-03-29 16:26:08,839][00126] Fps is (10 sec: 47514.4, 60 sec: 42052.3, 300 sec: 40765.6). Total num frames: 641351680. Throughput: 0: 41595.2. Samples: 523494660. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 16:26:08,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 16:26:13,802][00497] Updated weights for policy 0, policy_version 39155 (0.0028) [2024-03-29 16:26:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 40543.5). Total num frames: 641515520. Throughput: 0: 41533.4. Samples: 523768880. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 16:26:13,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 16:26:17,674][00497] Updated weights for policy 0, policy_version 39165 (0.0019) [2024-03-29 16:26:18,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41233.1, 300 sec: 40710.1). 
Total num frames: 641728512. Throughput: 0: 41946.2. Samples: 523905580. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 16:26:18,840][00126] Avg episode reward: [(0, '0.446')] [2024-03-29 16:26:19,505][00476] Signal inference workers to stop experience collection... (18650 times) [2024-03-29 16:26:19,542][00497] InferenceWorker_p0-w0: stopping experience collection (18650 times) [2024-03-29 16:26:19,736][00476] Signal inference workers to resume experience collection... (18650 times) [2024-03-29 16:26:19,736][00497] InferenceWorker_p0-w0: resuming experience collection (18650 times) [2024-03-29 16:26:20,843][00497] Updated weights for policy 0, policy_version 39175 (0.0030) [2024-03-29 16:26:23,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42052.2, 300 sec: 40821.1). Total num frames: 641974272. Throughput: 0: 41315.5. Samples: 524114900. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 16:26:23,840][00126] Avg episode reward: [(0, '0.501')] [2024-03-29 16:26:24,747][00497] Updated weights for policy 0, policy_version 39185 (0.0026) [2024-03-29 16:26:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 40599.0). Total num frames: 642138112. Throughput: 0: 40971.1. Samples: 524372360. Policy #0 lag: (min: 0.0, avg: 21.6, max: 40.0) [2024-03-29 16:26:28,840][00126] Avg episode reward: [(0, '0.461')] [2024-03-29 16:26:29,748][00497] Updated weights for policy 0, policy_version 39195 (0.0025) [2024-03-29 16:26:33,839][00126] Fps is (10 sec: 34406.4, 60 sec: 40686.9, 300 sec: 40599.0). Total num frames: 642318336. Throughput: 0: 41447.8. Samples: 524516300. Policy #0 lag: (min: 0.0, avg: 21.6, max: 40.0) [2024-03-29 16:26:33,842][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 16:26:33,920][00497] Updated weights for policy 0, policy_version 39205 (0.0023) [2024-03-29 16:26:37,012][00497] Updated weights for policy 0, policy_version 39215 (0.0025) [2024-03-29 16:26:38,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.3, 300 sec: 40876.7). Total num frames: 642580480. Throughput: 0: 41189.4. Samples: 524735180. Policy #0 lag: (min: 0.0, avg: 21.6, max: 40.0) [2024-03-29 16:26:38,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 16:26:40,523][00497] Updated weights for policy 0, policy_version 39225 (0.0020) [2024-03-29 16:26:43,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42052.3, 300 sec: 40765.6). Total num frames: 642777088. Throughput: 0: 40876.5. Samples: 524989660. Policy #0 lag: (min: 0.0, avg: 21.6, max: 40.0) [2024-03-29 16:26:43,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 16:26:45,628][00497] Updated weights for policy 0, policy_version 39235 (0.0026) [2024-03-29 16:26:48,839][00126] Fps is (10 sec: 36044.6, 60 sec: 40686.8, 300 sec: 40599.0). Total num frames: 642940928. Throughput: 0: 41075.5. Samples: 525128000. Policy #0 lag: (min: 0.0, avg: 21.6, max: 40.0) [2024-03-29 16:26:48,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 16:26:49,699][00497] Updated weights for policy 0, policy_version 39245 (0.0033) [2024-03-29 16:26:51,176][00476] Signal inference workers to stop experience collection... (18700 times) [2024-03-29 16:26:51,209][00497] InferenceWorker_p0-w0: stopping experience collection (18700 times) [2024-03-29 16:26:51,379][00476] Signal inference workers to resume experience collection... 
(18700 times) [2024-03-29 16:26:51,379][00497] InferenceWorker_p0-w0: resuming experience collection (18700 times) [2024-03-29 16:26:52,841][00497] Updated weights for policy 0, policy_version 39255 (0.0025) [2024-03-29 16:26:53,839][00126] Fps is (10 sec: 40960.3, 60 sec: 40960.0, 300 sec: 40932.2). Total num frames: 643186688. Throughput: 0: 41623.1. Samples: 525367700. Policy #0 lag: (min: 2.0, avg: 20.3, max: 43.0) [2024-03-29 16:26:53,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 16:26:56,483][00497] Updated weights for policy 0, policy_version 39265 (0.0031) [2024-03-29 16:26:58,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41779.2, 300 sec: 40710.1). Total num frames: 643383296. Throughput: 0: 40871.9. Samples: 525608120. Policy #0 lag: (min: 2.0, avg: 20.3, max: 43.0) [2024-03-29 16:26:58,840][00126] Avg episode reward: [(0, '0.606')] [2024-03-29 16:27:01,489][00497] Updated weights for policy 0, policy_version 39275 (0.0028) [2024-03-29 16:27:03,839][00126] Fps is (10 sec: 37683.0, 60 sec: 40960.0, 300 sec: 40599.0). Total num frames: 643563520. Throughput: 0: 40902.7. Samples: 525746200. Policy #0 lag: (min: 2.0, avg: 20.3, max: 43.0) [2024-03-29 16:27:03,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 16:27:03,859][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000039280_643563520.pth... [2024-03-29 16:27:04,163][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000038681_633749504.pth [2024-03-29 16:27:05,610][00497] Updated weights for policy 0, policy_version 39285 (0.0026) [2024-03-29 16:27:08,720][00497] Updated weights for policy 0, policy_version 39295 (0.0024) [2024-03-29 16:27:08,839][00126] Fps is (10 sec: 42598.3, 60 sec: 40959.9, 300 sec: 40932.2). Total num frames: 643809280. Throughput: 0: 41884.9. Samples: 525999720. Policy #0 lag: (min: 2.0, avg: 20.3, max: 43.0) [2024-03-29 16:27:08,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 16:27:12,308][00497] Updated weights for policy 0, policy_version 39305 (0.0024) [2024-03-29 16:27:13,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.1, 300 sec: 40765.6). Total num frames: 644005888. Throughput: 0: 41124.1. Samples: 526222940. Policy #0 lag: (min: 2.0, avg: 20.3, max: 43.0) [2024-03-29 16:27:13,840][00126] Avg episode reward: [(0, '0.507')] [2024-03-29 16:27:17,058][00497] Updated weights for policy 0, policy_version 39315 (0.0025) [2024-03-29 16:27:18,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40960.0, 300 sec: 40599.0). Total num frames: 644186112. Throughput: 0: 41047.1. Samples: 526363420. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 16:27:18,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 16:27:21,159][00497] Updated weights for policy 0, policy_version 39325 (0.0022) [2024-03-29 16:27:22,749][00476] Signal inference workers to stop experience collection... (18750 times) [2024-03-29 16:27:22,771][00497] InferenceWorker_p0-w0: stopping experience collection (18750 times) [2024-03-29 16:27:22,966][00476] Signal inference workers to resume experience collection... (18750 times) [2024-03-29 16:27:22,966][00497] InferenceWorker_p0-w0: resuming experience collection (18750 times) [2024-03-29 16:27:23,839][00126] Fps is (10 sec: 44237.2, 60 sec: 41233.2, 300 sec: 40987.8). Total num frames: 644448256. Throughput: 0: 42225.8. Samples: 526635340. 
Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 16:27:23,840][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 16:27:24,114][00497] Updated weights for policy 0, policy_version 39335 (0.0020) [2024-03-29 16:27:27,804][00497] Updated weights for policy 0, policy_version 39345 (0.0029) [2024-03-29 16:27:28,839][00126] Fps is (10 sec: 47513.9, 60 sec: 42052.3, 300 sec: 40987.8). Total num frames: 644661248. Throughput: 0: 41383.1. Samples: 526851900. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 16:27:28,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 16:27:32,616][00497] Updated weights for policy 0, policy_version 39355 (0.0019) [2024-03-29 16:27:33,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41779.3, 300 sec: 40765.6). Total num frames: 644825088. Throughput: 0: 41445.8. Samples: 526993060. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 16:27:33,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 16:27:36,926][00497] Updated weights for policy 0, policy_version 39365 (0.0022) [2024-03-29 16:27:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 40987.8). Total num frames: 645054464. Throughput: 0: 41953.7. Samples: 527255620. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 16:27:38,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 16:27:39,959][00497] Updated weights for policy 0, policy_version 39375 (0.0035) [2024-03-29 16:27:43,327][00497] Updated weights for policy 0, policy_version 39385 (0.0028) [2024-03-29 16:27:43,839][00126] Fps is (10 sec: 45874.4, 60 sec: 41779.1, 300 sec: 41154.4). Total num frames: 645283840. Throughput: 0: 41572.8. Samples: 527478900. Policy #0 lag: (min: 0.0, avg: 24.0, max: 41.0) [2024-03-29 16:27:43,840][00126] Avg episode reward: [(0, '0.500')] [2024-03-29 16:27:48,182][00497] Updated weights for policy 0, policy_version 39395 (0.0024) [2024-03-29 16:27:48,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 40932.2). Total num frames: 645447680. Throughput: 0: 41633.3. Samples: 527619700. Policy #0 lag: (min: 0.0, avg: 24.0, max: 41.0) [2024-03-29 16:27:48,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 16:27:52,559][00497] Updated weights for policy 0, policy_version 39405 (0.0023) [2024-03-29 16:27:53,704][00476] Signal inference workers to stop experience collection... (18800 times) [2024-03-29 16:27:53,704][00476] Signal inference workers to resume experience collection... (18800 times) [2024-03-29 16:27:53,727][00497] InferenceWorker_p0-w0: stopping experience collection (18800 times) [2024-03-29 16:27:53,728][00497] InferenceWorker_p0-w0: resuming experience collection (18800 times) [2024-03-29 16:27:53,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41506.1, 300 sec: 41043.3). Total num frames: 645677056. Throughput: 0: 42088.1. Samples: 527893680. Policy #0 lag: (min: 0.0, avg: 24.0, max: 41.0) [2024-03-29 16:27:53,840][00126] Avg episode reward: [(0, '0.515')] [2024-03-29 16:27:55,707][00497] Updated weights for policy 0, policy_version 39415 (0.0021) [2024-03-29 16:27:58,839][00126] Fps is (10 sec: 47513.7, 60 sec: 42325.4, 300 sec: 41321.0). Total num frames: 645922816. Throughput: 0: 41797.3. Samples: 528103820. 
Policy #0 lag: (min: 0.0, avg: 24.0, max: 41.0) [2024-03-29 16:27:58,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 16:27:59,021][00497] Updated weights for policy 0, policy_version 39425 (0.0022) [2024-03-29 16:28:03,786][00497] Updated weights for policy 0, policy_version 39435 (0.0030) [2024-03-29 16:28:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 41154.4). Total num frames: 646103040. Throughput: 0: 41879.2. Samples: 528247980. Policy #0 lag: (min: 0.0, avg: 24.0, max: 41.0) [2024-03-29 16:28:03,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 16:28:08,186][00497] Updated weights for policy 0, policy_version 39445 (0.0024) [2024-03-29 16:28:08,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.2, 300 sec: 41043.3). Total num frames: 646299648. Throughput: 0: 41947.9. Samples: 528523000. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 16:28:08,840][00126] Avg episode reward: [(0, '0.533')] [2024-03-29 16:28:11,289][00497] Updated weights for policy 0, policy_version 39455 (0.0028) [2024-03-29 16:28:13,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42598.3, 300 sec: 41376.5). Total num frames: 646561792. Throughput: 0: 41871.9. Samples: 528736140. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 16:28:13,842][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 16:28:14,815][00497] Updated weights for policy 0, policy_version 39465 (0.0019) [2024-03-29 16:28:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 41154.4). Total num frames: 646725632. Throughput: 0: 41689.8. Samples: 528869100. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 16:28:18,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 16:28:19,413][00497] Updated weights for policy 0, policy_version 39475 (0.0017) [2024-03-29 16:28:23,752][00497] Updated weights for policy 0, policy_version 39485 (0.0019) [2024-03-29 16:28:23,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41233.0, 300 sec: 41098.9). Total num frames: 646922240. Throughput: 0: 42172.0. Samples: 529153360. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 16:28:23,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 16:28:24,435][00476] Signal inference workers to stop experience collection... (18850 times) [2024-03-29 16:28:24,454][00497] InferenceWorker_p0-w0: stopping experience collection (18850 times) [2024-03-29 16:28:24,646][00476] Signal inference workers to resume experience collection... (18850 times) [2024-03-29 16:28:24,646][00497] InferenceWorker_p0-w0: resuming experience collection (18850 times) [2024-03-29 16:28:26,920][00497] Updated weights for policy 0, policy_version 39495 (0.0032) [2024-03-29 16:28:28,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42052.4, 300 sec: 41487.6). Total num frames: 647184384. Throughput: 0: 42165.2. Samples: 529376320. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 16:28:28,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 16:28:30,294][00497] Updated weights for policy 0, policy_version 39505 (0.0035) [2024-03-29 16:28:33,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.3, 300 sec: 41321.0). Total num frames: 647364608. Throughput: 0: 41752.4. Samples: 529498560. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 16:28:33,840][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 16:28:34,943][00497] Updated weights for policy 0, policy_version 39515 (0.0022) [2024-03-29 16:28:38,839][00126] Fps is (10 sec: 36044.2, 60 sec: 41506.1, 300 sec: 41209.9). 
Total num frames: 647544832. Throughput: 0: 41988.9. Samples: 529783180. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 16:28:38,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 16:28:39,471][00497] Updated weights for policy 0, policy_version 39525 (0.0030) [2024-03-29 16:28:42,674][00497] Updated weights for policy 0, policy_version 39535 (0.0028) [2024-03-29 16:28:43,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.3, 300 sec: 41432.1). Total num frames: 647790592. Throughput: 0: 42254.2. Samples: 530005260. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 16:28:43,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 16:28:45,845][00497] Updated weights for policy 0, policy_version 39545 (0.0024) [2024-03-29 16:28:48,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.4, 300 sec: 41432.1). Total num frames: 647987200. Throughput: 0: 41667.6. Samples: 530123020. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 16:28:48,840][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 16:28:50,497][00497] Updated weights for policy 0, policy_version 39555 (0.0026) [2024-03-29 16:28:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41209.9). Total num frames: 648183808. Throughput: 0: 41773.3. Samples: 530402800. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 16:28:53,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 16:28:54,991][00497] Updated weights for policy 0, policy_version 39565 (0.0017) [2024-03-29 16:28:56,102][00476] Signal inference workers to stop experience collection... (18900 times) [2024-03-29 16:28:56,102][00476] Signal inference workers to resume experience collection... (18900 times) [2024-03-29 16:28:56,140][00497] InferenceWorker_p0-w0: stopping experience collection (18900 times) [2024-03-29 16:28:56,141][00497] InferenceWorker_p0-w0: resuming experience collection (18900 times) [2024-03-29 16:28:58,074][00497] Updated weights for policy 0, policy_version 39575 (0.0025) [2024-03-29 16:28:58,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 648429568. Throughput: 0: 42488.5. Samples: 530648120. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 16:28:58,840][00126] Avg episode reward: [(0, '0.483')] [2024-03-29 16:29:01,412][00497] Updated weights for policy 0, policy_version 39585 (0.0027) [2024-03-29 16:29:03,839][00126] Fps is (10 sec: 45874.3, 60 sec: 42325.2, 300 sec: 41654.2). Total num frames: 648642560. Throughput: 0: 42193.2. Samples: 530767800. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 16:29:03,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 16:29:03,860][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000039590_648642560.pth... [2024-03-29 16:29:04,163][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000038981_638664704.pth [2024-03-29 16:29:06,068][00497] Updated weights for policy 0, policy_version 39595 (0.0019) [2024-03-29 16:29:08,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 41376.6). Total num frames: 648822784. Throughput: 0: 41970.2. Samples: 531042020. 
Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 16:29:08,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 16:29:10,256][00497] Updated weights for policy 0, policy_version 39605 (0.0022) [2024-03-29 16:29:13,538][00497] Updated weights for policy 0, policy_version 39615 (0.0029) [2024-03-29 16:29:13,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 649068544. Throughput: 0: 42230.1. Samples: 531276680. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 16:29:13,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 16:29:17,068][00497] Updated weights for policy 0, policy_version 39625 (0.0026) [2024-03-29 16:29:18,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42598.4, 300 sec: 41709.8). Total num frames: 649281536. Throughput: 0: 42059.2. Samples: 531391220. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 16:29:18,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 16:29:21,683][00497] Updated weights for policy 0, policy_version 39635 (0.0021) [2024-03-29 16:29:23,839][00126] Fps is (10 sec: 37683.4, 60 sec: 42052.3, 300 sec: 41487.6). Total num frames: 649445376. Throughput: 0: 41972.9. Samples: 531671960. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 16:29:23,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 16:29:25,968][00497] Updated weights for policy 0, policy_version 39645 (0.0027) [2024-03-29 16:29:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 41487.6). Total num frames: 649691136. Throughput: 0: 42532.4. Samples: 531919220. Policy #0 lag: (min: 1.0, avg: 18.6, max: 41.0) [2024-03-29 16:29:28,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 16:29:28,934][00497] Updated weights for policy 0, policy_version 39655 (0.0031) [2024-03-29 16:29:28,970][00476] Signal inference workers to stop experience collection... (18950 times) [2024-03-29 16:29:29,004][00497] InferenceWorker_p0-w0: stopping experience collection (18950 times) [2024-03-29 16:29:29,184][00476] Signal inference workers to resume experience collection... (18950 times) [2024-03-29 16:29:29,184][00497] InferenceWorker_p0-w0: resuming experience collection (18950 times) [2024-03-29 16:29:32,376][00497] Updated weights for policy 0, policy_version 39665 (0.0018) [2024-03-29 16:29:33,840][00126] Fps is (10 sec: 47512.6, 60 sec: 42598.3, 300 sec: 41820.8). Total num frames: 649920512. Throughput: 0: 42513.0. Samples: 532036120. Policy #0 lag: (min: 1.0, avg: 18.6, max: 41.0) [2024-03-29 16:29:33,840][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 16:29:36,784][00497] Updated weights for policy 0, policy_version 39675 (0.0022) [2024-03-29 16:29:38,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 650084352. Throughput: 0: 42409.9. Samples: 532311240. Policy #0 lag: (min: 1.0, avg: 18.6, max: 41.0) [2024-03-29 16:29:38,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 16:29:41,203][00497] Updated weights for policy 0, policy_version 39685 (0.0027) [2024-03-29 16:29:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.2, 300 sec: 41709.8). Total num frames: 650330112. Throughput: 0: 42644.7. Samples: 532567140. 
Policy #0 lag: (min: 1.0, avg: 18.6, max: 41.0) [2024-03-29 16:29:43,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 16:29:44,178][00497] Updated weights for policy 0, policy_version 39695 (0.0019) [2024-03-29 16:29:47,504][00497] Updated weights for policy 0, policy_version 39705 (0.0018) [2024-03-29 16:29:48,839][00126] Fps is (10 sec: 47512.5, 60 sec: 42871.3, 300 sec: 41876.4). Total num frames: 650559488. Throughput: 0: 42445.8. Samples: 532677860. Policy #0 lag: (min: 1.0, avg: 18.6, max: 41.0) [2024-03-29 16:29:48,840][00126] Avg episode reward: [(0, '0.541')] [2024-03-29 16:29:51,862][00497] Updated weights for policy 0, policy_version 39715 (0.0022) [2024-03-29 16:29:53,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42598.3, 300 sec: 41876.4). Total num frames: 650739712. Throughput: 0: 42435.0. Samples: 532951600. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 16:29:53,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 16:29:56,493][00497] Updated weights for policy 0, policy_version 39725 (0.0018) [2024-03-29 16:29:58,839][00126] Fps is (10 sec: 40960.7, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 650969088. Throughput: 0: 42935.6. Samples: 533208780. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 16:29:58,841][00126] Avg episode reward: [(0, '0.533')] [2024-03-29 16:29:59,483][00497] Updated weights for policy 0, policy_version 39735 (0.0021) [2024-03-29 16:30:01,184][00476] Signal inference workers to stop experience collection... (19000 times) [2024-03-29 16:30:01,252][00497] InferenceWorker_p0-w0: stopping experience collection (19000 times) [2024-03-29 16:30:01,259][00476] Signal inference workers to resume experience collection... (19000 times) [2024-03-29 16:30:01,278][00497] InferenceWorker_p0-w0: resuming experience collection (19000 times) [2024-03-29 16:30:02,919][00497] Updated weights for policy 0, policy_version 39745 (0.0029) [2024-03-29 16:30:03,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42598.5, 300 sec: 41931.9). Total num frames: 651198464. Throughput: 0: 42905.8. Samples: 533321980. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 16:30:03,840][00126] Avg episode reward: [(0, '0.505')] [2024-03-29 16:30:07,401][00497] Updated weights for policy 0, policy_version 39755 (0.0025) [2024-03-29 16:30:08,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42598.4, 300 sec: 41931.9). Total num frames: 651378688. Throughput: 0: 42598.2. Samples: 533588880. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 16:30:08,840][00126] Avg episode reward: [(0, '0.587')] [2024-03-29 16:30:11,912][00497] Updated weights for policy 0, policy_version 39765 (0.0022) [2024-03-29 16:30:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 651608064. Throughput: 0: 42909.8. Samples: 533850160. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 16:30:13,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 16:30:14,941][00497] Updated weights for policy 0, policy_version 39775 (0.0030) [2024-03-29 16:30:18,703][00497] Updated weights for policy 0, policy_version 39785 (0.0019) [2024-03-29 16:30:18,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 651837440. Throughput: 0: 42653.0. Samples: 533955500. 
Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 16:30:18,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 16:30:23,233][00497] Updated weights for policy 0, policy_version 39795 (0.0025) [2024-03-29 16:30:23,839][00126] Fps is (10 sec: 40959.4, 60 sec: 42871.4, 300 sec: 42043.0). Total num frames: 652017664. Throughput: 0: 42311.3. Samples: 534215260. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 16:30:23,840][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 16:30:27,763][00497] Updated weights for policy 0, policy_version 39805 (0.0027) [2024-03-29 16:30:28,839][00126] Fps is (10 sec: 37683.6, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 652214272. Throughput: 0: 42590.9. Samples: 534483720. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 16:30:28,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 16:30:30,632][00497] Updated weights for policy 0, policy_version 39815 (0.0022) [2024-03-29 16:30:32,443][00476] Signal inference workers to stop experience collection... (19050 times) [2024-03-29 16:30:32,513][00497] InferenceWorker_p0-w0: stopping experience collection (19050 times) [2024-03-29 16:30:32,608][00476] Signal inference workers to resume experience collection... (19050 times) [2024-03-29 16:30:32,609][00497] InferenceWorker_p0-w0: resuming experience collection (19050 times) [2024-03-29 16:30:33,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.5, 300 sec: 41987.5). Total num frames: 652460032. Throughput: 0: 42270.8. Samples: 534580040. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 16:30:33,840][00126] Avg episode reward: [(0, '0.478')] [2024-03-29 16:30:34,301][00497] Updated weights for policy 0, policy_version 39825 (0.0020) [2024-03-29 16:30:38,803][00497] Updated weights for policy 0, policy_version 39835 (0.0019) [2024-03-29 16:30:38,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42871.4, 300 sec: 42043.0). Total num frames: 652656640. Throughput: 0: 42058.3. Samples: 534844220. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 16:30:38,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 16:30:43,311][00497] Updated weights for policy 0, policy_version 39845 (0.0026) [2024-03-29 16:30:43,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41779.4, 300 sec: 41820.8). Total num frames: 652836864. Throughput: 0: 42357.4. Samples: 535114860. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 16:30:43,840][00126] Avg episode reward: [(0, '0.604')] [2024-03-29 16:30:46,389][00497] Updated weights for policy 0, policy_version 39855 (0.0027) [2024-03-29 16:30:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.5, 300 sec: 41931.9). Total num frames: 653099008. Throughput: 0: 42182.7. Samples: 535220200. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 16:30:48,840][00126] Avg episode reward: [(0, '0.505')] [2024-03-29 16:30:49,956][00497] Updated weights for policy 0, policy_version 39865 (0.0020) [2024-03-29 16:30:53,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 653295616. Throughput: 0: 41719.5. Samples: 535466260. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 16:30:53,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 16:30:54,463][00497] Updated weights for policy 0, policy_version 39875 (0.0018) [2024-03-29 16:30:58,806][00497] Updated weights for policy 0, policy_version 39885 (0.0019) [2024-03-29 16:30:58,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.2, 300 sec: 41931.9). 
Total num frames: 653475840. Throughput: 0: 41919.5. Samples: 535736540. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 16:30:58,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 16:31:01,972][00497] Updated weights for policy 0, policy_version 39895 (0.0033) [2024-03-29 16:31:03,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 653737984. Throughput: 0: 42012.5. Samples: 535846060. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 16:31:03,841][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 16:31:04,100][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000039902_653754368.pth... [2024-03-29 16:31:04,443][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000039280_643563520.pth [2024-03-29 16:31:05,717][00497] Updated weights for policy 0, policy_version 39905 (0.0028) [2024-03-29 16:31:08,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 653918208. Throughput: 0: 41890.4. Samples: 536100320. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 16:31:08,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 16:31:10,284][00497] Updated weights for policy 0, policy_version 39915 (0.0023) [2024-03-29 16:31:11,924][00476] Signal inference workers to stop experience collection... (19100 times) [2024-03-29 16:31:11,958][00497] InferenceWorker_p0-w0: stopping experience collection (19100 times) [2024-03-29 16:31:12,113][00476] Signal inference workers to resume experience collection... (19100 times) [2024-03-29 16:31:12,114][00497] InferenceWorker_p0-w0: resuming experience collection (19100 times) [2024-03-29 16:31:13,839][00126] Fps is (10 sec: 36044.8, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 654098432. Throughput: 0: 41549.7. Samples: 536353460. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 16:31:13,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 16:31:14,606][00497] Updated weights for policy 0, policy_version 39925 (0.0021) [2024-03-29 16:31:17,853][00497] Updated weights for policy 0, policy_version 39935 (0.0022) [2024-03-29 16:31:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 654344192. Throughput: 0: 42037.8. Samples: 536471740. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 16:31:18,840][00126] Avg episode reward: [(0, '0.475')] [2024-03-29 16:31:21,716][00497] Updated weights for policy 0, policy_version 39945 (0.0024) [2024-03-29 16:31:23,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 654540800. Throughput: 0: 41507.4. Samples: 536712060. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 16:31:23,840][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 16:31:26,110][00497] Updated weights for policy 0, policy_version 39955 (0.0019) [2024-03-29 16:31:28,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 654721024. Throughput: 0: 41420.4. Samples: 536978780. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 16:31:28,840][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 16:31:30,543][00497] Updated weights for policy 0, policy_version 39965 (0.0022) [2024-03-29 16:31:33,839][00126] Fps is (10 sec: 39322.3, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 654934016. Throughput: 0: 41832.5. Samples: 537102660. 
Policy #0 lag: (min: 3.0, avg: 19.6, max: 43.0) [2024-03-29 16:31:33,840][00126] Avg episode reward: [(0, '0.565')] [2024-03-29 16:31:33,852][00497] Updated weights for policy 0, policy_version 39975 (0.0023) [2024-03-29 16:31:37,675][00497] Updated weights for policy 0, policy_version 39985 (0.0027) [2024-03-29 16:31:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 655147008. Throughput: 0: 41337.4. Samples: 537326440. Policy #0 lag: (min: 3.0, avg: 19.6, max: 43.0) [2024-03-29 16:31:38,840][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 16:31:42,102][00497] Updated weights for policy 0, policy_version 39995 (0.0018) [2024-03-29 16:31:43,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 655327232. Throughput: 0: 41380.4. Samples: 537598660. Policy #0 lag: (min: 3.0, avg: 19.6, max: 43.0) [2024-03-29 16:31:43,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 16:31:45,966][00476] Signal inference workers to stop experience collection... (19150 times) [2024-03-29 16:31:46,038][00497] InferenceWorker_p0-w0: stopping experience collection (19150 times) [2024-03-29 16:31:46,043][00476] Signal inference workers to resume experience collection... (19150 times) [2024-03-29 16:31:46,061][00497] InferenceWorker_p0-w0: resuming experience collection (19150 times) [2024-03-29 16:31:46,371][00497] Updated weights for policy 0, policy_version 40005 (0.0023) [2024-03-29 16:31:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40960.0, 300 sec: 41931.9). Total num frames: 655556608. Throughput: 0: 41752.5. Samples: 537724920. Policy #0 lag: (min: 3.0, avg: 19.6, max: 43.0) [2024-03-29 16:31:48,840][00126] Avg episode reward: [(0, '0.603')] [2024-03-29 16:31:49,640][00497] Updated weights for policy 0, policy_version 40015 (0.0031) [2024-03-29 16:31:53,490][00497] Updated weights for policy 0, policy_version 40025 (0.0025) [2024-03-29 16:31:53,839][00126] Fps is (10 sec: 44237.0, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 655769600. Throughput: 0: 41036.0. Samples: 537946940. Policy #0 lag: (min: 3.0, avg: 19.6, max: 43.0) [2024-03-29 16:31:53,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 16:31:57,653][00497] Updated weights for policy 0, policy_version 40035 (0.0026) [2024-03-29 16:31:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 655966208. Throughput: 0: 41520.9. Samples: 538221900. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 16:31:58,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 16:32:02,012][00497] Updated weights for policy 0, policy_version 40045 (0.0027) [2024-03-29 16:32:03,839][00126] Fps is (10 sec: 42598.0, 60 sec: 40960.0, 300 sec: 41987.5). Total num frames: 656195584. Throughput: 0: 41717.7. Samples: 538349040. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 16:32:03,841][00126] Avg episode reward: [(0, '0.475')] [2024-03-29 16:32:05,244][00497] Updated weights for policy 0, policy_version 40055 (0.0025) [2024-03-29 16:32:08,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41506.0, 300 sec: 42043.0). Total num frames: 656408576. Throughput: 0: 41436.0. Samples: 538576680. 
Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 16:32:08,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 16:32:08,898][00497] Updated weights for policy 0, policy_version 40065 (0.0021) [2024-03-29 16:32:13,402][00497] Updated weights for policy 0, policy_version 40075 (0.0023) [2024-03-29 16:32:13,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 656605184. Throughput: 0: 41543.6. Samples: 538848240. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 16:32:13,840][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 16:32:17,379][00476] Signal inference workers to stop experience collection... (19200 times) [2024-03-29 16:32:17,381][00476] Signal inference workers to resume experience collection... (19200 times) [2024-03-29 16:32:17,421][00497] InferenceWorker_p0-w0: stopping experience collection (19200 times) [2024-03-29 16:32:17,421][00497] InferenceWorker_p0-w0: resuming experience collection (19200 times) [2024-03-29 16:32:17,645][00497] Updated weights for policy 0, policy_version 40085 (0.0027) [2024-03-29 16:32:18,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 656818176. Throughput: 0: 41522.6. Samples: 538971180. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 16:32:18,840][00126] Avg episode reward: [(0, '0.505')] [2024-03-29 16:32:20,882][00497] Updated weights for policy 0, policy_version 40095 (0.0037) [2024-03-29 16:32:23,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 657031168. Throughput: 0: 41701.8. Samples: 539203020. Policy #0 lag: (min: 1.0, avg: 23.1, max: 42.0) [2024-03-29 16:32:23,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 16:32:24,709][00497] Updated weights for policy 0, policy_version 40105 (0.0018) [2024-03-29 16:32:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 657227776. Throughput: 0: 41471.6. Samples: 539464880. Policy #0 lag: (min: 1.0, avg: 23.1, max: 42.0) [2024-03-29 16:32:28,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 16:32:29,129][00497] Updated weights for policy 0, policy_version 40115 (0.0018) [2024-03-29 16:32:33,292][00497] Updated weights for policy 0, policy_version 40125 (0.0021) [2024-03-29 16:32:33,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 657440768. Throughput: 0: 41677.7. Samples: 539600420. Policy #0 lag: (min: 1.0, avg: 23.1, max: 42.0) [2024-03-29 16:32:33,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 16:32:36,439][00497] Updated weights for policy 0, policy_version 40135 (0.0024) [2024-03-29 16:32:38,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 657686528. Throughput: 0: 41977.7. Samples: 539835940. Policy #0 lag: (min: 1.0, avg: 23.1, max: 42.0) [2024-03-29 16:32:38,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 16:32:40,459][00497] Updated weights for policy 0, policy_version 40145 (0.0022) [2024-03-29 16:32:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 657866752. Throughput: 0: 41626.1. Samples: 540095080. Policy #0 lag: (min: 1.0, avg: 23.1, max: 42.0) [2024-03-29 16:32:43,840][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 16:32:44,738][00497] Updated weights for policy 0, policy_version 40155 (0.0024) [2024-03-29 16:32:48,670][00476] Signal inference workers to stop experience collection... 
(19250 times) [2024-03-29 16:32:48,674][00476] Signal inference workers to resume experience collection... (19250 times) [2024-03-29 16:32:48,694][00497] Updated weights for policy 0, policy_version 40165 (0.0030) [2024-03-29 16:32:48,719][00497] InferenceWorker_p0-w0: stopping experience collection (19250 times) [2024-03-29 16:32:48,720][00497] InferenceWorker_p0-w0: resuming experience collection (19250 times) [2024-03-29 16:32:48,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 658063360. Throughput: 0: 41964.0. Samples: 540237420. Policy #0 lag: (min: 1.0, avg: 23.1, max: 42.0) [2024-03-29 16:32:48,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 16:32:51,965][00497] Updated weights for policy 0, policy_version 40175 (0.0027) [2024-03-29 16:32:53,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 658325504. Throughput: 0: 42175.2. Samples: 540474560. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 16:32:53,840][00126] Avg episode reward: [(0, '0.533')] [2024-03-29 16:32:56,000][00497] Updated weights for policy 0, policy_version 40185 (0.0019) [2024-03-29 16:32:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 658505728. Throughput: 0: 41898.6. Samples: 540733680. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 16:32:58,841][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 16:33:00,367][00497] Updated weights for policy 0, policy_version 40195 (0.0037) [2024-03-29 16:33:03,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 658702336. Throughput: 0: 42108.1. Samples: 540866040. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 16:33:03,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 16:33:04,140][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000040205_658718720.pth... [2024-03-29 16:33:04,158][00497] Updated weights for policy 0, policy_version 40205 (0.0023) [2024-03-29 16:33:04,436][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000039590_648642560.pth [2024-03-29 16:33:07,563][00497] Updated weights for policy 0, policy_version 40215 (0.0022) [2024-03-29 16:33:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 658931712. Throughput: 0: 42222.6. Samples: 541103040. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 16:33:08,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 16:33:11,608][00497] Updated weights for policy 0, policy_version 40225 (0.0020) [2024-03-29 16:33:13,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 659128320. Throughput: 0: 42236.9. Samples: 541365540. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 16:33:13,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 16:33:16,036][00497] Updated weights for policy 0, policy_version 40235 (0.0034) [2024-03-29 16:33:18,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 659324928. Throughput: 0: 41996.8. Samples: 541490280. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 16:33:18,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 16:33:19,847][00497] Updated weights for policy 0, policy_version 40245 (0.0022) [2024-03-29 16:33:23,240][00476] Signal inference workers to stop experience collection... 
(19300 times) [2024-03-29 16:33:23,243][00476] Signal inference workers to resume experience collection... (19300 times) [2024-03-29 16:33:23,267][00497] Updated weights for policy 0, policy_version 40255 (0.0020) [2024-03-29 16:33:23,287][00497] InferenceWorker_p0-w0: stopping experience collection (19300 times) [2024-03-29 16:33:23,288][00497] InferenceWorker_p0-w0: resuming experience collection (19300 times) [2024-03-29 16:33:23,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 659554304. Throughput: 0: 42043.1. Samples: 541727880. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 16:33:23,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 16:33:27,126][00497] Updated weights for policy 0, policy_version 40265 (0.0029) [2024-03-29 16:33:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 659750912. Throughput: 0: 42091.5. Samples: 541989200. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 16:33:28,840][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 16:33:31,461][00497] Updated weights for policy 0, policy_version 40275 (0.0021) [2024-03-29 16:33:33,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 659963904. Throughput: 0: 41804.9. Samples: 542118640. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 16:33:33,840][00126] Avg episode reward: [(0, '0.513')] [2024-03-29 16:33:35,518][00497] Updated weights for policy 0, policy_version 40285 (0.0017) [2024-03-29 16:33:38,746][00497] Updated weights for policy 0, policy_version 40295 (0.0026) [2024-03-29 16:33:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 660193280. Throughput: 0: 41873.7. Samples: 542358880. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 16:33:38,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 16:33:42,894][00497] Updated weights for policy 0, policy_version 40305 (0.0018) [2024-03-29 16:33:43,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.2, 300 sec: 41987.4). Total num frames: 660373504. Throughput: 0: 41831.9. Samples: 542616120. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 16:33:43,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 16:33:47,340][00497] Updated weights for policy 0, policy_version 40315 (0.0026) [2024-03-29 16:33:48,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 660586496. Throughput: 0: 41703.0. Samples: 542742680. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 16:33:48,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 16:33:51,212][00497] Updated weights for policy 0, policy_version 40325 (0.0027) [2024-03-29 16:33:53,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 660799488. Throughput: 0: 42194.7. Samples: 543001800. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 16:33:53,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 16:33:54,458][00497] Updated weights for policy 0, policy_version 40335 (0.0028) [2024-03-29 16:33:56,284][00476] Signal inference workers to stop experience collection... (19350 times) [2024-03-29 16:33:56,327][00497] InferenceWorker_p0-w0: stopping experience collection (19350 times) [2024-03-29 16:33:56,503][00476] Signal inference workers to resume experience collection... 
(19350 times) [2024-03-29 16:33:56,504][00497] InferenceWorker_p0-w0: resuming experience collection (19350 times) [2024-03-29 16:33:58,542][00497] Updated weights for policy 0, policy_version 40345 (0.0024) [2024-03-29 16:33:58,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 661012480. Throughput: 0: 41595.9. Samples: 543237360. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 16:33:58,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 16:34:02,988][00497] Updated weights for policy 0, policy_version 40355 (0.0021) [2024-03-29 16:34:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 661209088. Throughput: 0: 41831.2. Samples: 543372680. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 16:34:03,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 16:34:06,737][00497] Updated weights for policy 0, policy_version 40365 (0.0023) [2024-03-29 16:34:08,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 661438464. Throughput: 0: 42255.2. Samples: 543629360. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 16:34:08,840][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 16:34:10,092][00497] Updated weights for policy 0, policy_version 40375 (0.0029) [2024-03-29 16:34:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 661635072. Throughput: 0: 41488.6. Samples: 543856180. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 16:34:13,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 16:34:14,218][00497] Updated weights for policy 0, policy_version 40385 (0.0028) [2024-03-29 16:34:18,494][00497] Updated weights for policy 0, policy_version 40395 (0.0018) [2024-03-29 16:34:18,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 661831680. Throughput: 0: 41798.2. Samples: 543999560. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 16:34:18,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 16:34:22,307][00497] Updated weights for policy 0, policy_version 40405 (0.0024) [2024-03-29 16:34:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 662061056. Throughput: 0: 42358.8. Samples: 544265020. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 16:34:23,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 16:34:25,672][00497] Updated weights for policy 0, policy_version 40415 (0.0025) [2024-03-29 16:34:28,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 662290432. Throughput: 0: 41855.1. Samples: 544499600. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 16:34:28,842][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 16:34:29,746][00497] Updated weights for policy 0, policy_version 40425 (0.0020) [2024-03-29 16:34:32,267][00476] Signal inference workers to stop experience collection... (19400 times) [2024-03-29 16:34:32,270][00476] Signal inference workers to resume experience collection... (19400 times) [2024-03-29 16:34:32,309][00497] InferenceWorker_p0-w0: stopping experience collection (19400 times) [2024-03-29 16:34:32,310][00497] InferenceWorker_p0-w0: resuming experience collection (19400 times) [2024-03-29 16:34:33,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 662470656. Throughput: 0: 42091.1. Samples: 544636780. 
Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 16:34:33,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 16:34:34,263][00497] Updated weights for policy 0, policy_version 40435 (0.0020) [2024-03-29 16:34:37,709][00497] Updated weights for policy 0, policy_version 40445 (0.0026) [2024-03-29 16:34:38,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.3, 300 sec: 41932.0). Total num frames: 662700032. Throughput: 0: 42089.3. Samples: 544895820. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 16:34:38,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 16:34:41,153][00497] Updated weights for policy 0, policy_version 40455 (0.0024) [2024-03-29 16:34:43,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 41931.9). Total num frames: 662929408. Throughput: 0: 42286.6. Samples: 545140260. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 16:34:43,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 16:34:45,144][00497] Updated weights for policy 0, policy_version 40465 (0.0021) [2024-03-29 16:34:48,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 663109632. Throughput: 0: 42136.8. Samples: 545268840. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 16:34:48,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 16:34:49,668][00497] Updated weights for policy 0, policy_version 40475 (0.0031) [2024-03-29 16:34:53,417][00497] Updated weights for policy 0, policy_version 40485 (0.0024) [2024-03-29 16:34:53,839][00126] Fps is (10 sec: 39322.2, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 663322624. Throughput: 0: 42192.4. Samples: 545528020. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 16:34:53,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 16:34:56,823][00497] Updated weights for policy 0, policy_version 40495 (0.0034) [2024-03-29 16:34:58,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 663535616. Throughput: 0: 42303.9. Samples: 545759860. Policy #0 lag: (min: 0.0, avg: 22.9, max: 44.0) [2024-03-29 16:34:58,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 16:35:00,766][00497] Updated weights for policy 0, policy_version 40505 (0.0019) [2024-03-29 16:35:03,624][00476] Signal inference workers to stop experience collection... (19450 times) [2024-03-29 16:35:03,656][00497] InferenceWorker_p0-w0: stopping experience collection (19450 times) [2024-03-29 16:35:03,837][00476] Signal inference workers to resume experience collection... (19450 times) [2024-03-29 16:35:03,838][00497] InferenceWorker_p0-w0: resuming experience collection (19450 times) [2024-03-29 16:35:03,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 663748608. Throughput: 0: 42018.8. Samples: 545890400. Policy #0 lag: (min: 0.0, avg: 22.9, max: 44.0) [2024-03-29 16:35:03,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 16:35:04,114][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000040513_663764992.pth... [2024-03-29 16:35:04,663][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000039902_653754368.pth [2024-03-29 16:35:05,327][00497] Updated weights for policy 0, policy_version 40515 (0.0022) [2024-03-29 16:35:08,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 663945216. Throughput: 0: 42163.6. Samples: 546162380. 
Policy #0 lag: (min: 0.0, avg: 22.9, max: 44.0) [2024-03-29 16:35:08,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 16:35:08,851][00497] Updated weights for policy 0, policy_version 40525 (0.0023) [2024-03-29 16:35:12,445][00497] Updated weights for policy 0, policy_version 40535 (0.0024) [2024-03-29 16:35:13,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 664174592. Throughput: 0: 41921.0. Samples: 546386040. Policy #0 lag: (min: 0.0, avg: 22.9, max: 44.0) [2024-03-29 16:35:13,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 16:35:16,296][00497] Updated weights for policy 0, policy_version 40545 (0.0023) [2024-03-29 16:35:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 664371200. Throughput: 0: 41889.9. Samples: 546521820. Policy #0 lag: (min: 0.0, avg: 22.9, max: 44.0) [2024-03-29 16:35:18,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 16:35:20,824][00497] Updated weights for policy 0, policy_version 40555 (0.0025) [2024-03-29 16:35:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 664584192. Throughput: 0: 42076.0. Samples: 546789240. Policy #0 lag: (min: 0.0, avg: 22.9, max: 44.0) [2024-03-29 16:35:23,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 16:35:24,396][00497] Updated weights for policy 0, policy_version 40565 (0.0028) [2024-03-29 16:35:27,797][00497] Updated weights for policy 0, policy_version 40575 (0.0020) [2024-03-29 16:35:28,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 664829952. Throughput: 0: 41981.0. Samples: 547029400. Policy #0 lag: (min: 2.0, avg: 20.2, max: 42.0) [2024-03-29 16:35:28,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 16:35:31,888][00497] Updated weights for policy 0, policy_version 40585 (0.0018) [2024-03-29 16:35:33,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42325.2, 300 sec: 41876.4). Total num frames: 665010176. Throughput: 0: 41919.1. Samples: 547155200. Policy #0 lag: (min: 2.0, avg: 20.2, max: 42.0) [2024-03-29 16:35:33,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 16:35:36,351][00497] Updated weights for policy 0, policy_version 40595 (0.0034) [2024-03-29 16:35:38,698][00476] Signal inference workers to stop experience collection... (19500 times) [2024-03-29 16:35:38,775][00497] InferenceWorker_p0-w0: stopping experience collection (19500 times) [2024-03-29 16:35:38,782][00476] Signal inference workers to resume experience collection... (19500 times) [2024-03-29 16:35:38,802][00497] InferenceWorker_p0-w0: resuming experience collection (19500 times) [2024-03-29 16:35:38,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 665206784. Throughput: 0: 42120.0. Samples: 547423420. Policy #0 lag: (min: 2.0, avg: 20.2, max: 42.0) [2024-03-29 16:35:38,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 16:35:39,912][00497] Updated weights for policy 0, policy_version 40605 (0.0023) [2024-03-29 16:35:43,377][00497] Updated weights for policy 0, policy_version 40615 (0.0025) [2024-03-29 16:35:43,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 665452544. Throughput: 0: 42391.9. Samples: 547667500. 
Policy #0 lag: (min: 2.0, avg: 20.2, max: 42.0) [2024-03-29 16:35:43,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 16:35:47,678][00497] Updated weights for policy 0, policy_version 40625 (0.0021) [2024-03-29 16:35:48,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 665632768. Throughput: 0: 42141.7. Samples: 547786780. Policy #0 lag: (min: 2.0, avg: 20.2, max: 42.0) [2024-03-29 16:35:48,840][00126] Avg episode reward: [(0, '0.475')] [2024-03-29 16:35:51,851][00497] Updated weights for policy 0, policy_version 40635 (0.0027) [2024-03-29 16:35:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.1, 300 sec: 41931.9). Total num frames: 665845760. Throughput: 0: 42027.9. Samples: 548053640. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 16:35:53,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 16:35:55,531][00497] Updated weights for policy 0, policy_version 40645 (0.0022) [2024-03-29 16:35:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 666058752. Throughput: 0: 42492.1. Samples: 548298180. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 16:35:58,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 16:35:59,106][00497] Updated weights for policy 0, policy_version 40655 (0.0026) [2024-03-29 16:36:03,270][00497] Updated weights for policy 0, policy_version 40665 (0.0027) [2024-03-29 16:36:03,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 666271744. Throughput: 0: 42093.3. Samples: 548416020. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 16:36:03,840][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 16:36:07,509][00497] Updated weights for policy 0, policy_version 40675 (0.0022) [2024-03-29 16:36:08,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 41932.0). Total num frames: 666468352. Throughput: 0: 41990.3. Samples: 548678800. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 16:36:08,840][00126] Avg episode reward: [(0, '0.507')] [2024-03-29 16:36:11,317][00497] Updated weights for policy 0, policy_version 40685 (0.0027) [2024-03-29 16:36:11,865][00476] Signal inference workers to stop experience collection... (19550 times) [2024-03-29 16:36:11,938][00476] Signal inference workers to resume experience collection... (19550 times) [2024-03-29 16:36:11,940][00497] InferenceWorker_p0-w0: stopping experience collection (19550 times) [2024-03-29 16:36:11,959][00497] InferenceWorker_p0-w0: resuming experience collection (19550 times) [2024-03-29 16:36:13,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 666697728. Throughput: 0: 42021.2. Samples: 548920360. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 16:36:13,841][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 16:36:14,800][00497] Updated weights for policy 0, policy_version 40695 (0.0028) [2024-03-29 16:36:18,831][00497] Updated weights for policy 0, policy_version 40705 (0.0019) [2024-03-29 16:36:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 41932.0). Total num frames: 666910720. Throughput: 0: 41786.9. Samples: 549035600. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 16:36:18,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 16:36:23,144][00497] Updated weights for policy 0, policy_version 40715 (0.0022) [2024-03-29 16:36:23,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 41987.5). 
Total num frames: 667107328. Throughput: 0: 41841.3. Samples: 549306280. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 16:36:23,840][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 16:36:26,762][00497] Updated weights for policy 0, policy_version 40725 (0.0017) [2024-03-29 16:36:28,839][00126] Fps is (10 sec: 42597.5, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 667336704. Throughput: 0: 41923.6. Samples: 549554060. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 16:36:28,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 16:36:30,165][00497] Updated weights for policy 0, policy_version 40735 (0.0019) [2024-03-29 16:36:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.4, 300 sec: 41987.5). Total num frames: 667533312. Throughput: 0: 41960.9. Samples: 549675020. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 16:36:33,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 16:36:34,373][00497] Updated weights for policy 0, policy_version 40745 (0.0025) [2024-03-29 16:36:38,608][00497] Updated weights for policy 0, policy_version 40755 (0.0022) [2024-03-29 16:36:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 667729920. Throughput: 0: 41917.4. Samples: 549939920. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 16:36:38,840][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 16:36:42,322][00497] Updated weights for policy 0, policy_version 40765 (0.0025) [2024-03-29 16:36:43,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.4, 300 sec: 42098.5). Total num frames: 667975680. Throughput: 0: 42078.2. Samples: 550191700. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 16:36:43,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 16:36:45,694][00497] Updated weights for policy 0, policy_version 40775 (0.0027) [2024-03-29 16:36:45,717][00476] Signal inference workers to stop experience collection... (19600 times) [2024-03-29 16:36:45,753][00497] InferenceWorker_p0-w0: stopping experience collection (19600 times) [2024-03-29 16:36:45,942][00476] Signal inference workers to resume experience collection... (19600 times) [2024-03-29 16:36:45,943][00497] InferenceWorker_p0-w0: resuming experience collection (19600 times) [2024-03-29 16:36:48,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 668172288. Throughput: 0: 42035.6. Samples: 550307620. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 16:36:48,840][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 16:36:50,026][00497] Updated weights for policy 0, policy_version 40785 (0.0032) [2024-03-29 16:36:53,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 668368896. Throughput: 0: 42268.8. Samples: 550580900. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 16:36:53,840][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 16:36:54,024][00497] Updated weights for policy 0, policy_version 40795 (0.0020) [2024-03-29 16:36:57,791][00497] Updated weights for policy 0, policy_version 40805 (0.0020) [2024-03-29 16:36:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 668598272. Throughput: 0: 42390.3. Samples: 550827920. 
Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 16:36:58,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 16:37:01,221][00497] Updated weights for policy 0, policy_version 40815 (0.0029) [2024-03-29 16:37:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 668811264. Throughput: 0: 42280.3. Samples: 550938220. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 16:37:03,840][00126] Avg episode reward: [(0, '0.565')] [2024-03-29 16:37:04,087][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000040822_668827648.pth... [2024-03-29 16:37:04,389][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000040205_658718720.pth [2024-03-29 16:37:05,307][00497] Updated weights for policy 0, policy_version 40825 (0.0029) [2024-03-29 16:37:08,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 668991488. Throughput: 0: 42217.8. Samples: 551206080. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 16:37:08,841][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 16:37:09,637][00497] Updated weights for policy 0, policy_version 40835 (0.0024) [2024-03-29 16:37:13,397][00497] Updated weights for policy 0, policy_version 40845 (0.0019) [2024-03-29 16:37:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 669220864. Throughput: 0: 42384.1. Samples: 551461340. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 16:37:13,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 16:37:16,798][00497] Updated weights for policy 0, policy_version 40855 (0.0021) [2024-03-29 16:37:17,130][00476] Signal inference workers to stop experience collection... (19650 times) [2024-03-29 16:37:17,154][00497] InferenceWorker_p0-w0: stopping experience collection (19650 times) [2024-03-29 16:37:17,348][00476] Signal inference workers to resume experience collection... (19650 times) [2024-03-29 16:37:17,348][00497] InferenceWorker_p0-w0: resuming experience collection (19650 times) [2024-03-29 16:37:18,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 669450240. Throughput: 0: 42167.1. Samples: 551572540. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 16:37:18,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 16:37:21,042][00497] Updated weights for policy 0, policy_version 40865 (0.0025) [2024-03-29 16:37:23,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 669630464. Throughput: 0: 42160.8. Samples: 551837160. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 16:37:23,840][00126] Avg episode reward: [(0, '0.464')] [2024-03-29 16:37:25,288][00497] Updated weights for policy 0, policy_version 40875 (0.0023) [2024-03-29 16:37:28,788][00497] Updated weights for policy 0, policy_version 40885 (0.0027) [2024-03-29 16:37:28,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 669859840. Throughput: 0: 42348.9. Samples: 552097400. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 16:37:28,841][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 16:37:32,101][00497] Updated weights for policy 0, policy_version 40895 (0.0029) [2024-03-29 16:37:33,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 670089216. Throughput: 0: 42240.0. Samples: 552208420. 
Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 16:37:33,840][00126] Avg episode reward: [(0, '0.537')] [2024-03-29 16:37:36,560][00497] Updated weights for policy 0, policy_version 40905 (0.0028) [2024-03-29 16:37:38,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 670269440. Throughput: 0: 42048.8. Samples: 552473100. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 16:37:38,842][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 16:37:40,695][00497] Updated weights for policy 0, policy_version 40915 (0.0029) [2024-03-29 16:37:43,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 670482432. Throughput: 0: 42435.5. Samples: 552737520. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 16:37:43,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 16:37:44,234][00497] Updated weights for policy 0, policy_version 40925 (0.0025) [2024-03-29 16:37:47,504][00497] Updated weights for policy 0, policy_version 40935 (0.0026) [2024-03-29 16:37:48,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 670728192. Throughput: 0: 42452.0. Samples: 552848560. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 16:37:48,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 16:37:52,052][00497] Updated weights for policy 0, policy_version 40945 (0.0023) [2024-03-29 16:37:53,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 670908416. Throughput: 0: 42302.6. Samples: 553109700. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 16:37:53,840][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 16:37:55,548][00476] Signal inference workers to stop experience collection... (19700 times) [2024-03-29 16:37:55,599][00497] InferenceWorker_p0-w0: stopping experience collection (19700 times) [2024-03-29 16:37:55,637][00476] Signal inference workers to resume experience collection... (19700 times) [2024-03-29 16:37:55,640][00497] InferenceWorker_p0-w0: resuming experience collection (19700 times) [2024-03-29 16:37:56,198][00497] Updated weights for policy 0, policy_version 40955 (0.0028) [2024-03-29 16:37:58,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 671105024. Throughput: 0: 42381.3. Samples: 553368500. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 16:37:58,840][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 16:37:59,732][00497] Updated weights for policy 0, policy_version 40965 (0.0018) [2024-03-29 16:38:03,046][00497] Updated weights for policy 0, policy_version 40975 (0.0017) [2024-03-29 16:38:03,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 671367168. Throughput: 0: 42591.5. Samples: 553489160. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:38:03,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 16:38:07,300][00497] Updated weights for policy 0, policy_version 40985 (0.0023) [2024-03-29 16:38:08,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 671547392. Throughput: 0: 42473.0. Samples: 553748440. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:38:08,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 16:38:11,599][00497] Updated weights for policy 0, policy_version 40995 (0.0029) [2024-03-29 16:38:13,839][00126] Fps is (10 sec: 37683.0, 60 sec: 42052.2, 300 sec: 42098.6). 
Total num frames: 671744000. Throughput: 0: 42350.7. Samples: 554003180. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:38:13,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 16:38:15,204][00497] Updated weights for policy 0, policy_version 41005 (0.0030) [2024-03-29 16:38:18,428][00497] Updated weights for policy 0, policy_version 41015 (0.0019) [2024-03-29 16:38:18,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 672006144. Throughput: 0: 42500.0. Samples: 554120920. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:38:18,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 16:38:22,707][00497] Updated weights for policy 0, policy_version 41025 (0.0019) [2024-03-29 16:38:23,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 672186368. Throughput: 0: 42448.1. Samples: 554383260. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:38:23,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 16:38:26,900][00497] Updated weights for policy 0, policy_version 41035 (0.0023) [2024-03-29 16:38:28,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 672399360. Throughput: 0: 42401.7. Samples: 554645600. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 16:38:28,840][00126] Avg episode reward: [(0, '0.606')] [2024-03-29 16:38:29,249][00476] Signal inference workers to stop experience collection... (19750 times) [2024-03-29 16:38:29,320][00497] InferenceWorker_p0-w0: stopping experience collection (19750 times) [2024-03-29 16:38:29,413][00476] Signal inference workers to resume experience collection... (19750 times) [2024-03-29 16:38:29,414][00497] InferenceWorker_p0-w0: resuming experience collection (19750 times) [2024-03-29 16:38:30,554][00497] Updated weights for policy 0, policy_version 41045 (0.0028) [2024-03-29 16:38:33,792][00497] Updated weights for policy 0, policy_version 41055 (0.0020) [2024-03-29 16:38:33,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 672645120. Throughput: 0: 42550.6. Samples: 554763340. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 16:38:33,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 16:38:38,192][00497] Updated weights for policy 0, policy_version 41065 (0.0030) [2024-03-29 16:38:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 672825344. Throughput: 0: 42460.4. Samples: 555020420. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 16:38:38,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 16:38:42,447][00497] Updated weights for policy 0, policy_version 41075 (0.0019) [2024-03-29 16:38:43,839][00126] Fps is (10 sec: 39322.1, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 673038336. Throughput: 0: 42670.7. Samples: 555288680. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 16:38:43,840][00126] Avg episode reward: [(0, '0.507')] [2024-03-29 16:38:45,896][00497] Updated weights for policy 0, policy_version 41085 (0.0025) [2024-03-29 16:38:48,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 673267712. Throughput: 0: 42525.7. Samples: 555402820. 
Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 16:38:48,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 16:38:49,329][00497] Updated weights for policy 0, policy_version 41095 (0.0022) [2024-03-29 16:38:53,817][00497] Updated weights for policy 0, policy_version 41105 (0.0030) [2024-03-29 16:38:53,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 673464320. Throughput: 0: 42151.9. Samples: 555645280. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 16:38:53,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 16:38:57,878][00497] Updated weights for policy 0, policy_version 41115 (0.0028) [2024-03-29 16:38:58,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 673660928. Throughput: 0: 42434.2. Samples: 555912720. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 16:38:58,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 16:39:01,574][00497] Updated weights for policy 0, policy_version 41125 (0.0033) [2024-03-29 16:39:01,913][00476] Signal inference workers to stop experience collection... (19800 times) [2024-03-29 16:39:01,955][00497] InferenceWorker_p0-w0: stopping experience collection (19800 times) [2024-03-29 16:39:02,085][00476] Signal inference workers to resume experience collection... (19800 times) [2024-03-29 16:39:02,085][00497] InferenceWorker_p0-w0: resuming experience collection (19800 times) [2024-03-29 16:39:03,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 673906688. Throughput: 0: 42871.6. Samples: 556050140. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 16:39:03,840][00126] Avg episode reward: [(0, '0.471')] [2024-03-29 16:39:04,061][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000041133_673923072.pth... [2024-03-29 16:39:04,396][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000040513_663764992.pth [2024-03-29 16:39:04,973][00497] Updated weights for policy 0, policy_version 41135 (0.0023) [2024-03-29 16:39:08,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 674086912. Throughput: 0: 42110.7. Samples: 556278240. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 16:39:08,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 16:39:09,202][00497] Updated weights for policy 0, policy_version 41145 (0.0020) [2024-03-29 16:39:13,317][00497] Updated weights for policy 0, policy_version 41155 (0.0018) [2024-03-29 16:39:13,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42871.6, 300 sec: 42320.7). Total num frames: 674316288. Throughput: 0: 42402.4. Samples: 556553700. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 16:39:13,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 16:39:17,051][00497] Updated weights for policy 0, policy_version 41165 (0.0026) [2024-03-29 16:39:18,839][00126] Fps is (10 sec: 44236.1, 60 sec: 42052.2, 300 sec: 42265.1). Total num frames: 674529280. Throughput: 0: 42638.2. Samples: 556682060. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 16:39:18,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 16:39:20,298][00497] Updated weights for policy 0, policy_version 41175 (0.0019) [2024-03-29 16:39:23,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 674725888. Throughput: 0: 41927.3. Samples: 556907140. 
Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 16:39:23,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 16:39:24,951][00497] Updated weights for policy 0, policy_version 41185 (0.0025) [2024-03-29 16:39:28,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 674922496. Throughput: 0: 42118.6. Samples: 557184020. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 16:39:28,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 16:39:28,863][00497] Updated weights for policy 0, policy_version 41195 (0.0021) [2024-03-29 16:39:32,756][00497] Updated weights for policy 0, policy_version 41205 (0.0036) [2024-03-29 16:39:33,839][00126] Fps is (10 sec: 42597.5, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 675151872. Throughput: 0: 42405.8. Samples: 557311080. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 16:39:33,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 16:39:35,981][00497] Updated weights for policy 0, policy_version 41215 (0.0021) [2024-03-29 16:39:37,084][00476] Signal inference workers to stop experience collection... (19850 times) [2024-03-29 16:39:37,107][00497] InferenceWorker_p0-w0: stopping experience collection (19850 times) [2024-03-29 16:39:37,301][00476] Signal inference workers to resume experience collection... (19850 times) [2024-03-29 16:39:37,301][00497] InferenceWorker_p0-w0: resuming experience collection (19850 times) [2024-03-29 16:39:38,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 675381248. Throughput: 0: 42253.0. Samples: 557546660. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 16:39:38,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 16:39:40,591][00497] Updated weights for policy 0, policy_version 41225 (0.0025) [2024-03-29 16:39:43,839][00126] Fps is (10 sec: 39322.3, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 675545088. Throughput: 0: 42018.8. Samples: 557803560. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 16:39:43,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 16:39:44,775][00497] Updated weights for policy 0, policy_version 41235 (0.0029) [2024-03-29 16:39:48,516][00497] Updated weights for policy 0, policy_version 41245 (0.0019) [2024-03-29 16:39:48,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.3, 300 sec: 42209.6). Total num frames: 675774464. Throughput: 0: 41765.8. Samples: 557929600. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 16:39:48,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 16:39:51,924][00497] Updated weights for policy 0, policy_version 41255 (0.0021) [2024-03-29 16:39:53,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 676003840. Throughput: 0: 41891.0. Samples: 558163340. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 16:39:53,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 16:39:56,178][00497] Updated weights for policy 0, policy_version 41265 (0.0029) [2024-03-29 16:39:58,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 676167680. Throughput: 0: 41679.4. Samples: 558429280. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 16:39:58,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 16:40:00,250][00497] Updated weights for policy 0, policy_version 41275 (0.0024) [2024-03-29 16:40:03,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 42209.6). 
Total num frames: 676397056. Throughput: 0: 41672.2. Samples: 558557300. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 16:40:03,840][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 16:40:03,971][00497] Updated weights for policy 0, policy_version 41285 (0.0018) [2024-03-29 16:40:07,296][00497] Updated weights for policy 0, policy_version 41295 (0.0025) [2024-03-29 16:40:08,839][00126] Fps is (10 sec: 47514.1, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 676642816. Throughput: 0: 42204.4. Samples: 558806340. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 16:40:08,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 16:40:11,305][00476] Signal inference workers to stop experience collection... (19900 times) [2024-03-29 16:40:11,306][00476] Signal inference workers to resume experience collection... (19900 times) [2024-03-29 16:40:11,333][00497] InferenceWorker_p0-w0: stopping experience collection (19900 times) [2024-03-29 16:40:11,355][00497] InferenceWorker_p0-w0: resuming experience collection (19900 times) [2024-03-29 16:40:11,611][00497] Updated weights for policy 0, policy_version 41305 (0.0037) [2024-03-29 16:40:13,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 676806656. Throughput: 0: 41582.7. Samples: 559055240. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:40:13,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 16:40:15,704][00497] Updated weights for policy 0, policy_version 41315 (0.0030) [2024-03-29 16:40:18,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.3, 300 sec: 42154.1). Total num frames: 677019648. Throughput: 0: 41593.9. Samples: 559182800. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:40:18,841][00126] Avg episode reward: [(0, '0.461')] [2024-03-29 16:40:19,787][00497] Updated weights for policy 0, policy_version 41325 (0.0027) [2024-03-29 16:40:23,011][00497] Updated weights for policy 0, policy_version 41335 (0.0020) [2024-03-29 16:40:23,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.2, 300 sec: 42154.1). Total num frames: 677265408. Throughput: 0: 41884.8. Samples: 559431480. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:40:23,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 16:40:27,212][00497] Updated weights for policy 0, policy_version 41345 (0.0025) [2024-03-29 16:40:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 677445632. Throughput: 0: 41759.5. Samples: 559682740. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:40:28,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 16:40:31,165][00497] Updated weights for policy 0, policy_version 41355 (0.0021) [2024-03-29 16:40:33,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41779.3, 300 sec: 42209.6). Total num frames: 677658624. Throughput: 0: 42043.6. Samples: 559821560. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:40:33,841][00126] Avg episode reward: [(0, '0.541')] [2024-03-29 16:40:35,055][00497] Updated weights for policy 0, policy_version 41365 (0.0024) [2024-03-29 16:40:38,519][00497] Updated weights for policy 0, policy_version 41375 (0.0018) [2024-03-29 16:40:38,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 677904384. Throughput: 0: 42435.1. Samples: 560072920. 
Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 16:40:38,840][00126] Avg episode reward: [(0, '0.472')] [2024-03-29 16:40:42,773][00497] Updated weights for policy 0, policy_version 41385 (0.0022) [2024-03-29 16:40:43,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 678100992. Throughput: 0: 42070.3. Samples: 560322440. Policy #0 lag: (min: 2.0, avg: 22.9, max: 43.0) [2024-03-29 16:40:43,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 16:40:46,653][00497] Updated weights for policy 0, policy_version 41395 (0.0033) [2024-03-29 16:40:48,059][00476] Signal inference workers to stop experience collection... (19950 times) [2024-03-29 16:40:48,059][00476] Signal inference workers to resume experience collection... (19950 times) [2024-03-29 16:40:48,100][00497] InferenceWorker_p0-w0: stopping experience collection (19950 times) [2024-03-29 16:40:48,100][00497] InferenceWorker_p0-w0: resuming experience collection (19950 times) [2024-03-29 16:40:48,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 678281216. Throughput: 0: 42167.9. Samples: 560454860. Policy #0 lag: (min: 2.0, avg: 22.9, max: 43.0) [2024-03-29 16:40:48,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 16:40:50,489][00497] Updated weights for policy 0, policy_version 41405 (0.0028) [2024-03-29 16:40:53,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 678526976. Throughput: 0: 42150.2. Samples: 560703100. Policy #0 lag: (min: 2.0, avg: 22.9, max: 43.0) [2024-03-29 16:40:53,842][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 16:40:53,953][00497] Updated weights for policy 0, policy_version 41415 (0.0029) [2024-03-29 16:40:58,051][00497] Updated weights for policy 0, policy_version 41425 (0.0019) [2024-03-29 16:40:58,839][00126] Fps is (10 sec: 45875.9, 60 sec: 42871.6, 300 sec: 42265.2). Total num frames: 678739968. Throughput: 0: 42561.0. Samples: 560970480. Policy #0 lag: (min: 2.0, avg: 22.9, max: 43.0) [2024-03-29 16:40:58,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 16:41:02,142][00497] Updated weights for policy 0, policy_version 41435 (0.0026) [2024-03-29 16:41:03,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 678936576. Throughput: 0: 42432.9. Samples: 561092280. Policy #0 lag: (min: 2.0, avg: 22.9, max: 43.0) [2024-03-29 16:41:03,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 16:41:03,860][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000041439_678936576.pth... [2024-03-29 16:41:04,176][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000040822_668827648.pth [2024-03-29 16:41:05,970][00497] Updated weights for policy 0, policy_version 41445 (0.0035) [2024-03-29 16:41:08,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 42209.7). Total num frames: 679149568. Throughput: 0: 42307.3. Samples: 561335300. Policy #0 lag: (min: 1.0, avg: 19.6, max: 43.0) [2024-03-29 16:41:08,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 16:41:09,654][00497] Updated weights for policy 0, policy_version 41455 (0.0031) [2024-03-29 16:41:13,758][00497] Updated weights for policy 0, policy_version 41465 (0.0027) [2024-03-29 16:41:13,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 679362560. Throughput: 0: 42430.7. Samples: 561592120. 
Policy #0 lag: (min: 1.0, avg: 19.6, max: 43.0) [2024-03-29 16:41:13,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 16:41:17,860][00497] Updated weights for policy 0, policy_version 41475 (0.0022) [2024-03-29 16:41:18,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 679559168. Throughput: 0: 42113.8. Samples: 561716680. Policy #0 lag: (min: 1.0, avg: 19.6, max: 43.0) [2024-03-29 16:41:18,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 16:41:21,625][00497] Updated weights for policy 0, policy_version 41485 (0.0018) [2024-03-29 16:41:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.4, 300 sec: 42209.7). Total num frames: 679788544. Throughput: 0: 42313.5. Samples: 561977020. Policy #0 lag: (min: 1.0, avg: 19.6, max: 43.0) [2024-03-29 16:41:23,841][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 16:41:24,554][00476] Signal inference workers to stop experience collection... (20000 times) [2024-03-29 16:41:24,599][00497] InferenceWorker_p0-w0: stopping experience collection (20000 times) [2024-03-29 16:41:24,717][00476] Signal inference workers to resume experience collection... (20000 times) [2024-03-29 16:41:24,718][00497] InferenceWorker_p0-w0: resuming experience collection (20000 times) [2024-03-29 16:41:24,985][00497] Updated weights for policy 0, policy_version 41495 (0.0031) [2024-03-29 16:41:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 679985152. Throughput: 0: 42177.4. Samples: 562220420. Policy #0 lag: (min: 1.0, avg: 19.6, max: 43.0) [2024-03-29 16:41:28,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 16:41:29,258][00497] Updated weights for policy 0, policy_version 41505 (0.0026) [2024-03-29 16:41:33,268][00497] Updated weights for policy 0, policy_version 41515 (0.0019) [2024-03-29 16:41:33,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 680198144. Throughput: 0: 42196.5. Samples: 562353700. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 16:41:33,840][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 16:41:37,278][00497] Updated weights for policy 0, policy_version 41525 (0.0024) [2024-03-29 16:41:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.4, 300 sec: 42209.6). Total num frames: 680427520. Throughput: 0: 42255.7. Samples: 562604600. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 16:41:38,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 16:41:40,762][00497] Updated weights for policy 0, policy_version 41535 (0.0024) [2024-03-29 16:41:43,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 680624128. Throughput: 0: 41717.8. Samples: 562847780. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 16:41:43,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 16:41:44,999][00497] Updated weights for policy 0, policy_version 41545 (0.0024) [2024-03-29 16:41:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.5, 300 sec: 42209.6). Total num frames: 680820736. Throughput: 0: 41759.7. Samples: 562971460. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 16:41:48,840][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 16:41:48,984][00497] Updated weights for policy 0, policy_version 41555 (0.0022) [2024-03-29 16:41:52,661][00497] Updated weights for policy 0, policy_version 41565 (0.0022) [2024-03-29 16:41:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.4, 300 sec: 42209.6). 
Total num frames: 681050112. Throughput: 0: 42443.1. Samples: 563245240. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 16:41:53,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 16:41:55,986][00497] Updated weights for policy 0, policy_version 41575 (0.0027) [2024-03-29 16:41:58,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 681279488. Throughput: 0: 42145.7. Samples: 563488680. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 16:41:58,840][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 16:42:00,205][00497] Updated weights for policy 0, policy_version 41585 (0.0021) [2024-03-29 16:42:00,509][00476] Signal inference workers to stop experience collection... (20050 times) [2024-03-29 16:42:00,531][00497] InferenceWorker_p0-w0: stopping experience collection (20050 times) [2024-03-29 16:42:00,720][00476] Signal inference workers to resume experience collection... (20050 times) [2024-03-29 16:42:00,720][00497] InferenceWorker_p0-w0: resuming experience collection (20050 times) [2024-03-29 16:42:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 681459712. Throughput: 0: 42084.9. Samples: 563610500. Policy #0 lag: (min: 0.0, avg: 20.6, max: 40.0) [2024-03-29 16:42:03,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 16:42:04,244][00497] Updated weights for policy 0, policy_version 41595 (0.0025) [2024-03-29 16:42:08,007][00497] Updated weights for policy 0, policy_version 41605 (0.0019) [2024-03-29 16:42:08,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 681689088. Throughput: 0: 42428.9. Samples: 563886320. Policy #0 lag: (min: 0.0, avg: 20.6, max: 40.0) [2024-03-29 16:42:08,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 16:42:11,398][00497] Updated weights for policy 0, policy_version 41615 (0.0023) [2024-03-29 16:42:13,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 681918464. Throughput: 0: 42315.6. Samples: 564124620. Policy #0 lag: (min: 0.0, avg: 20.6, max: 40.0) [2024-03-29 16:42:13,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 16:42:15,624][00497] Updated weights for policy 0, policy_version 41625 (0.0022) [2024-03-29 16:42:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 682115072. Throughput: 0: 42218.3. Samples: 564253520. Policy #0 lag: (min: 0.0, avg: 20.6, max: 40.0) [2024-03-29 16:42:18,840][00126] Avg episode reward: [(0, '0.478')] [2024-03-29 16:42:19,788][00497] Updated weights for policy 0, policy_version 41635 (0.0019) [2024-03-29 16:42:23,451][00497] Updated weights for policy 0, policy_version 41645 (0.0019) [2024-03-29 16:42:23,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42325.2, 300 sec: 42265.2). Total num frames: 682328064. Throughput: 0: 42678.1. Samples: 564525120. Policy #0 lag: (min: 0.0, avg: 20.6, max: 40.0) [2024-03-29 16:42:23,840][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 16:42:26,669][00497] Updated weights for policy 0, policy_version 41655 (0.0022) [2024-03-29 16:42:28,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42871.4, 300 sec: 42265.2). Total num frames: 682557440. Throughput: 0: 42624.8. Samples: 564765900. 
Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 16:42:28,840][00126] Avg episode reward: [(0, '0.446')] [2024-03-29 16:42:30,866][00497] Updated weights for policy 0, policy_version 41665 (0.0018) [2024-03-29 16:42:33,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 682754048. Throughput: 0: 42987.5. Samples: 564905900. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 16:42:33,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 16:42:34,855][00497] Updated weights for policy 0, policy_version 41675 (0.0022) [2024-03-29 16:42:38,671][00497] Updated weights for policy 0, policy_version 41685 (0.0028) [2024-03-29 16:42:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 682967040. Throughput: 0: 42818.1. Samples: 565172060. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 16:42:38,840][00126] Avg episode reward: [(0, '0.565')] [2024-03-29 16:42:41,176][00476] Signal inference workers to stop experience collection... (20100 times) [2024-03-29 16:42:41,221][00497] InferenceWorker_p0-w0: stopping experience collection (20100 times) [2024-03-29 16:42:41,268][00476] Signal inference workers to resume experience collection... (20100 times) [2024-03-29 16:42:41,291][00497] InferenceWorker_p0-w0: resuming experience collection (20100 times) [2024-03-29 16:42:41,947][00497] Updated weights for policy 0, policy_version 41695 (0.0033) [2024-03-29 16:42:43,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42871.4, 300 sec: 42265.2). Total num frames: 683196416. Throughput: 0: 42752.9. Samples: 565412560. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 16:42:43,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 16:42:46,218][00497] Updated weights for policy 0, policy_version 41705 (0.0019) [2024-03-29 16:42:48,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42871.4, 300 sec: 42320.7). Total num frames: 683393024. Throughput: 0: 43124.0. Samples: 565551080. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 16:42:48,840][00126] Avg episode reward: [(0, '0.517')] [2024-03-29 16:42:50,112][00497] Updated weights for policy 0, policy_version 41715 (0.0025) [2024-03-29 16:42:53,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42598.4, 300 sec: 42376.3). Total num frames: 683606016. Throughput: 0: 42952.4. Samples: 565819180. Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 16:42:53,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 16:42:53,861][00497] Updated weights for policy 0, policy_version 41725 (0.0024) [2024-03-29 16:42:57,201][00497] Updated weights for policy 0, policy_version 41735 (0.0020) [2024-03-29 16:42:58,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42871.5, 300 sec: 42320.7). Total num frames: 683851776. Throughput: 0: 42947.1. Samples: 566057240. Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 16:42:58,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 16:43:01,480][00497] Updated weights for policy 0, policy_version 41745 (0.0021) [2024-03-29 16:43:03,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42871.3, 300 sec: 42320.7). Total num frames: 684032000. Throughput: 0: 43021.2. Samples: 566189480. Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 16:43:03,840][00126] Avg episode reward: [(0, '0.517')] [2024-03-29 16:43:03,862][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000041750_684032000.pth... 
[2024-03-29 16:43:04,183][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000041133_673923072.pth [2024-03-29 16:43:05,654][00497] Updated weights for policy 0, policy_version 41755 (0.0019) [2024-03-29 16:43:08,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42598.3, 300 sec: 42376.3). Total num frames: 684244992. Throughput: 0: 42878.8. Samples: 566454660. Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 16:43:08,841][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 16:43:09,374][00497] Updated weights for policy 0, policy_version 41765 (0.0024) [2024-03-29 16:43:12,545][00497] Updated weights for policy 0, policy_version 41775 (0.0019) [2024-03-29 16:43:13,653][00476] Signal inference workers to stop experience collection... (20150 times) [2024-03-29 16:43:13,703][00497] InferenceWorker_p0-w0: stopping experience collection (20150 times) [2024-03-29 16:43:13,834][00476] Signal inference workers to resume experience collection... (20150 times) [2024-03-29 16:43:13,834][00497] InferenceWorker_p0-w0: resuming experience collection (20150 times) [2024-03-29 16:43:13,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42871.4, 300 sec: 42320.7). Total num frames: 684490752. Throughput: 0: 42767.5. Samples: 566690440. Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 16:43:13,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 16:43:17,068][00497] Updated weights for policy 0, policy_version 41785 (0.0021) [2024-03-29 16:43:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 684670976. Throughput: 0: 42478.7. Samples: 566817440. Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 16:43:18,840][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 16:43:21,132][00497] Updated weights for policy 0, policy_version 41795 (0.0022) [2024-03-29 16:43:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 684883968. Throughput: 0: 42453.3. Samples: 567082460. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 16:43:23,840][00126] Avg episode reward: [(0, '0.614')] [2024-03-29 16:43:24,923][00497] Updated weights for policy 0, policy_version 41805 (0.0029) [2024-03-29 16:43:28,069][00497] Updated weights for policy 0, policy_version 41815 (0.0030) [2024-03-29 16:43:28,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42871.5, 300 sec: 42320.7). Total num frames: 685129728. Throughput: 0: 42478.3. Samples: 567324080. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 16:43:28,840][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 16:43:32,415][00497] Updated weights for policy 0, policy_version 41825 (0.0019) [2024-03-29 16:43:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42871.4, 300 sec: 42376.3). Total num frames: 685326336. Throughput: 0: 42300.8. Samples: 567454620. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 16:43:33,840][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 16:43:36,562][00497] Updated weights for policy 0, policy_version 41835 (0.0018) [2024-03-29 16:43:38,839][00126] Fps is (10 sec: 37683.0, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 685506560. Throughput: 0: 42140.0. Samples: 567715480. 
Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 16:43:38,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 16:43:40,285][00497] Updated weights for policy 0, policy_version 41845 (0.0033) [2024-03-29 16:43:43,508][00497] Updated weights for policy 0, policy_version 41855 (0.0028) [2024-03-29 16:43:43,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 685752320. Throughput: 0: 42272.9. Samples: 567959520. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 16:43:43,841][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 16:43:47,834][00497] Updated weights for policy 0, policy_version 41865 (0.0024) [2024-03-29 16:43:48,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42871.4, 300 sec: 42376.3). Total num frames: 685965312. Throughput: 0: 42126.3. Samples: 568085160. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 16:43:48,840][00126] Avg episode reward: [(0, '0.483')] [2024-03-29 16:43:50,885][00476] Signal inference workers to stop experience collection... (20200 times) [2024-03-29 16:43:50,886][00476] Signal inference workers to resume experience collection... (20200 times) [2024-03-29 16:43:50,919][00497] InferenceWorker_p0-w0: stopping experience collection (20200 times) [2024-03-29 16:43:50,920][00497] InferenceWorker_p0-w0: resuming experience collection (20200 times) [2024-03-29 16:43:52,115][00497] Updated weights for policy 0, policy_version 41875 (0.0024) [2024-03-29 16:43:53,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 686145536. Throughput: 0: 42307.0. Samples: 568358480. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 16:43:53,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 16:43:55,706][00497] Updated weights for policy 0, policy_version 41885 (0.0033) [2024-03-29 16:43:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 686391296. Throughput: 0: 42537.3. Samples: 568604620. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 16:43:58,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 16:43:58,870][00497] Updated weights for policy 0, policy_version 41895 (0.0022) [2024-03-29 16:44:03,523][00497] Updated weights for policy 0, policy_version 41905 (0.0021) [2024-03-29 16:44:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.4, 300 sec: 42376.2). Total num frames: 686587904. Throughput: 0: 42040.8. Samples: 568709280. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 16:44:03,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 16:44:07,753][00497] Updated weights for policy 0, policy_version 41915 (0.0026) [2024-03-29 16:44:08,839][00126] Fps is (10 sec: 37683.2, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 686768128. Throughput: 0: 42192.0. Samples: 568981100. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 16:44:08,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 16:44:11,387][00497] Updated weights for policy 0, policy_version 41925 (0.0025) [2024-03-29 16:44:13,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 687013888. Throughput: 0: 42369.7. Samples: 569230720. Policy #0 lag: (min: 1.0, avg: 19.8, max: 41.0) [2024-03-29 16:44:13,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 16:44:14,672][00497] Updated weights for policy 0, policy_version 41935 (0.0030) [2024-03-29 16:44:18,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.4, 300 sec: 42320.7). 
Total num frames: 687210496. Throughput: 0: 41945.9. Samples: 569342180. Policy #0 lag: (min: 1.0, avg: 19.8, max: 41.0) [2024-03-29 16:44:18,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 16:44:19,134][00497] Updated weights for policy 0, policy_version 41945 (0.0020) [2024-03-29 16:44:23,135][00497] Updated weights for policy 0, policy_version 41955 (0.0022) [2024-03-29 16:44:23,450][00476] Signal inference workers to stop experience collection... (20250 times) [2024-03-29 16:44:23,451][00476] Signal inference workers to resume experience collection... (20250 times) [2024-03-29 16:44:23,495][00497] InferenceWorker_p0-w0: stopping experience collection (20250 times) [2024-03-29 16:44:23,495][00497] InferenceWorker_p0-w0: resuming experience collection (20250 times) [2024-03-29 16:44:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 42376.3). Total num frames: 687423488. Throughput: 0: 42243.0. Samples: 569616420. Policy #0 lag: (min: 1.0, avg: 19.8, max: 41.0) [2024-03-29 16:44:23,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 16:44:26,841][00497] Updated weights for policy 0, policy_version 41965 (0.0019) [2024-03-29 16:44:28,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.2, 300 sec: 42376.3). Total num frames: 687652864. Throughput: 0: 42609.8. Samples: 569876960. Policy #0 lag: (min: 1.0, avg: 19.8, max: 41.0) [2024-03-29 16:44:28,840][00126] Avg episode reward: [(0, '0.622')] [2024-03-29 16:44:30,026][00497] Updated weights for policy 0, policy_version 41975 (0.0022) [2024-03-29 16:44:33,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 687849472. Throughput: 0: 42238.2. Samples: 569985880. Policy #0 lag: (min: 1.0, avg: 19.8, max: 41.0) [2024-03-29 16:44:33,841][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 16:44:34,463][00497] Updated weights for policy 0, policy_version 41985 (0.0025) [2024-03-29 16:44:38,466][00497] Updated weights for policy 0, policy_version 41995 (0.0020) [2024-03-29 16:44:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 688062464. Throughput: 0: 42332.1. Samples: 570263420. Policy #0 lag: (min: 1.0, avg: 19.8, max: 41.0) [2024-03-29 16:44:38,840][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 16:44:42,010][00497] Updated weights for policy 0, policy_version 42005 (0.0028) [2024-03-29 16:44:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 42376.2). Total num frames: 688275456. Throughput: 0: 42427.6. Samples: 570513860. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 16:44:43,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 16:44:45,350][00497] Updated weights for policy 0, policy_version 42015 (0.0025) [2024-03-29 16:44:48,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 688504832. Throughput: 0: 42608.4. Samples: 570626660. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 16:44:48,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 16:44:49,672][00497] Updated weights for policy 0, policy_version 42025 (0.0029) [2024-03-29 16:44:53,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 42431.8). Total num frames: 688685056. Throughput: 0: 42668.5. Samples: 570901180. 
Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 16:44:53,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 16:44:54,102][00497] Updated weights for policy 0, policy_version 42035 (0.0022) [2024-03-29 16:44:54,154][00476] Signal inference workers to stop experience collection... (20300 times) [2024-03-29 16:44:54,195][00497] InferenceWorker_p0-w0: stopping experience collection (20300 times) [2024-03-29 16:44:54,374][00476] Signal inference workers to resume experience collection... (20300 times) [2024-03-29 16:44:54,374][00497] InferenceWorker_p0-w0: resuming experience collection (20300 times) [2024-03-29 16:44:57,576][00497] Updated weights for policy 0, policy_version 42045 (0.0025) [2024-03-29 16:44:58,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 688914432. Throughput: 0: 42472.0. Samples: 571141960. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 16:44:58,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 16:45:01,038][00497] Updated weights for policy 0, policy_version 42055 (0.0028) [2024-03-29 16:45:03,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 689127424. Throughput: 0: 42493.6. Samples: 571254400. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 16:45:03,841][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 16:45:03,862][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000042061_689127424.pth... [2024-03-29 16:45:04,154][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000041439_678936576.pth [2024-03-29 16:45:05,512][00497] Updated weights for policy 0, policy_version 42065 (0.0020) [2024-03-29 16:45:08,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.4, 300 sec: 42376.3). Total num frames: 689307648. Throughput: 0: 42366.3. Samples: 571522900. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 16:45:08,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 16:45:09,766][00497] Updated weights for policy 0, policy_version 42075 (0.0023) [2024-03-29 16:45:13,268][00497] Updated weights for policy 0, policy_version 42085 (0.0023) [2024-03-29 16:45:13,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 689537024. Throughput: 0: 42346.3. Samples: 571782540. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 16:45:13,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 16:45:16,352][00497] Updated weights for policy 0, policy_version 42095 (0.0024) [2024-03-29 16:45:18,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 42376.3). Total num frames: 689766400. Throughput: 0: 42504.5. Samples: 571898580. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 16:45:18,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 16:45:20,707][00497] Updated weights for policy 0, policy_version 42105 (0.0019) [2024-03-29 16:45:23,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.3, 300 sec: 42431.8). Total num frames: 689963008. Throughput: 0: 42247.0. Samples: 572164540. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 16:45:23,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 16:45:24,899][00497] Updated weights for policy 0, policy_version 42115 (0.0017) [2024-03-29 16:45:28,584][00497] Updated weights for policy 0, policy_version 42125 (0.0019) [2024-03-29 16:45:28,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 42431.8). 
Total num frames: 690176000. Throughput: 0: 42449.3. Samples: 572424080. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 16:45:28,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 16:45:29,498][00476] Signal inference workers to stop experience collection... (20350 times) [2024-03-29 16:45:29,499][00476] Signal inference workers to resume experience collection... (20350 times) [2024-03-29 16:45:29,539][00497] InferenceWorker_p0-w0: stopping experience collection (20350 times) [2024-03-29 16:45:29,539][00497] InferenceWorker_p0-w0: resuming experience collection (20350 times) [2024-03-29 16:45:31,688][00497] Updated weights for policy 0, policy_version 42135 (0.0026) [2024-03-29 16:45:33,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 42376.3). Total num frames: 690405376. Throughput: 0: 42617.3. Samples: 572544440. Policy #0 lag: (min: 0.0, avg: 23.8, max: 41.0) [2024-03-29 16:45:33,840][00126] Avg episode reward: [(0, '0.507')] [2024-03-29 16:45:36,152][00497] Updated weights for policy 0, policy_version 42145 (0.0021) [2024-03-29 16:45:38,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 690585600. Throughput: 0: 42279.1. Samples: 572803740. Policy #0 lag: (min: 0.0, avg: 23.8, max: 41.0) [2024-03-29 16:45:38,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 16:45:40,271][00497] Updated weights for policy 0, policy_version 42155 (0.0022) [2024-03-29 16:45:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.3, 300 sec: 42487.3). Total num frames: 690814976. Throughput: 0: 42567.9. Samples: 573057520. Policy #0 lag: (min: 0.0, avg: 23.8, max: 41.0) [2024-03-29 16:45:43,840][00126] Avg episode reward: [(0, '0.555')] [2024-03-29 16:45:43,941][00497] Updated weights for policy 0, policy_version 42165 (0.0025) [2024-03-29 16:45:47,482][00497] Updated weights for policy 0, policy_version 42175 (0.0027) [2024-03-29 16:45:48,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42325.3, 300 sec: 42431.8). Total num frames: 691044352. Throughput: 0: 42632.5. Samples: 573172860. Policy #0 lag: (min: 0.0, avg: 23.8, max: 41.0) [2024-03-29 16:45:48,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 16:45:51,747][00497] Updated weights for policy 0, policy_version 42185 (0.0021) [2024-03-29 16:45:53,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 691224576. Throughput: 0: 42318.2. Samples: 573427220. Policy #0 lag: (min: 0.0, avg: 23.8, max: 41.0) [2024-03-29 16:45:53,840][00126] Avg episode reward: [(0, '0.541')] [2024-03-29 16:45:55,983][00497] Updated weights for policy 0, policy_version 42195 (0.0018) [2024-03-29 16:45:58,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.2, 300 sec: 42376.2). Total num frames: 691437568. Throughput: 0: 42406.2. Samples: 573690820. Policy #0 lag: (min: 0.0, avg: 23.8, max: 41.0) [2024-03-29 16:45:58,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 16:45:59,648][00497] Updated weights for policy 0, policy_version 42205 (0.0025) [2024-03-29 16:46:02,882][00497] Updated weights for policy 0, policy_version 42215 (0.0031) [2024-03-29 16:46:03,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.5, 300 sec: 42487.3). Total num frames: 691683328. Throughput: 0: 42577.8. Samples: 573814580. 
Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 16:46:03,840][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 16:46:07,279][00497] Updated weights for policy 0, policy_version 42225 (0.0018) [2024-03-29 16:46:07,299][00476] Signal inference workers to stop experience collection... (20400 times) [2024-03-29 16:46:07,299][00476] Signal inference workers to resume experience collection... (20400 times) [2024-03-29 16:46:07,322][00497] InferenceWorker_p0-w0: stopping experience collection (20400 times) [2024-03-29 16:46:07,344][00497] InferenceWorker_p0-w0: resuming experience collection (20400 times) [2024-03-29 16:46:08,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 691863552. Throughput: 0: 42131.1. Samples: 574060440. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 16:46:08,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 16:46:11,704][00497] Updated weights for policy 0, policy_version 42235 (0.0018) [2024-03-29 16:46:13,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42325.3, 300 sec: 42431.8). Total num frames: 692076544. Throughput: 0: 42334.3. Samples: 574329120. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 16:46:13,840][00126] Avg episode reward: [(0, '0.481')] [2024-03-29 16:46:15,121][00497] Updated weights for policy 0, policy_version 42245 (0.0026) [2024-03-29 16:46:18,271][00497] Updated weights for policy 0, policy_version 42255 (0.0023) [2024-03-29 16:46:18,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42598.4, 300 sec: 42487.3). Total num frames: 692322304. Throughput: 0: 42281.0. Samples: 574447080. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 16:46:18,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 16:46:22,721][00497] Updated weights for policy 0, policy_version 42265 (0.0019) [2024-03-29 16:46:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 42431.8). Total num frames: 692502528. Throughput: 0: 42093.8. Samples: 574697960. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 16:46:23,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 16:46:27,203][00497] Updated weights for policy 0, policy_version 42275 (0.0018) [2024-03-29 16:46:28,839][00126] Fps is (10 sec: 37682.9, 60 sec: 42052.3, 300 sec: 42376.2). Total num frames: 692699136. Throughput: 0: 42424.0. Samples: 574966600. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 16:46:28,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 16:46:30,621][00497] Updated weights for policy 0, policy_version 42285 (0.0020) [2024-03-29 16:46:33,823][00497] Updated weights for policy 0, policy_version 42295 (0.0023) [2024-03-29 16:46:33,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42598.4, 300 sec: 42487.3). Total num frames: 692961280. Throughput: 0: 42266.6. Samples: 575074860. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 16:46:33,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 16:46:38,090][00497] Updated weights for policy 0, policy_version 42305 (0.0023) [2024-03-29 16:46:38,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.3, 300 sec: 42431.8). Total num frames: 693141504. Throughput: 0: 42351.5. Samples: 575333040. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 16:46:38,840][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 16:46:42,958][00497] Updated weights for policy 0, policy_version 42315 (0.0018) [2024-03-29 16:46:43,839][00126] Fps is (10 sec: 37683.3, 60 sec: 42052.3, 300 sec: 42431.8). 
Total num frames: 693338112. Throughput: 0: 42537.7. Samples: 575605020. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 16:46:43,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 16:46:45,151][00476] Signal inference workers to stop experience collection... (20450 times) [2024-03-29 16:46:45,191][00497] InferenceWorker_p0-w0: stopping experience collection (20450 times) [2024-03-29 16:46:45,367][00476] Signal inference workers to resume experience collection... (20450 times) [2024-03-29 16:46:45,367][00497] InferenceWorker_p0-w0: resuming experience collection (20450 times) [2024-03-29 16:46:45,925][00497] Updated weights for policy 0, policy_version 42325 (0.0021) [2024-03-29 16:46:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42487.3). Total num frames: 693583872. Throughput: 0: 42168.8. Samples: 575712180. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 16:46:48,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 16:46:49,390][00497] Updated weights for policy 0, policy_version 42335 (0.0027) [2024-03-29 16:46:53,551][00497] Updated weights for policy 0, policy_version 42345 (0.0027) [2024-03-29 16:46:53,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 693780480. Throughput: 0: 42254.2. Samples: 575961880. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 16:46:53,840][00126] Avg episode reward: [(0, '0.503')] [2024-03-29 16:46:58,508][00497] Updated weights for policy 0, policy_version 42355 (0.0029) [2024-03-29 16:46:58,839][00126] Fps is (10 sec: 37683.2, 60 sec: 42052.2, 300 sec: 42376.2). Total num frames: 693960704. Throughput: 0: 42379.5. Samples: 576236200. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 16:46:58,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 16:47:01,652][00497] Updated weights for policy 0, policy_version 42365 (0.0024) [2024-03-29 16:47:03,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 42487.3). Total num frames: 694222848. Throughput: 0: 42341.7. Samples: 576352460. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 16:47:03,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 16:47:04,067][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000042373_694239232.pth... [2024-03-29 16:47:04,389][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000041750_684032000.pth [2024-03-29 16:47:04,934][00497] Updated weights for policy 0, policy_version 42375 (0.0024) [2024-03-29 16:47:08,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 694403072. Throughput: 0: 42046.2. Samples: 576590040. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 16:47:08,840][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 16:47:09,381][00497] Updated weights for policy 0, policy_version 42385 (0.0022) [2024-03-29 16:47:13,839][00126] Fps is (10 sec: 36044.7, 60 sec: 41779.1, 300 sec: 42265.1). Total num frames: 694583296. Throughput: 0: 42073.7. Samples: 576859920. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 16:47:13,841][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 16:47:14,000][00497] Updated weights for policy 0, policy_version 42395 (0.0027) [2024-03-29 16:47:17,256][00497] Updated weights for policy 0, policy_version 42405 (0.0023) [2024-03-29 16:47:17,286][00476] Signal inference workers to stop experience collection... 
(20500 times) [2024-03-29 16:47:17,320][00497] InferenceWorker_p0-w0: stopping experience collection (20500 times) [2024-03-29 16:47:17,509][00476] Signal inference workers to resume experience collection... (20500 times) [2024-03-29 16:47:17,510][00497] InferenceWorker_p0-w0: resuming experience collection (20500 times) [2024-03-29 16:47:18,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.2, 300 sec: 42431.8). Total num frames: 694845440. Throughput: 0: 42451.6. Samples: 576985180. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 16:47:18,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 16:47:20,235][00497] Updated weights for policy 0, policy_version 42415 (0.0027) [2024-03-29 16:47:23,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 695042048. Throughput: 0: 42169.4. Samples: 577230660. Policy #0 lag: (min: 0.0, avg: 23.9, max: 41.0) [2024-03-29 16:47:23,840][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 16:47:24,893][00497] Updated weights for policy 0, policy_version 42425 (0.0026) [2024-03-29 16:47:28,839][00126] Fps is (10 sec: 37683.5, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 695222272. Throughput: 0: 42129.9. Samples: 577500860. Policy #0 lag: (min: 0.0, avg: 23.9, max: 41.0) [2024-03-29 16:47:28,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 16:47:29,261][00497] Updated weights for policy 0, policy_version 42435 (0.0026) [2024-03-29 16:47:32,453][00497] Updated weights for policy 0, policy_version 42445 (0.0028) [2024-03-29 16:47:33,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.4, 300 sec: 42431.8). Total num frames: 695484416. Throughput: 0: 42616.5. Samples: 577629920. Policy #0 lag: (min: 0.0, avg: 23.9, max: 41.0) [2024-03-29 16:47:33,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 16:47:35,418][00497] Updated weights for policy 0, policy_version 42455 (0.0019) [2024-03-29 16:47:38,839][00126] Fps is (10 sec: 47513.6, 60 sec: 42598.5, 300 sec: 42376.2). Total num frames: 695697408. Throughput: 0: 42340.1. Samples: 577867180. Policy #0 lag: (min: 0.0, avg: 23.9, max: 41.0) [2024-03-29 16:47:38,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 16:47:40,048][00497] Updated weights for policy 0, policy_version 42465 (0.0021) [2024-03-29 16:47:43,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 695877632. Throughput: 0: 42321.0. Samples: 578140640. Policy #0 lag: (min: 0.0, avg: 23.9, max: 41.0) [2024-03-29 16:47:43,840][00126] Avg episode reward: [(0, '0.517')] [2024-03-29 16:47:44,725][00497] Updated weights for policy 0, policy_version 42475 (0.0018) [2024-03-29 16:47:47,721][00497] Updated weights for policy 0, policy_version 42485 (0.0027) [2024-03-29 16:47:48,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.3, 300 sec: 42431.8). Total num frames: 696123392. Throughput: 0: 42683.1. Samples: 578273200. Policy #0 lag: (min: 0.0, avg: 23.9, max: 41.0) [2024-03-29 16:47:48,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 16:47:50,303][00476] Signal inference workers to stop experience collection... (20550 times) [2024-03-29 16:47:50,378][00476] Signal inference workers to resume experience collection... 
(20550 times) [2024-03-29 16:47:50,380][00497] InferenceWorker_p0-w0: stopping experience collection (20550 times) [2024-03-29 16:47:50,415][00497] InferenceWorker_p0-w0: resuming experience collection (20550 times) [2024-03-29 16:47:50,943][00497] Updated weights for policy 0, policy_version 42495 (0.0035) [2024-03-29 16:47:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 696320000. Throughput: 0: 42287.1. Samples: 578492960. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 16:47:53,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 16:47:55,531][00497] Updated weights for policy 0, policy_version 42505 (0.0027) [2024-03-29 16:47:58,839][00126] Fps is (10 sec: 39322.1, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 696516608. Throughput: 0: 42483.3. Samples: 578771660. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 16:47:58,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 16:48:00,218][00497] Updated weights for policy 0, policy_version 42515 (0.0025) [2024-03-29 16:48:03,506][00497] Updated weights for policy 0, policy_version 42525 (0.0025) [2024-03-29 16:48:03,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 42320.7). Total num frames: 696729600. Throughput: 0: 42556.9. Samples: 578900240. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 16:48:03,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 16:48:06,537][00497] Updated weights for policy 0, policy_version 42535 (0.0024) [2024-03-29 16:48:08,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42871.5, 300 sec: 42320.7). Total num frames: 696975360. Throughput: 0: 42057.8. Samples: 579123260. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 16:48:08,840][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 16:48:11,285][00497] Updated weights for policy 0, policy_version 42545 (0.0023) [2024-03-29 16:48:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42598.5, 300 sec: 42265.2). Total num frames: 697139200. Throughput: 0: 42116.9. Samples: 579396120. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 16:48:13,840][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 16:48:15,783][00497] Updated weights for policy 0, policy_version 42555 (0.0024) [2024-03-29 16:48:18,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 697368576. Throughput: 0: 42324.8. Samples: 579534540. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 16:48:18,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 16:48:19,067][00497] Updated weights for policy 0, policy_version 42565 (0.0032) [2024-03-29 16:48:22,089][00497] Updated weights for policy 0, policy_version 42575 (0.0028) [2024-03-29 16:48:23,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42598.4, 300 sec: 42265.1). Total num frames: 697597952. Throughput: 0: 41956.3. Samples: 579755220. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 16:48:23,840][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 16:48:26,796][00497] Updated weights for policy 0, policy_version 42585 (0.0021) [2024-03-29 16:48:28,213][00476] Signal inference workers to stop experience collection... (20600 times) [2024-03-29 16:48:28,214][00476] Signal inference workers to resume experience collection... 
(20600 times) [2024-03-29 16:48:28,240][00497] InferenceWorker_p0-w0: stopping experience collection (20600 times) [2024-03-29 16:48:28,240][00497] InferenceWorker_p0-w0: resuming experience collection (20600 times) [2024-03-29 16:48:28,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42871.5, 300 sec: 42265.2). Total num frames: 697794560. Throughput: 0: 41960.9. Samples: 580028880. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 16:48:28,840][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 16:48:31,310][00497] Updated weights for policy 0, policy_version 42595 (0.0024) [2024-03-29 16:48:33,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.1, 300 sec: 42320.7). Total num frames: 697991168. Throughput: 0: 42109.8. Samples: 580168140. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 16:48:33,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 16:48:34,598][00497] Updated weights for policy 0, policy_version 42605 (0.0023) [2024-03-29 16:48:37,903][00497] Updated weights for policy 0, policy_version 42615 (0.0029) [2024-03-29 16:48:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 698236928. Throughput: 0: 41974.2. Samples: 580381800. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 16:48:38,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 16:48:42,333][00497] Updated weights for policy 0, policy_version 42625 (0.0035) [2024-03-29 16:48:43,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 698417152. Throughput: 0: 41931.1. Samples: 580658560. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 16:48:43,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 16:48:46,965][00497] Updated weights for policy 0, policy_version 42635 (0.0020) [2024-03-29 16:48:48,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41506.2, 300 sec: 42265.2). Total num frames: 698613760. Throughput: 0: 42150.7. Samples: 580797020. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 16:48:48,840][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 16:48:50,051][00497] Updated weights for policy 0, policy_version 42645 (0.0021) [2024-03-29 16:48:53,199][00497] Updated weights for policy 0, policy_version 42655 (0.0030) [2024-03-29 16:48:53,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 698875904. Throughput: 0: 42372.4. Samples: 581030020. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 16:48:53,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 16:48:57,666][00497] Updated weights for policy 0, policy_version 42665 (0.0025) [2024-03-29 16:48:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 699056128. Throughput: 0: 42148.0. Samples: 581292780. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 16:48:58,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 16:49:02,250][00497] Updated weights for policy 0, policy_version 42675 (0.0031) [2024-03-29 16:49:03,839][00126] Fps is (10 sec: 37683.0, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 699252736. Throughput: 0: 42199.9. Samples: 581433540. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 16:49:03,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 16:49:03,912][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000042680_699269120.pth... 
[2024-03-29 16:49:04,247][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000042061_689127424.pth [2024-03-29 16:49:05,658][00497] Updated weights for policy 0, policy_version 42685 (0.0029) [2024-03-29 16:49:07,006][00476] Signal inference workers to stop experience collection... (20650 times) [2024-03-29 16:49:07,006][00476] Signal inference workers to resume experience collection... (20650 times) [2024-03-29 16:49:07,043][00497] InferenceWorker_p0-w0: stopping experience collection (20650 times) [2024-03-29 16:49:07,044][00497] InferenceWorker_p0-w0: resuming experience collection (20650 times) [2024-03-29 16:49:08,747][00497] Updated weights for policy 0, policy_version 42695 (0.0017) [2024-03-29 16:49:08,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 699514880. Throughput: 0: 42513.4. Samples: 581668320. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 16:49:08,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 16:49:13,389][00497] Updated weights for policy 0, policy_version 42705 (0.0018) [2024-03-29 16:49:13,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.3, 300 sec: 42320.7). Total num frames: 699695104. Throughput: 0: 42058.6. Samples: 581921520. Policy #0 lag: (min: 1.0, avg: 24.1, max: 42.0) [2024-03-29 16:49:13,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 16:49:17,714][00497] Updated weights for policy 0, policy_version 42715 (0.0022) [2024-03-29 16:49:18,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 699875328. Throughput: 0: 41993.9. Samples: 582057860. Policy #0 lag: (min: 1.0, avg: 24.1, max: 42.0) [2024-03-29 16:49:18,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 16:49:21,333][00497] Updated weights for policy 0, policy_version 42725 (0.0032) [2024-03-29 16:49:23,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 700137472. Throughput: 0: 42737.4. Samples: 582304980. Policy #0 lag: (min: 1.0, avg: 24.1, max: 42.0) [2024-03-29 16:49:23,840][00126] Avg episode reward: [(0, '0.609')] [2024-03-29 16:49:24,224][00497] Updated weights for policy 0, policy_version 42735 (0.0036) [2024-03-29 16:49:28,745][00497] Updated weights for policy 0, policy_version 42745 (0.0035) [2024-03-29 16:49:28,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 700334080. Throughput: 0: 42287.6. Samples: 582561500. Policy #0 lag: (min: 1.0, avg: 24.1, max: 42.0) [2024-03-29 16:49:28,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 16:49:33,157][00497] Updated weights for policy 0, policy_version 42755 (0.0017) [2024-03-29 16:49:33,839][00126] Fps is (10 sec: 37682.9, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 700514304. Throughput: 0: 42145.7. Samples: 582693580. Policy #0 lag: (min: 1.0, avg: 24.1, max: 42.0) [2024-03-29 16:49:33,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 16:49:36,500][00497] Updated weights for policy 0, policy_version 42765 (0.0025) [2024-03-29 16:49:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 42376.2). Total num frames: 700776448. Throughput: 0: 42612.5. Samples: 582947580. 
Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 16:49:38,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 16:49:39,620][00497] Updated weights for policy 0, policy_version 42775 (0.0029) [2024-03-29 16:49:42,654][00476] Signal inference workers to stop experience collection... (20700 times) [2024-03-29 16:49:42,720][00497] InferenceWorker_p0-w0: stopping experience collection (20700 times) [2024-03-29 16:49:42,730][00476] Signal inference workers to resume experience collection... (20700 times) [2024-03-29 16:49:42,746][00497] InferenceWorker_p0-w0: resuming experience collection (20700 times) [2024-03-29 16:49:43,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42598.3, 300 sec: 42265.2). Total num frames: 700973056. Throughput: 0: 42289.6. Samples: 583195820. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 16:49:43,842][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 16:49:44,098][00497] Updated weights for policy 0, policy_version 42785 (0.0025) [2024-03-29 16:49:48,610][00497] Updated weights for policy 0, policy_version 42795 (0.0022) [2024-03-29 16:49:48,839][00126] Fps is (10 sec: 37683.0, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 701153280. Throughput: 0: 42270.7. Samples: 583335720. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 16:49:48,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 16:49:51,714][00497] Updated weights for policy 0, policy_version 42805 (0.0022) [2024-03-29 16:49:53,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 701415424. Throughput: 0: 42653.3. Samples: 583587720. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 16:49:53,840][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 16:49:54,843][00497] Updated weights for policy 0, policy_version 42815 (0.0033) [2024-03-29 16:49:58,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 701612032. Throughput: 0: 42439.7. Samples: 583831300. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 16:49:58,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 16:49:59,443][00497] Updated weights for policy 0, policy_version 42825 (0.0017) [2024-03-29 16:50:03,839][00126] Fps is (10 sec: 37683.6, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 701792256. Throughput: 0: 42505.3. Samples: 583970600. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 16:50:03,840][00126] Avg episode reward: [(0, '0.534')] [2024-03-29 16:50:04,067][00497] Updated weights for policy 0, policy_version 42835 (0.0023) [2024-03-29 16:50:07,386][00497] Updated weights for policy 0, policy_version 42845 (0.0023) [2024-03-29 16:50:08,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42376.2). Total num frames: 702038016. Throughput: 0: 42703.9. Samples: 584226660. Policy #0 lag: (min: 1.0, avg: 17.9, max: 41.0) [2024-03-29 16:50:08,840][00126] Avg episode reward: [(0, '0.483')] [2024-03-29 16:50:10,551][00497] Updated weights for policy 0, policy_version 42855 (0.0024) [2024-03-29 16:50:13,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 702251008. Throughput: 0: 42150.6. Samples: 584458280. Policy #0 lag: (min: 1.0, avg: 17.9, max: 41.0) [2024-03-29 16:50:13,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 16:50:15,170][00497] Updated weights for policy 0, policy_version 42865 (0.0027) [2024-03-29 16:50:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42598.3, 300 sec: 42265.2). 
Total num frames: 702431232. Throughput: 0: 42310.2. Samples: 584597540. Policy #0 lag: (min: 1.0, avg: 17.9, max: 41.0) [2024-03-29 16:50:18,840][00126] Avg episode reward: [(0, '0.599')] [2024-03-29 16:50:19,469][00497] Updated weights for policy 0, policy_version 42875 (0.0019) [2024-03-29 16:50:20,629][00476] Signal inference workers to stop experience collection... (20750 times) [2024-03-29 16:50:20,703][00476] Signal inference workers to resume experience collection... (20750 times) [2024-03-29 16:50:20,704][00497] InferenceWorker_p0-w0: stopping experience collection (20750 times) [2024-03-29 16:50:20,730][00497] InferenceWorker_p0-w0: resuming experience collection (20750 times) [2024-03-29 16:50:22,977][00497] Updated weights for policy 0, policy_version 42885 (0.0025) [2024-03-29 16:50:23,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 702676992. Throughput: 0: 42711.5. Samples: 584869600. Policy #0 lag: (min: 1.0, avg: 17.9, max: 41.0) [2024-03-29 16:50:23,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 16:50:26,105][00497] Updated weights for policy 0, policy_version 42895 (0.0020) [2024-03-29 16:50:28,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 702889984. Throughput: 0: 42185.9. Samples: 585094180. Policy #0 lag: (min: 1.0, avg: 17.9, max: 41.0) [2024-03-29 16:50:28,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 16:50:30,625][00497] Updated weights for policy 0, policy_version 42905 (0.0023) [2024-03-29 16:50:33,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 703070208. Throughput: 0: 42156.0. Samples: 585232740. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 16:50:33,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 16:50:35,126][00497] Updated weights for policy 0, policy_version 42915 (0.0022) [2024-03-29 16:50:38,660][00497] Updated weights for policy 0, policy_version 42925 (0.0023) [2024-03-29 16:50:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 42265.2). Total num frames: 703283200. Throughput: 0: 42259.6. Samples: 585489400. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 16:50:38,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 16:50:41,881][00497] Updated weights for policy 0, policy_version 42935 (0.0024) [2024-03-29 16:50:43,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 703512576. Throughput: 0: 41855.1. Samples: 585714780. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 16:50:43,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 16:50:46,432][00497] Updated weights for policy 0, policy_version 42945 (0.0023) [2024-03-29 16:50:48,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 703709184. Throughput: 0: 41806.7. Samples: 585851900. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 16:50:48,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 16:50:50,723][00497] Updated weights for policy 0, policy_version 42955 (0.0018) [2024-03-29 16:50:53,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 42265.2). Total num frames: 703905792. Throughput: 0: 42268.0. Samples: 586128720. 
Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 16:50:53,840][00126] Avg episode reward: [(0, '0.534')] [2024-03-29 16:50:54,284][00497] Updated weights for policy 0, policy_version 42965 (0.0027) [2024-03-29 16:50:54,317][00476] Signal inference workers to stop experience collection... (20800 times) [2024-03-29 16:50:54,337][00497] InferenceWorker_p0-w0: stopping experience collection (20800 times) [2024-03-29 16:50:54,531][00476] Signal inference workers to resume experience collection... (20800 times) [2024-03-29 16:50:54,532][00497] InferenceWorker_p0-w0: resuming experience collection (20800 times) [2024-03-29 16:50:57,309][00497] Updated weights for policy 0, policy_version 42975 (0.0025) [2024-03-29 16:50:58,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 704151552. Throughput: 0: 41854.2. Samples: 586341720. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 16:50:58,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 16:51:01,928][00497] Updated weights for policy 0, policy_version 42985 (0.0026) [2024-03-29 16:51:03,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 704348160. Throughput: 0: 41838.7. Samples: 586480280. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 16:51:03,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 16:51:04,164][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000042991_704364544.pth... [2024-03-29 16:51:04,487][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000042373_694239232.pth [2024-03-29 16:51:06,228][00497] Updated weights for policy 0, policy_version 42995 (0.0016) [2024-03-29 16:51:08,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 704528384. Throughput: 0: 41867.7. Samples: 586753640. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 16:51:08,840][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 16:51:09,904][00497] Updated weights for policy 0, policy_version 43005 (0.0025) [2024-03-29 16:51:12,972][00497] Updated weights for policy 0, policy_version 43015 (0.0032) [2024-03-29 16:51:13,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 704790528. Throughput: 0: 41731.6. Samples: 586972100. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 16:51:13,840][00126] Avg episode reward: [(0, '0.555')] [2024-03-29 16:51:17,625][00497] Updated weights for policy 0, policy_version 43025 (0.0025) [2024-03-29 16:51:18,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 704970752. Throughput: 0: 41626.8. Samples: 587105940. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 16:51:18,840][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 16:51:21,621][00497] Updated weights for policy 0, policy_version 43035 (0.0024) [2024-03-29 16:51:23,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.2, 300 sec: 42265.2). Total num frames: 705167360. Throughput: 0: 42188.5. Samples: 587387880. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 16:51:23,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 16:51:25,341][00497] Updated weights for policy 0, policy_version 43045 (0.0028) [2024-03-29 16:51:28,566][00497] Updated weights for policy 0, policy_version 43055 (0.0022) [2024-03-29 16:51:28,781][00476] Signal inference workers to stop experience collection... 
(20850 times) [2024-03-29 16:51:28,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 705413120. Throughput: 0: 42160.0. Samples: 587611980. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 16:51:28,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 16:51:28,858][00497] InferenceWorker_p0-w0: stopping experience collection (20850 times) [2024-03-29 16:51:28,869][00476] Signal inference workers to resume experience collection... (20850 times) [2024-03-29 16:51:28,887][00497] InferenceWorker_p0-w0: resuming experience collection (20850 times) [2024-03-29 16:51:32,977][00497] Updated weights for policy 0, policy_version 43065 (0.0016) [2024-03-29 16:51:33,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 705609728. Throughput: 0: 42084.8. Samples: 587745720. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 16:51:33,840][00126] Avg episode reward: [(0, '0.636')] [2024-03-29 16:51:36,956][00497] Updated weights for policy 0, policy_version 43075 (0.0025) [2024-03-29 16:51:38,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 42265.2). Total num frames: 705806336. Throughput: 0: 42126.2. Samples: 588024400. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 16:51:38,840][00126] Avg episode reward: [(0, '0.461')] [2024-03-29 16:51:40,851][00497] Updated weights for policy 0, policy_version 43085 (0.0026) [2024-03-29 16:51:43,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 706052096. Throughput: 0: 42253.7. Samples: 588243140. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 16:51:43,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 16:51:44,119][00497] Updated weights for policy 0, policy_version 43095 (0.0030) [2024-03-29 16:51:48,755][00497] Updated weights for policy 0, policy_version 43105 (0.0023) [2024-03-29 16:51:48,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 706232320. Throughput: 0: 42018.2. Samples: 588371100. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 16:51:48,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 16:51:53,090][00497] Updated weights for policy 0, policy_version 43115 (0.0019) [2024-03-29 16:51:53,839][00126] Fps is (10 sec: 36045.0, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 706412544. Throughput: 0: 41888.0. Samples: 588638600. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 16:51:53,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 16:51:56,546][00497] Updated weights for policy 0, policy_version 43125 (0.0028) [2024-03-29 16:51:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 706658304. Throughput: 0: 42385.4. Samples: 588879440. Policy #0 lag: (min: 0.0, avg: 17.2, max: 41.0) [2024-03-29 16:51:58,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 16:51:59,793][00497] Updated weights for policy 0, policy_version 43135 (0.0024) [2024-03-29 16:52:03,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41779.1, 300 sec: 42209.6). Total num frames: 706854912. Throughput: 0: 41831.4. Samples: 588988360. Policy #0 lag: (min: 0.0, avg: 17.2, max: 41.0) [2024-03-29 16:52:03,841][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 16:52:04,295][00497] Updated weights for policy 0, policy_version 43145 (0.0028) [2024-03-29 16:52:06,698][00476] Signal inference workers to stop experience collection... 
(20900 times) [2024-03-29 16:52:06,699][00476] Signal inference workers to resume experience collection... (20900 times) [2024-03-29 16:52:06,723][00497] InferenceWorker_p0-w0: stopping experience collection (20900 times) [2024-03-29 16:52:06,743][00497] InferenceWorker_p0-w0: resuming experience collection (20900 times) [2024-03-29 16:52:08,661][00497] Updated weights for policy 0, policy_version 43155 (0.0022) [2024-03-29 16:52:08,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 707051520. Throughput: 0: 41559.6. Samples: 589258060. Policy #0 lag: (min: 0.0, avg: 17.2, max: 41.0) [2024-03-29 16:52:08,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 16:52:12,136][00497] Updated weights for policy 0, policy_version 43165 (0.0020) [2024-03-29 16:52:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 707280896. Throughput: 0: 42122.1. Samples: 589507480. Policy #0 lag: (min: 0.0, avg: 17.2, max: 41.0) [2024-03-29 16:52:13,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 16:52:15,586][00497] Updated weights for policy 0, policy_version 43175 (0.0021) [2024-03-29 16:52:18,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 707493888. Throughput: 0: 41725.4. Samples: 589623360. Policy #0 lag: (min: 0.0, avg: 17.2, max: 41.0) [2024-03-29 16:52:18,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 16:52:19,851][00497] Updated weights for policy 0, policy_version 43185 (0.0017) [2024-03-29 16:52:23,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 42265.1). Total num frames: 707690496. Throughput: 0: 41553.7. Samples: 589894320. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 16:52:23,840][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 16:52:24,214][00497] Updated weights for policy 0, policy_version 43195 (0.0028) [2024-03-29 16:52:27,725][00497] Updated weights for policy 0, policy_version 43205 (0.0026) [2024-03-29 16:52:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 707919872. Throughput: 0: 42030.7. Samples: 590134520. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 16:52:28,840][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 16:52:31,346][00497] Updated weights for policy 0, policy_version 43215 (0.0024) [2024-03-29 16:52:33,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 708132864. Throughput: 0: 41661.8. Samples: 590245880. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 16:52:33,841][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 16:52:35,698][00497] Updated weights for policy 0, policy_version 43225 (0.0024) [2024-03-29 16:52:38,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.3, 300 sec: 42154.1). Total num frames: 708313088. Throughput: 0: 41646.3. Samples: 590512680. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 16:52:38,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 16:52:39,972][00497] Updated weights for policy 0, policy_version 43235 (0.0034) [2024-03-29 16:52:42,084][00476] Signal inference workers to stop experience collection... (20950 times) [2024-03-29 16:52:42,124][00497] InferenceWorker_p0-w0: stopping experience collection (20950 times) [2024-03-29 16:52:42,309][00476] Signal inference workers to resume experience collection... 
(20950 times) [2024-03-29 16:52:42,310][00497] InferenceWorker_p0-w0: resuming experience collection (20950 times) [2024-03-29 16:52:43,474][00497] Updated weights for policy 0, policy_version 43245 (0.0025) [2024-03-29 16:52:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.2, 300 sec: 42098.6). Total num frames: 708542464. Throughput: 0: 42064.9. Samples: 590772360. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 16:52:43,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 16:52:46,789][00497] Updated weights for policy 0, policy_version 43255 (0.0023) [2024-03-29 16:52:48,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 708755456. Throughput: 0: 42057.8. Samples: 590880960. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 16:52:48,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 16:52:51,319][00497] Updated weights for policy 0, policy_version 43265 (0.0022) [2024-03-29 16:52:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 708935680. Throughput: 0: 41920.9. Samples: 591144500. Policy #0 lag: (min: 2.0, avg: 23.3, max: 43.0) [2024-03-29 16:52:53,840][00126] Avg episode reward: [(0, '0.555')] [2024-03-29 16:52:55,948][00497] Updated weights for policy 0, policy_version 43275 (0.0029) [2024-03-29 16:52:58,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42098.6). Total num frames: 709148672. Throughput: 0: 42121.9. Samples: 591402960. Policy #0 lag: (min: 2.0, avg: 23.3, max: 43.0) [2024-03-29 16:52:58,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 16:52:59,317][00497] Updated weights for policy 0, policy_version 43285 (0.0019) [2024-03-29 16:53:02,656][00497] Updated weights for policy 0, policy_version 43295 (0.0025) [2024-03-29 16:53:03,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 709394432. Throughput: 0: 41915.9. Samples: 591509580. Policy #0 lag: (min: 2.0, avg: 23.3, max: 43.0) [2024-03-29 16:53:03,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 16:53:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000043298_709394432.pth... [2024-03-29 16:53:04,186][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000042680_699269120.pth [2024-03-29 16:53:07,225][00497] Updated weights for policy 0, policy_version 43305 (0.0022) [2024-03-29 16:53:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 709558272. Throughput: 0: 41171.3. Samples: 591747020. Policy #0 lag: (min: 2.0, avg: 23.3, max: 43.0) [2024-03-29 16:53:08,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 16:53:11,828][00497] Updated weights for policy 0, policy_version 43315 (0.0023) [2024-03-29 16:53:13,532][00476] Signal inference workers to stop experience collection... (21000 times) [2024-03-29 16:53:13,569][00497] InferenceWorker_p0-w0: stopping experience collection (21000 times) [2024-03-29 16:53:13,723][00476] Signal inference workers to resume experience collection... (21000 times) [2024-03-29 16:53:13,724][00497] InferenceWorker_p0-w0: resuming experience collection (21000 times) [2024-03-29 16:53:13,839][00126] Fps is (10 sec: 34406.7, 60 sec: 40960.1, 300 sec: 41931.9). Total num frames: 709738496. Throughput: 0: 41821.8. Samples: 592016500. 
Policy #0 lag: (min: 2.0, avg: 23.3, max: 43.0) [2024-03-29 16:53:13,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 16:53:15,404][00497] Updated weights for policy 0, policy_version 43325 (0.0025) [2024-03-29 16:53:18,774][00497] Updated weights for policy 0, policy_version 43335 (0.0019) [2024-03-29 16:53:18,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 710000640. Throughput: 0: 41446.1. Samples: 592110960. Policy #0 lag: (min: 0.0, avg: 21.6, max: 44.0) [2024-03-29 16:53:18,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 16:53:23,185][00497] Updated weights for policy 0, policy_version 43345 (0.0022) [2024-03-29 16:53:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 710180864. Throughput: 0: 41205.3. Samples: 592366920. Policy #0 lag: (min: 0.0, avg: 21.6, max: 44.0) [2024-03-29 16:53:23,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 16:53:27,488][00497] Updated weights for policy 0, policy_version 43355 (0.0027) [2024-03-29 16:53:28,839][00126] Fps is (10 sec: 36044.8, 60 sec: 40686.9, 300 sec: 41931.9). Total num frames: 710361088. Throughput: 0: 41553.3. Samples: 592642260. Policy #0 lag: (min: 0.0, avg: 21.6, max: 44.0) [2024-03-29 16:53:28,840][00126] Avg episode reward: [(0, '0.517')] [2024-03-29 16:53:30,987][00497] Updated weights for policy 0, policy_version 43365 (0.0020) [2024-03-29 16:53:33,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 710623232. Throughput: 0: 41528.0. Samples: 592749720. Policy #0 lag: (min: 0.0, avg: 21.6, max: 44.0) [2024-03-29 16:53:33,840][00126] Avg episode reward: [(0, '0.662')] [2024-03-29 16:53:34,021][00476] Saving new best policy, reward=0.662! [2024-03-29 16:53:34,613][00497] Updated weights for policy 0, policy_version 43375 (0.0029) [2024-03-29 16:53:38,839][00126] Fps is (10 sec: 44237.3, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 710803456. Throughput: 0: 40964.5. Samples: 592987900. Policy #0 lag: (min: 0.0, avg: 21.6, max: 44.0) [2024-03-29 16:53:38,841][00126] Avg episode reward: [(0, '0.596')] [2024-03-29 16:53:39,069][00497] Updated weights for policy 0, policy_version 43385 (0.0025) [2024-03-29 16:53:43,398][00497] Updated weights for policy 0, policy_version 43395 (0.0021) [2024-03-29 16:53:43,839][00126] Fps is (10 sec: 36044.7, 60 sec: 40686.9, 300 sec: 41931.9). Total num frames: 710983680. Throughput: 0: 41146.2. Samples: 593254540. Policy #0 lag: (min: 0.0, avg: 21.6, max: 44.0) [2024-03-29 16:53:43,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 16:53:45,381][00476] Signal inference workers to stop experience collection... (21050 times) [2024-03-29 16:53:45,419][00497] InferenceWorker_p0-w0: stopping experience collection (21050 times) [2024-03-29 16:53:45,574][00476] Signal inference workers to resume experience collection... (21050 times) [2024-03-29 16:53:45,575][00497] InferenceWorker_p0-w0: resuming experience collection (21050 times) [2024-03-29 16:53:46,706][00497] Updated weights for policy 0, policy_version 43405 (0.0020) [2024-03-29 16:53:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 711229440. Throughput: 0: 41607.2. Samples: 593381900. 
Policy #0 lag: (min: 0.0, avg: 17.7, max: 41.0) [2024-03-29 16:53:48,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 16:53:50,142][00497] Updated weights for policy 0, policy_version 43415 (0.0031) [2024-03-29 16:53:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 711426048. Throughput: 0: 41482.6. Samples: 593613740. Policy #0 lag: (min: 0.0, avg: 17.7, max: 41.0) [2024-03-29 16:53:53,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 16:53:55,017][00497] Updated weights for policy 0, policy_version 43425 (0.0024) [2024-03-29 16:53:58,841][00126] Fps is (10 sec: 39315.7, 60 sec: 41232.0, 300 sec: 41931.7). Total num frames: 711622656. Throughput: 0: 41347.1. Samples: 593877180. Policy #0 lag: (min: 0.0, avg: 17.7, max: 41.0) [2024-03-29 16:53:58,842][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 16:53:58,993][00497] Updated weights for policy 0, policy_version 43435 (0.0020) [2024-03-29 16:54:02,653][00497] Updated weights for policy 0, policy_version 43445 (0.0021) [2024-03-29 16:54:03,839][00126] Fps is (10 sec: 40959.7, 60 sec: 40686.9, 300 sec: 41765.3). Total num frames: 711835648. Throughput: 0: 42215.1. Samples: 594010640. Policy #0 lag: (min: 0.0, avg: 17.7, max: 41.0) [2024-03-29 16:54:03,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 16:54:06,286][00497] Updated weights for policy 0, policy_version 43455 (0.0030) [2024-03-29 16:54:08,839][00126] Fps is (10 sec: 44243.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 712065024. Throughput: 0: 41016.9. Samples: 594212680. Policy #0 lag: (min: 0.0, avg: 17.7, max: 41.0) [2024-03-29 16:54:08,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 16:54:11,258][00497] Updated weights for policy 0, policy_version 43465 (0.0022) [2024-03-29 16:54:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 712245248. Throughput: 0: 40646.7. Samples: 594471360. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 16:54:13,842][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 16:54:15,092][00497] Updated weights for policy 0, policy_version 43475 (0.0023) [2024-03-29 16:54:18,278][00476] Signal inference workers to stop experience collection... (21100 times) [2024-03-29 16:54:18,312][00497] InferenceWorker_p0-w0: stopping experience collection (21100 times) [2024-03-29 16:54:18,501][00476] Signal inference workers to resume experience collection... (21100 times) [2024-03-29 16:54:18,502][00497] InferenceWorker_p0-w0: resuming experience collection (21100 times) [2024-03-29 16:54:18,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40687.0, 300 sec: 41709.8). Total num frames: 712441856. Throughput: 0: 41408.4. Samples: 594613100. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 16:54:18,840][00126] Avg episode reward: [(0, '0.626')] [2024-03-29 16:54:19,003][00497] Updated weights for policy 0, policy_version 43485 (0.0029) [2024-03-29 16:54:22,558][00497] Updated weights for policy 0, policy_version 43495 (0.0024) [2024-03-29 16:54:23,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 712654848. Throughput: 0: 40851.1. Samples: 594826200. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 16:54:23,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 16:54:27,202][00497] Updated weights for policy 0, policy_version 43505 (0.0025) [2024-03-29 16:54:28,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.2, 300 sec: 41820.9). 
Total num frames: 712851456. Throughput: 0: 40817.3. Samples: 595091320. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 16:54:28,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 16:54:30,828][00497] Updated weights for policy 0, policy_version 43515 (0.0023) [2024-03-29 16:54:33,839][00126] Fps is (10 sec: 39321.4, 60 sec: 40413.8, 300 sec: 41598.7). Total num frames: 713048064. Throughput: 0: 40716.8. Samples: 595214160. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 16:54:33,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 16:54:34,881][00497] Updated weights for policy 0, policy_version 43525 (0.0030) [2024-03-29 16:54:38,399][00497] Updated weights for policy 0, policy_version 43535 (0.0025) [2024-03-29 16:54:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 713293824. Throughput: 0: 40895.5. Samples: 595454040. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 16:54:38,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 16:54:43,291][00497] Updated weights for policy 0, policy_version 43545 (0.0021) [2024-03-29 16:54:43,839][00126] Fps is (10 sec: 39321.5, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 713441280. Throughput: 0: 40322.6. Samples: 595691640. Policy #0 lag: (min: 0.0, avg: 22.5, max: 43.0) [2024-03-29 16:54:43,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 16:54:46,873][00497] Updated weights for policy 0, policy_version 43555 (0.0016) [2024-03-29 16:54:48,839][00126] Fps is (10 sec: 32768.2, 60 sec: 39867.7, 300 sec: 41376.6). Total num frames: 713621504. Throughput: 0: 40112.1. Samples: 595815680. Policy #0 lag: (min: 0.0, avg: 22.5, max: 43.0) [2024-03-29 16:54:48,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 16:54:50,790][00476] Signal inference workers to stop experience collection... (21150 times) [2024-03-29 16:54:50,850][00497] InferenceWorker_p0-w0: stopping experience collection (21150 times) [2024-03-29 16:54:50,874][00476] Signal inference workers to resume experience collection... (21150 times) [2024-03-29 16:54:50,881][00497] InferenceWorker_p0-w0: resuming experience collection (21150 times) [2024-03-29 16:54:51,191][00497] Updated weights for policy 0, policy_version 43565 (0.0024) [2024-03-29 16:54:53,840][00126] Fps is (10 sec: 44231.8, 60 sec: 40959.2, 300 sec: 41598.5). Total num frames: 713883648. Throughput: 0: 40854.0. Samples: 596051160. Policy #0 lag: (min: 0.0, avg: 22.5, max: 43.0) [2024-03-29 16:54:53,841][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 16:54:54,745][00497] Updated weights for policy 0, policy_version 43575 (0.0024) [2024-03-29 16:54:58,839][00126] Fps is (10 sec: 45875.2, 60 sec: 40961.0, 300 sec: 41654.2). Total num frames: 714080256. Throughput: 0: 40579.2. Samples: 596297420. Policy #0 lag: (min: 0.0, avg: 22.5, max: 43.0) [2024-03-29 16:54:58,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 16:54:58,968][00497] Updated weights for policy 0, policy_version 43585 (0.0017) [2024-03-29 16:55:03,047][00497] Updated weights for policy 0, policy_version 43595 (0.0035) [2024-03-29 16:55:03,839][00126] Fps is (10 sec: 37687.5, 60 sec: 40413.9, 300 sec: 41432.1). Total num frames: 714260480. Throughput: 0: 40446.1. Samples: 596433180. 
Policy #0 lag: (min: 0.0, avg: 22.5, max: 43.0) [2024-03-29 16:55:03,840][00126] Avg episode reward: [(0, '0.616')] [2024-03-29 16:55:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000043595_714260480.pth... [2024-03-29 16:55:04,168][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000042991_704364544.pth [2024-03-29 16:55:07,531][00497] Updated weights for policy 0, policy_version 43605 (0.0021) [2024-03-29 16:55:08,839][00126] Fps is (10 sec: 39321.5, 60 sec: 40140.8, 300 sec: 41432.1). Total num frames: 714473472. Throughput: 0: 40876.4. Samples: 596665640. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 16:55:08,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 16:55:11,292][00497] Updated weights for policy 0, policy_version 43615 (0.0018) [2024-03-29 16:55:13,841][00126] Fps is (10 sec: 40955.0, 60 sec: 40413.1, 300 sec: 41487.4). Total num frames: 714670080. Throughput: 0: 40026.9. Samples: 596892580. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 16:55:13,841][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 16:55:15,753][00497] Updated weights for policy 0, policy_version 43625 (0.0023) [2024-03-29 16:55:18,839][00126] Fps is (10 sec: 40959.8, 60 sec: 40686.9, 300 sec: 41376.5). Total num frames: 714883072. Throughput: 0: 40361.3. Samples: 597030420. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 16:55:18,842][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 16:55:19,336][00497] Updated weights for policy 0, policy_version 43635 (0.0018) [2024-03-29 16:55:23,839][00126] Fps is (10 sec: 39326.7, 60 sec: 40140.8, 300 sec: 41265.5). Total num frames: 715063296. Throughput: 0: 40447.2. Samples: 597274160. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 16:55:23,840][00126] Avg episode reward: [(0, '0.541')] [2024-03-29 16:55:23,903][00497] Updated weights for policy 0, policy_version 43645 (0.0018) [2024-03-29 16:55:24,740][00476] Signal inference workers to stop experience collection... (21200 times) [2024-03-29 16:55:24,776][00497] InferenceWorker_p0-w0: stopping experience collection (21200 times) [2024-03-29 16:55:24,956][00476] Signal inference workers to resume experience collection... (21200 times) [2024-03-29 16:55:24,957][00497] InferenceWorker_p0-w0: resuming experience collection (21200 times) [2024-03-29 16:55:27,623][00497] Updated weights for policy 0, policy_version 43655 (0.0035) [2024-03-29 16:55:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 715309056. Throughput: 0: 40408.0. Samples: 597510000. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 16:55:28,840][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 16:55:31,628][00497] Updated weights for policy 0, policy_version 43665 (0.0021) [2024-03-29 16:55:33,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40413.9, 300 sec: 41321.0). Total num frames: 715472896. Throughput: 0: 40599.5. Samples: 597642660. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 16:55:33,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 16:55:35,214][00497] Updated weights for policy 0, policy_version 43675 (0.0021) [2024-03-29 16:55:38,839][00126] Fps is (10 sec: 37683.5, 60 sec: 39867.8, 300 sec: 41265.5). Total num frames: 715685888. Throughput: 0: 40755.3. Samples: 597885100. 
Policy #0 lag: (min: 0.0, avg: 21.0, max: 44.0) [2024-03-29 16:55:38,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 16:55:39,749][00497] Updated weights for policy 0, policy_version 43685 (0.0032) [2024-03-29 16:55:43,464][00497] Updated weights for policy 0, policy_version 43695 (0.0020) [2024-03-29 16:55:43,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41233.1, 300 sec: 41376.5). Total num frames: 715915264. Throughput: 0: 40576.0. Samples: 598123340. Policy #0 lag: (min: 0.0, avg: 21.0, max: 44.0) [2024-03-29 16:55:43,840][00126] Avg episode reward: [(0, '0.608')] [2024-03-29 16:55:47,895][00497] Updated weights for policy 0, policy_version 43705 (0.0022) [2024-03-29 16:55:48,839][00126] Fps is (10 sec: 39321.1, 60 sec: 40959.9, 300 sec: 41265.5). Total num frames: 716079104. Throughput: 0: 40465.7. Samples: 598254140. Policy #0 lag: (min: 0.0, avg: 21.0, max: 44.0) [2024-03-29 16:55:48,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 16:55:51,300][00497] Updated weights for policy 0, policy_version 43715 (0.0028) [2024-03-29 16:55:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40414.7, 300 sec: 41209.9). Total num frames: 716308480. Throughput: 0: 40730.7. Samples: 598498520. Policy #0 lag: (min: 0.0, avg: 21.0, max: 44.0) [2024-03-29 16:55:53,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 16:55:55,601][00497] Updated weights for policy 0, policy_version 43725 (0.0019) [2024-03-29 16:55:58,839][00126] Fps is (10 sec: 44237.3, 60 sec: 40686.9, 300 sec: 41265.5). Total num frames: 716521472. Throughput: 0: 41018.1. Samples: 598738340. Policy #0 lag: (min: 0.0, avg: 21.0, max: 44.0) [2024-03-29 16:55:58,840][00126] Avg episode reward: [(0, '0.492')] [2024-03-29 16:55:59,268][00497] Updated weights for policy 0, policy_version 43735 (0.0022) [2024-03-29 16:56:01,442][00476] Signal inference workers to stop experience collection... (21250 times) [2024-03-29 16:56:01,443][00476] Signal inference workers to resume experience collection... (21250 times) [2024-03-29 16:56:01,469][00497] InferenceWorker_p0-w0: stopping experience collection (21250 times) [2024-03-29 16:56:01,490][00497] InferenceWorker_p0-w0: resuming experience collection (21250 times) [2024-03-29 16:56:03,839][00126] Fps is (10 sec: 39321.4, 60 sec: 40686.9, 300 sec: 41265.5). Total num frames: 716701696. Throughput: 0: 40989.3. Samples: 598874940. Policy #0 lag: (min: 0.0, avg: 21.0, max: 44.0) [2024-03-29 16:56:03,840][00126] Avg episode reward: [(0, '0.489')] [2024-03-29 16:56:03,920][00497] Updated weights for policy 0, policy_version 43745 (0.0022) [2024-03-29 16:56:07,526][00497] Updated weights for policy 0, policy_version 43755 (0.0031) [2024-03-29 16:56:08,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40960.0, 300 sec: 41154.4). Total num frames: 716931072. Throughput: 0: 40965.8. Samples: 599117620. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 16:56:08,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 16:56:11,776][00497] Updated weights for policy 0, policy_version 43765 (0.0024) [2024-03-29 16:56:13,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41233.9, 300 sec: 41265.4). Total num frames: 717144064. Throughput: 0: 40850.2. Samples: 599348260. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 16:56:13,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 16:56:15,620][00497] Updated weights for policy 0, policy_version 43775 (0.0017) [2024-03-29 16:56:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40960.0, 300 sec: 41265.5). 
Total num frames: 717340672. Throughput: 0: 40788.0. Samples: 599478120. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 16:56:18,840][00126] Avg episode reward: [(0, '0.607')] [2024-03-29 16:56:19,808][00497] Updated weights for policy 0, policy_version 43785 (0.0024) [2024-03-29 16:56:23,697][00497] Updated weights for policy 0, policy_version 43795 (0.0021) [2024-03-29 16:56:23,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.1, 300 sec: 41098.8). Total num frames: 717537280. Throughput: 0: 40966.6. Samples: 599728600. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 16:56:23,840][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 16:56:27,966][00497] Updated weights for policy 0, policy_version 43805 (0.0024) [2024-03-29 16:56:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 40687.0, 300 sec: 41154.4). Total num frames: 717750272. Throughput: 0: 41043.6. Samples: 599970300. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 16:56:28,840][00126] Avg episode reward: [(0, '0.596')] [2024-03-29 16:56:31,943][00497] Updated weights for policy 0, policy_version 43815 (0.0030) [2024-03-29 16:56:33,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.0, 300 sec: 41154.4). Total num frames: 717946880. Throughput: 0: 40686.3. Samples: 600085020. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 16:56:33,840][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 16:56:35,716][00476] Signal inference workers to stop experience collection... (21300 times) [2024-03-29 16:56:35,778][00497] InferenceWorker_p0-w0: stopping experience collection (21300 times) [2024-03-29 16:56:35,803][00476] Signal inference workers to resume experience collection... (21300 times) [2024-03-29 16:56:35,806][00497] InferenceWorker_p0-w0: resuming experience collection (21300 times) [2024-03-29 16:56:36,113][00497] Updated weights for policy 0, policy_version 43825 (0.0018) [2024-03-29 16:56:38,839][00126] Fps is (10 sec: 36044.6, 60 sec: 40413.8, 300 sec: 40876.7). Total num frames: 718110720. Throughput: 0: 41020.0. Samples: 600344420. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 16:56:38,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 16:56:39,990][00497] Updated weights for policy 0, policy_version 43835 (0.0028) [2024-03-29 16:56:43,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40413.9, 300 sec: 41043.3). Total num frames: 718340096. Throughput: 0: 40623.1. Samples: 600566380. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 16:56:43,840][00126] Avg episode reward: [(0, '0.502')] [2024-03-29 16:56:44,139][00497] Updated weights for policy 0, policy_version 43845 (0.0020) [2024-03-29 16:56:48,361][00497] Updated weights for policy 0, policy_version 43855 (0.0034) [2024-03-29 16:56:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 40960.1, 300 sec: 41098.8). Total num frames: 718536704. Throughput: 0: 39803.1. Samples: 600666080. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 16:56:48,841][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 16:56:52,356][00497] Updated weights for policy 0, policy_version 43865 (0.0018) [2024-03-29 16:56:53,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40140.8, 300 sec: 40876.7). Total num frames: 718716928. Throughput: 0: 40351.1. Samples: 600933420. 
Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 16:56:53,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 16:56:56,705][00497] Updated weights for policy 0, policy_version 43875 (0.0029) [2024-03-29 16:56:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40413.8, 300 sec: 40987.8). Total num frames: 718946304. Throughput: 0: 40243.1. Samples: 601159200. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 16:56:58,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 16:57:00,615][00497] Updated weights for policy 0, policy_version 43885 (0.0024) [2024-03-29 16:57:03,839][00126] Fps is (10 sec: 42598.2, 60 sec: 40686.9, 300 sec: 40987.8). Total num frames: 719142912. Throughput: 0: 40098.2. Samples: 601282540. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 16:57:03,841][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 16:57:04,069][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000043894_719159296.pth... [2024-03-29 16:57:04,442][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000043298_709394432.pth [2024-03-29 16:57:04,758][00497] Updated weights for policy 0, policy_version 43895 (0.0017) [2024-03-29 16:57:08,499][00497] Updated weights for policy 0, policy_version 43905 (0.0022) [2024-03-29 16:57:08,839][00126] Fps is (10 sec: 39321.8, 60 sec: 40140.8, 300 sec: 40876.7). Total num frames: 719339520. Throughput: 0: 40045.8. Samples: 601530660. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 16:57:08,840][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 16:57:11,788][00476] Signal inference workers to stop experience collection... (21350 times) [2024-03-29 16:57:11,813][00497] InferenceWorker_p0-w0: stopping experience collection (21350 times) [2024-03-29 16:57:12,004][00476] Signal inference workers to resume experience collection... (21350 times) [2024-03-29 16:57:12,004][00497] InferenceWorker_p0-w0: resuming experience collection (21350 times) [2024-03-29 16:57:12,845][00497] Updated weights for policy 0, policy_version 43915 (0.0034) [2024-03-29 16:57:13,839][00126] Fps is (10 sec: 40960.6, 60 sec: 40140.9, 300 sec: 40876.7). Total num frames: 719552512. Throughput: 0: 40029.8. Samples: 601771640. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 16:57:13,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 16:57:16,957][00497] Updated weights for policy 0, policy_version 43925 (0.0022) [2024-03-29 16:57:18,839][00126] Fps is (10 sec: 42598.2, 60 sec: 40413.8, 300 sec: 40932.2). Total num frames: 719765504. Throughput: 0: 40171.1. Samples: 601892720. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 16:57:18,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 16:57:21,036][00497] Updated weights for policy 0, policy_version 43935 (0.0022) [2024-03-29 16:57:23,839][00126] Fps is (10 sec: 37682.8, 60 sec: 39867.7, 300 sec: 40710.1). Total num frames: 719929344. Throughput: 0: 39409.3. Samples: 602117840. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 16:57:23,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 16:57:25,324][00497] Updated weights for policy 0, policy_version 43945 (0.0018) [2024-03-29 16:57:28,839][00126] Fps is (10 sec: 36045.1, 60 sec: 39594.7, 300 sec: 40654.5). Total num frames: 720125952. Throughput: 0: 40267.6. Samples: 602378420. 
Policy #0 lag: (min: 2.0, avg: 18.7, max: 43.0) [2024-03-29 16:57:28,840][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 16:57:29,414][00497] Updated weights for policy 0, policy_version 43955 (0.0030) [2024-03-29 16:57:33,312][00497] Updated weights for policy 0, policy_version 43965 (0.0032) [2024-03-29 16:57:33,839][00126] Fps is (10 sec: 40959.9, 60 sec: 39867.7, 300 sec: 40765.6). Total num frames: 720338944. Throughput: 0: 40344.4. Samples: 602481580. Policy #0 lag: (min: 2.0, avg: 18.7, max: 43.0) [2024-03-29 16:57:33,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 16:57:37,348][00497] Updated weights for policy 0, policy_version 43975 (0.0019) [2024-03-29 16:57:38,839][00126] Fps is (10 sec: 42598.1, 60 sec: 40686.9, 300 sec: 40710.1). Total num frames: 720551936. Throughput: 0: 39900.9. Samples: 602728960. Policy #0 lag: (min: 2.0, avg: 18.7, max: 43.0) [2024-03-29 16:57:38,840][00126] Avg episode reward: [(0, '0.533')] [2024-03-29 16:57:41,881][00497] Updated weights for policy 0, policy_version 43985 (0.0017) [2024-03-29 16:57:43,839][00126] Fps is (10 sec: 36044.9, 60 sec: 39321.6, 300 sec: 40487.9). Total num frames: 720699392. Throughput: 0: 40582.3. Samples: 602985400. Policy #0 lag: (min: 2.0, avg: 18.7, max: 43.0) [2024-03-29 16:57:43,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 16:57:45,660][00497] Updated weights for policy 0, policy_version 43995 (0.0019) [2024-03-29 16:57:46,983][00476] Signal inference workers to stop experience collection... (21400 times) [2024-03-29 16:57:47,013][00497] InferenceWorker_p0-w0: stopping experience collection (21400 times) [2024-03-29 16:57:47,173][00476] Signal inference workers to resume experience collection... (21400 times) [2024-03-29 16:57:47,174][00497] InferenceWorker_p0-w0: resuming experience collection (21400 times) [2024-03-29 16:57:48,839][00126] Fps is (10 sec: 37683.1, 60 sec: 39867.7, 300 sec: 40654.5). Total num frames: 720928768. Throughput: 0: 39909.8. Samples: 603078480. Policy #0 lag: (min: 2.0, avg: 18.7, max: 43.0) [2024-03-29 16:57:48,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 16:57:49,852][00497] Updated weights for policy 0, policy_version 44005 (0.0021) [2024-03-29 16:57:53,839][00126] Fps is (10 sec: 44236.9, 60 sec: 40413.9, 300 sec: 40654.5). Total num frames: 721141760. Throughput: 0: 39623.6. Samples: 603313720. Policy #0 lag: (min: 2.0, avg: 18.7, max: 43.0) [2024-03-29 16:57:53,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 16:57:53,852][00497] Updated weights for policy 0, policy_version 44015 (0.0024) [2024-03-29 16:57:58,480][00497] Updated weights for policy 0, policy_version 44025 (0.0018) [2024-03-29 16:57:58,839][00126] Fps is (10 sec: 37683.4, 60 sec: 39321.6, 300 sec: 40376.8). Total num frames: 721305600. Throughput: 0: 40134.6. Samples: 603577700. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 16:57:58,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 16:58:02,192][00497] Updated weights for policy 0, policy_version 44035 (0.0022) [2024-03-29 16:58:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40140.8, 300 sec: 40654.5). Total num frames: 721551360. Throughput: 0: 40131.1. Samples: 603698620. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 16:58:03,842][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 16:58:06,318][00497] Updated weights for policy 0, policy_version 44045 (0.0026) [2024-03-29 16:58:08,839][00126] Fps is (10 sec: 44236.9, 60 sec: 40140.8, 300 sec: 40710.1). 
Total num frames: 721747968. Throughput: 0: 40122.2. Samples: 603923340. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 16:58:08,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 16:58:10,394][00497] Updated weights for policy 0, policy_version 44055 (0.0031) [2024-03-29 16:58:13,839][00126] Fps is (10 sec: 36044.6, 60 sec: 39321.5, 300 sec: 40376.8). Total num frames: 721911808. Throughput: 0: 39519.0. Samples: 604156780. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 16:58:13,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 16:58:15,142][00497] Updated weights for policy 0, policy_version 44065 (0.0025) [2024-03-29 16:58:18,822][00497] Updated weights for policy 0, policy_version 44075 (0.0031) [2024-03-29 16:58:18,839][00126] Fps is (10 sec: 37683.5, 60 sec: 39321.7, 300 sec: 40487.9). Total num frames: 722124800. Throughput: 0: 40167.7. Samples: 604289120. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 16:58:18,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 16:58:22,692][00497] Updated weights for policy 0, policy_version 44085 (0.0022) [2024-03-29 16:58:23,839][00126] Fps is (10 sec: 40960.4, 60 sec: 39867.8, 300 sec: 40543.5). Total num frames: 722321408. Throughput: 0: 39664.0. Samples: 604513840. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 16:58:23,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 16:58:24,540][00476] Signal inference workers to stop experience collection... (21450 times) [2024-03-29 16:58:24,540][00476] Signal inference workers to resume experience collection... (21450 times) [2024-03-29 16:58:24,583][00497] InferenceWorker_p0-w0: stopping experience collection (21450 times) [2024-03-29 16:58:24,583][00497] InferenceWorker_p0-w0: resuming experience collection (21450 times) [2024-03-29 16:58:26,764][00497] Updated weights for policy 0, policy_version 44095 (0.0017) [2024-03-29 16:58:28,839][00126] Fps is (10 sec: 40959.8, 60 sec: 40140.8, 300 sec: 40376.8). Total num frames: 722534400. Throughput: 0: 39161.8. Samples: 604747680. Policy #0 lag: (min: 1.0, avg: 22.1, max: 40.0) [2024-03-29 16:58:28,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 16:58:31,415][00497] Updated weights for policy 0, policy_version 44105 (0.0023) [2024-03-29 16:58:33,839][00126] Fps is (10 sec: 37682.9, 60 sec: 39321.6, 300 sec: 40321.3). Total num frames: 722698240. Throughput: 0: 40300.5. Samples: 604892000. Policy #0 lag: (min: 1.0, avg: 22.1, max: 40.0) [2024-03-29 16:58:33,840][00126] Avg episode reward: [(0, '0.476')] [2024-03-29 16:58:35,072][00497] Updated weights for policy 0, policy_version 44115 (0.0023) [2024-03-29 16:58:38,818][00497] Updated weights for policy 0, policy_version 44125 (0.0018) [2024-03-29 16:58:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 39867.7, 300 sec: 40543.5). Total num frames: 722944000. Throughput: 0: 40017.3. Samples: 605114500. Policy #0 lag: (min: 1.0, avg: 22.1, max: 40.0) [2024-03-29 16:58:38,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 16:58:42,957][00497] Updated weights for policy 0, policy_version 44135 (0.0019) [2024-03-29 16:58:43,839][00126] Fps is (10 sec: 44237.0, 60 sec: 40686.9, 300 sec: 40376.8). Total num frames: 723140608. Throughput: 0: 39668.9. Samples: 605362800. 
Policy #0 lag: (min: 1.0, avg: 22.1, max: 40.0) [2024-03-29 16:58:43,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 16:58:47,541][00497] Updated weights for policy 0, policy_version 44145 (0.0028) [2024-03-29 16:58:48,839][00126] Fps is (10 sec: 36045.0, 60 sec: 39594.7, 300 sec: 40265.8). Total num frames: 723304448. Throughput: 0: 39959.2. Samples: 605496780. Policy #0 lag: (min: 1.0, avg: 22.1, max: 40.0) [2024-03-29 16:58:48,840][00126] Avg episode reward: [(0, '0.534')] [2024-03-29 16:58:51,231][00497] Updated weights for policy 0, policy_version 44155 (0.0026) [2024-03-29 16:58:53,839][00126] Fps is (10 sec: 39321.4, 60 sec: 39867.7, 300 sec: 40377.0). Total num frames: 723533824. Throughput: 0: 39975.9. Samples: 605722260. Policy #0 lag: (min: 1.0, avg: 23.8, max: 44.0) [2024-03-29 16:58:53,840][00126] Avg episode reward: [(0, '0.537')] [2024-03-29 16:58:55,629][00497] Updated weights for policy 0, policy_version 44165 (0.0028) [2024-03-29 16:58:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 40413.8, 300 sec: 40321.3). Total num frames: 723730432. Throughput: 0: 40030.7. Samples: 605958160. Policy #0 lag: (min: 1.0, avg: 23.8, max: 44.0) [2024-03-29 16:58:58,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 16:58:59,417][00497] Updated weights for policy 0, policy_version 44175 (0.0026) [2024-03-29 16:59:00,856][00476] Signal inference workers to stop experience collection... (21500 times) [2024-03-29 16:59:00,897][00497] InferenceWorker_p0-w0: stopping experience collection (21500 times) [2024-03-29 16:59:01,066][00476] Signal inference workers to resume experience collection... (21500 times) [2024-03-29 16:59:01,066][00497] InferenceWorker_p0-w0: resuming experience collection (21500 times) [2024-03-29 16:59:03,839][00126] Fps is (10 sec: 37683.3, 60 sec: 39321.6, 300 sec: 40154.7). Total num frames: 723910656. Throughput: 0: 39813.2. Samples: 606080720. Policy #0 lag: (min: 1.0, avg: 23.8, max: 44.0) [2024-03-29 16:59:03,840][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 16:59:04,054][00497] Updated weights for policy 0, policy_version 44185 (0.0018) [2024-03-29 16:59:04,334][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000044186_723943424.pth... [2024-03-29 16:59:04,666][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000043595_714260480.pth [2024-03-29 16:59:07,789][00497] Updated weights for policy 0, policy_version 44195 (0.0024) [2024-03-29 16:59:08,839][00126] Fps is (10 sec: 40960.1, 60 sec: 39867.7, 300 sec: 40321.3). Total num frames: 724140032. Throughput: 0: 40313.7. Samples: 606327960. Policy #0 lag: (min: 1.0, avg: 23.8, max: 44.0) [2024-03-29 16:59:08,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 16:59:12,307][00497] Updated weights for policy 0, policy_version 44205 (0.0022) [2024-03-29 16:59:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 40413.9, 300 sec: 40321.3). Total num frames: 724336640. Throughput: 0: 40237.7. Samples: 606558380. Policy #0 lag: (min: 1.0, avg: 23.8, max: 44.0) [2024-03-29 16:59:13,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 16:59:15,863][00497] Updated weights for policy 0, policy_version 44215 (0.0024) [2024-03-29 16:59:18,839][00126] Fps is (10 sec: 37683.3, 60 sec: 39867.7, 300 sec: 40210.2). Total num frames: 724516864. Throughput: 0: 39705.8. Samples: 606678760. 
Policy #0 lag: (min: 1.0, avg: 23.8, max: 44.0) [2024-03-29 16:59:18,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 16:59:20,248][00497] Updated weights for policy 0, policy_version 44225 (0.0024) [2024-03-29 16:59:23,839][00126] Fps is (10 sec: 39321.7, 60 sec: 40140.8, 300 sec: 40265.8). Total num frames: 724729856. Throughput: 0: 40525.8. Samples: 606938160. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 16:59:23,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 16:59:23,874][00497] Updated weights for policy 0, policy_version 44235 (0.0024) [2024-03-29 16:59:28,319][00497] Updated weights for policy 0, policy_version 44245 (0.0019) [2024-03-29 16:59:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 40140.8, 300 sec: 40321.3). Total num frames: 724942848. Throughput: 0: 40209.8. Samples: 607172240. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 16:59:28,840][00126] Avg episode reward: [(0, '0.515')] [2024-03-29 16:59:32,008][00497] Updated weights for policy 0, policy_version 44255 (0.0022) [2024-03-29 16:59:33,839][00126] Fps is (10 sec: 42598.2, 60 sec: 40960.0, 300 sec: 40210.2). Total num frames: 725155840. Throughput: 0: 39792.8. Samples: 607287460. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 16:59:33,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 16:59:35,769][00476] Signal inference workers to stop experience collection... (21550 times) [2024-03-29 16:59:35,796][00497] InferenceWorker_p0-w0: stopping experience collection (21550 times) [2024-03-29 16:59:35,967][00476] Signal inference workers to resume experience collection... (21550 times) [2024-03-29 16:59:35,968][00497] InferenceWorker_p0-w0: resuming experience collection (21550 times) [2024-03-29 16:59:36,681][00497] Updated weights for policy 0, policy_version 44265 (0.0024) [2024-03-29 16:59:38,839][00126] Fps is (10 sec: 39321.4, 60 sec: 39867.7, 300 sec: 40321.3). Total num frames: 725336064. Throughput: 0: 40339.6. Samples: 607537540. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 16:59:38,840][00126] Avg episode reward: [(0, '0.492')] [2024-03-29 16:59:40,087][00497] Updated weights for policy 0, policy_version 44275 (0.0026) [2024-03-29 16:59:43,839][00126] Fps is (10 sec: 37683.4, 60 sec: 39867.8, 300 sec: 40376.8). Total num frames: 725532672. Throughput: 0: 40230.3. Samples: 607768520. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 16:59:43,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 16:59:44,381][00497] Updated weights for policy 0, policy_version 44285 (0.0017) [2024-03-29 16:59:48,360][00497] Updated weights for policy 0, policy_version 44295 (0.0022) [2024-03-29 16:59:48,842][00126] Fps is (10 sec: 40950.3, 60 sec: 40685.3, 300 sec: 40210.1). Total num frames: 725745664. Throughput: 0: 40339.7. Samples: 607896100. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 16:59:48,842][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 16:59:52,912][00497] Updated weights for policy 0, policy_version 44305 (0.0023) [2024-03-29 16:59:53,839][00126] Fps is (10 sec: 40959.7, 60 sec: 40140.8, 300 sec: 40210.2). Total num frames: 725942272. Throughput: 0: 40550.6. Samples: 608152740. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 16:59:53,840][00126] Avg episode reward: [(0, '0.464')] [2024-03-29 16:59:56,259][00497] Updated weights for policy 0, policy_version 44315 (0.0019) [2024-03-29 16:59:58,839][00126] Fps is (10 sec: 42608.6, 60 sec: 40687.0, 300 sec: 40376.8). 
Total num frames: 726171648. Throughput: 0: 40395.6. Samples: 608376180. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 16:59:58,840][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 17:00:00,488][00497] Updated weights for policy 0, policy_version 44325 (0.0027) [2024-03-29 17:00:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40686.9, 300 sec: 40265.8). Total num frames: 726351872. Throughput: 0: 40702.2. Samples: 608510360. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 17:00:03,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 17:00:04,820][00497] Updated weights for policy 0, policy_version 44335 (0.0018) [2024-03-29 17:00:08,670][00497] Updated weights for policy 0, policy_version 44345 (0.0021) [2024-03-29 17:00:08,839][00126] Fps is (10 sec: 37683.1, 60 sec: 40140.8, 300 sec: 40265.9). Total num frames: 726548480. Throughput: 0: 40300.9. Samples: 608751700. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 17:00:08,840][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 17:00:09,006][00476] Signal inference workers to stop experience collection... (21600 times) [2024-03-29 17:00:09,055][00497] InferenceWorker_p0-w0: stopping experience collection (21600 times) [2024-03-29 17:00:09,191][00476] Signal inference workers to resume experience collection... (21600 times) [2024-03-29 17:00:09,191][00497] InferenceWorker_p0-w0: resuming experience collection (21600 times) [2024-03-29 17:00:12,451][00497] Updated weights for policy 0, policy_version 44355 (0.0022) [2024-03-29 17:00:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 40686.9, 300 sec: 40321.3). Total num frames: 726777856. Throughput: 0: 40491.0. Samples: 608994340. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 17:00:13,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 17:00:16,631][00497] Updated weights for policy 0, policy_version 44365 (0.0024) [2024-03-29 17:00:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40413.8, 300 sec: 40265.8). Total num frames: 726941696. Throughput: 0: 40450.2. Samples: 609107720. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 17:00:18,840][00126] Avg episode reward: [(0, '0.505')] [2024-03-29 17:00:20,535][00497] Updated weights for policy 0, policy_version 44375 (0.0025) [2024-03-29 17:00:23,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40413.8, 300 sec: 40154.7). Total num frames: 727154688. Throughput: 0: 40261.8. Samples: 609349320. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 17:00:23,841][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 17:00:24,884][00497] Updated weights for policy 0, policy_version 44385 (0.0020) [2024-03-29 17:00:28,665][00497] Updated weights for policy 0, policy_version 44395 (0.0019) [2024-03-29 17:00:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 40413.8, 300 sec: 40321.3). Total num frames: 727367680. Throughput: 0: 40391.9. Samples: 609586160. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 17:00:28,840][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 17:00:33,172][00497] Updated weights for policy 0, policy_version 44405 (0.0019) [2024-03-29 17:00:33,839][00126] Fps is (10 sec: 39321.6, 60 sec: 39867.7, 300 sec: 40210.2). Total num frames: 727547904. Throughput: 0: 40249.7. Samples: 609707240. 
Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 17:00:33,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 17:00:37,061][00497] Updated weights for policy 0, policy_version 44415 (0.0023) [2024-03-29 17:00:38,840][00126] Fps is (10 sec: 39318.7, 60 sec: 40413.3, 300 sec: 40154.6). Total num frames: 727760896. Throughput: 0: 39903.8. Samples: 609948440. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 17:00:38,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 17:00:41,127][00497] Updated weights for policy 0, policy_version 44425 (0.0017) [2024-03-29 17:00:43,839][00126] Fps is (10 sec: 40960.2, 60 sec: 40413.9, 300 sec: 40265.8). Total num frames: 727957504. Throughput: 0: 40298.2. Samples: 610189600. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 17:00:43,840][00126] Avg episode reward: [(0, '0.507')] [2024-03-29 17:00:44,862][00497] Updated weights for policy 0, policy_version 44435 (0.0020) [2024-03-29 17:00:48,839][00126] Fps is (10 sec: 40963.2, 60 sec: 40415.4, 300 sec: 40210.2). Total num frames: 728170496. Throughput: 0: 40005.3. Samples: 610310600. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 17:00:48,841][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 17:00:49,400][00497] Updated weights for policy 0, policy_version 44445 (0.0018) [2024-03-29 17:00:49,681][00476] Signal inference workers to stop experience collection... (21650 times) [2024-03-29 17:00:49,702][00497] InferenceWorker_p0-w0: stopping experience collection (21650 times) [2024-03-29 17:00:49,878][00476] Signal inference workers to resume experience collection... (21650 times) [2024-03-29 17:00:49,878][00497] InferenceWorker_p0-w0: resuming experience collection (21650 times) [2024-03-29 17:00:52,887][00497] Updated weights for policy 0, policy_version 44455 (0.0030) [2024-03-29 17:00:53,839][00126] Fps is (10 sec: 40959.6, 60 sec: 40413.9, 300 sec: 40154.7). Total num frames: 728367104. Throughput: 0: 40392.8. Samples: 610569380. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 17:00:53,840][00126] Avg episode reward: [(0, '0.651')] [2024-03-29 17:00:57,048][00497] Updated weights for policy 0, policy_version 44465 (0.0027) [2024-03-29 17:00:58,839][00126] Fps is (10 sec: 40959.8, 60 sec: 40140.7, 300 sec: 40265.8). Total num frames: 728580096. Throughput: 0: 40341.3. Samples: 610809700. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 17:00:58,842][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 17:01:00,737][00497] Updated weights for policy 0, policy_version 44475 (0.0023) [2024-03-29 17:01:03,839][00126] Fps is (10 sec: 42598.7, 60 sec: 40687.0, 300 sec: 40210.2). Total num frames: 728793088. Throughput: 0: 40361.8. Samples: 610924000. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 17:01:03,841][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 17:01:04,065][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000044483_728809472.pth... [2024-03-29 17:01:04,399][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000043894_719159296.pth [2024-03-29 17:01:05,365][00497] Updated weights for policy 0, policy_version 44485 (0.0030) [2024-03-29 17:01:08,839][00126] Fps is (10 sec: 40960.6, 60 sec: 40687.0, 300 sec: 40154.7). Total num frames: 728989696. Throughput: 0: 41235.2. Samples: 611204900. 
Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 17:01:08,840][00126] Avg episode reward: [(0, '0.715')] [2024-03-29 17:01:08,862][00476] Saving new best policy, reward=0.715! [2024-03-29 17:01:08,873][00497] Updated weights for policy 0, policy_version 44495 (0.0030) [2024-03-29 17:01:12,849][00497] Updated weights for policy 0, policy_version 44505 (0.0029) [2024-03-29 17:01:13,839][00126] Fps is (10 sec: 40959.8, 60 sec: 40413.9, 300 sec: 40210.2). Total num frames: 729202688. Throughput: 0: 40893.8. Samples: 611426380. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 17:01:13,840][00126] Avg episode reward: [(0, '0.604')] [2024-03-29 17:01:16,566][00497] Updated weights for policy 0, policy_version 44515 (0.0022) [2024-03-29 17:01:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.1, 300 sec: 40265.8). Total num frames: 729415680. Throughput: 0: 41108.1. Samples: 611557100. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 17:01:18,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 17:01:21,190][00497] Updated weights for policy 0, policy_version 44525 (0.0020) [2024-03-29 17:01:23,684][00476] Signal inference workers to stop experience collection... (21700 times) [2024-03-29 17:01:23,702][00497] InferenceWorker_p0-w0: stopping experience collection (21700 times) [2024-03-29 17:01:23,839][00126] Fps is (10 sec: 40960.3, 60 sec: 40960.0, 300 sec: 40210.2). Total num frames: 729612288. Throughput: 0: 41660.8. Samples: 611823140. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 17:01:23,840][00126] Avg episode reward: [(0, '0.622')] [2024-03-29 17:01:23,897][00476] Signal inference workers to resume experience collection... (21700 times) [2024-03-29 17:01:23,898][00497] InferenceWorker_p0-w0: resuming experience collection (21700 times) [2024-03-29 17:01:24,528][00497] Updated weights for policy 0, policy_version 44535 (0.0022) [2024-03-29 17:01:28,370][00497] Updated weights for policy 0, policy_version 44545 (0.0027) [2024-03-29 17:01:28,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41233.1, 300 sec: 40321.3). Total num frames: 729841664. Throughput: 0: 41680.0. Samples: 612065200. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 17:01:28,840][00126] Avg episode reward: [(0, '0.624')] [2024-03-29 17:01:31,945][00497] Updated weights for policy 0, policy_version 44555 (0.0036) [2024-03-29 17:01:33,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41779.1, 300 sec: 40487.9). Total num frames: 730054656. Throughput: 0: 41840.8. Samples: 612193440. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 17:01:33,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 17:01:36,687][00497] Updated weights for policy 0, policy_version 44565 (0.0024) [2024-03-29 17:01:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.7, 300 sec: 40321.3). Total num frames: 730234880. Throughput: 0: 41984.5. Samples: 612458680. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 17:01:38,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 17:01:39,924][00497] Updated weights for policy 0, policy_version 44575 (0.0019) [2024-03-29 17:01:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 40432.4). Total num frames: 730464256. Throughput: 0: 41948.9. Samples: 612697400. 
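
The "Saving new best policy, reward=0.715!" message above follows directly from the average episode reward exceeding the best value seen so far. A small illustrative sketch of that bookkeeping; BestPolicyTracker and save_fn are hypothetical names, not the trainer's actual API:

class BestPolicyTracker:
    def __init__(self, save_fn):
        self.best_reward = float("-inf")
        self.save_fn = save_fn  # e.g. a callable that persists the current policy weights

    def update(self, avg_episode_reward: float) -> bool:
        """Persist the policy and return True when a new best average reward is seen."""
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            self.save_fn()
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
            return True
        return False
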
Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 17:01:43,841][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 17:01:44,143][00497] Updated weights for policy 0, policy_version 44585 (0.0018) [2024-03-29 17:01:47,666][00497] Updated weights for policy 0, policy_version 44595 (0.0023) [2024-03-29 17:01:48,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42052.3, 300 sec: 40599.0). Total num frames: 730693632. Throughput: 0: 42186.3. Samples: 612822380. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 17:01:48,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 17:01:52,066][00497] Updated weights for policy 0, policy_version 44605 (0.0022) [2024-03-29 17:01:53,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41506.2, 300 sec: 40376.9). Total num frames: 730857472. Throughput: 0: 41908.9. Samples: 613090800. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 17:01:53,840][00126] Avg episode reward: [(0, '0.654')] [2024-03-29 17:01:55,493][00497] Updated weights for policy 0, policy_version 44615 (0.0027) [2024-03-29 17:01:57,105][00476] Signal inference workers to stop experience collection... (21750 times) [2024-03-29 17:01:57,174][00497] InferenceWorker_p0-w0: stopping experience collection (21750 times) [2024-03-29 17:01:57,180][00476] Signal inference workers to resume experience collection... (21750 times) [2024-03-29 17:01:57,201][00497] InferenceWorker_p0-w0: resuming experience collection (21750 times) [2024-03-29 17:01:58,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.2, 300 sec: 40487.9). Total num frames: 731086848. Throughput: 0: 42220.4. Samples: 613326300. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 17:01:58,840][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 17:01:59,775][00497] Updated weights for policy 0, policy_version 44625 (0.0027) [2024-03-29 17:02:03,212][00497] Updated weights for policy 0, policy_version 44635 (0.0019) [2024-03-29 17:02:03,839][00126] Fps is (10 sec: 47513.1, 60 sec: 42325.3, 300 sec: 40654.5). Total num frames: 731332608. Throughput: 0: 42088.3. Samples: 613451080. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 17:02:03,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 17:02:07,781][00497] Updated weights for policy 0, policy_version 44645 (0.0019) [2024-03-29 17:02:08,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 40487.9). Total num frames: 731496448. Throughput: 0: 41843.9. Samples: 613706120. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 17:02:08,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 17:02:11,156][00497] Updated weights for policy 0, policy_version 44655 (0.0029) [2024-03-29 17:02:13,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42052.2, 300 sec: 40543.4). Total num frames: 731725824. Throughput: 0: 42102.5. Samples: 613959820. Policy #0 lag: (min: 2.0, avg: 20.7, max: 44.0) [2024-03-29 17:02:13,840][00126] Avg episode reward: [(0, '0.565')] [2024-03-29 17:02:15,268][00497] Updated weights for policy 0, policy_version 44665 (0.0018) [2024-03-29 17:02:18,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42325.3, 300 sec: 40765.6). Total num frames: 731955200. Throughput: 0: 42021.0. Samples: 614084380. 
Policy #0 lag: (min: 2.0, avg: 20.7, max: 44.0) [2024-03-29 17:02:18,840][00126] Avg episode reward: [(0, '0.601')] [2024-03-29 17:02:18,842][00497] Updated weights for policy 0, policy_version 44675 (0.0024) [2024-03-29 17:02:23,392][00497] Updated weights for policy 0, policy_version 44685 (0.0024) [2024-03-29 17:02:23,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 40710.1). Total num frames: 732135424. Throughput: 0: 41915.1. Samples: 614344860. Policy #0 lag: (min: 2.0, avg: 20.7, max: 44.0) [2024-03-29 17:02:23,840][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 17:02:25,806][00476] Signal inference workers to stop experience collection... (21800 times) [2024-03-29 17:02:25,843][00497] InferenceWorker_p0-w0: stopping experience collection (21800 times) [2024-03-29 17:02:25,895][00476] Signal inference workers to resume experience collection... (21800 times) [2024-03-29 17:02:25,895][00497] InferenceWorker_p0-w0: resuming experience collection (21800 times) [2024-03-29 17:02:26,454][00497] Updated weights for policy 0, policy_version 44695 (0.0034) [2024-03-29 17:02:28,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 40765.6). Total num frames: 732364800. Throughput: 0: 42329.8. Samples: 614602240. Policy #0 lag: (min: 2.0, avg: 20.7, max: 44.0) [2024-03-29 17:02:28,840][00126] Avg episode reward: [(0, '0.622')] [2024-03-29 17:02:30,462][00497] Updated weights for policy 0, policy_version 44705 (0.0022) [2024-03-29 17:02:33,829][00497] Updated weights for policy 0, policy_version 44715 (0.0020) [2024-03-29 17:02:33,839][00126] Fps is (10 sec: 47513.5, 60 sec: 42598.5, 300 sec: 40876.7). Total num frames: 732610560. Throughput: 0: 42486.7. Samples: 614734280. Policy #0 lag: (min: 2.0, avg: 20.7, max: 44.0) [2024-03-29 17:02:33,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 17:02:38,342][00497] Updated weights for policy 0, policy_version 44725 (0.0026) [2024-03-29 17:02:38,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42598.4, 300 sec: 40987.8). Total num frames: 732790784. Throughput: 0: 42215.1. Samples: 614990480. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 17:02:38,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 17:02:41,686][00497] Updated weights for policy 0, policy_version 44735 (0.0020) [2024-03-29 17:02:43,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42325.3, 300 sec: 40932.2). Total num frames: 733003776. Throughput: 0: 42776.4. Samples: 615251240. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 17:02:43,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 17:02:45,795][00497] Updated weights for policy 0, policy_version 44745 (0.0019) [2024-03-29 17:02:48,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.2, 300 sec: 40987.8). Total num frames: 733233152. Throughput: 0: 42825.7. Samples: 615378240. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 17:02:48,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:02:49,276][00497] Updated weights for policy 0, policy_version 44755 (0.0027) [2024-03-29 17:02:53,553][00497] Updated weights for policy 0, policy_version 44765 (0.0027) [2024-03-29 17:02:53,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42871.4, 300 sec: 41098.9). Total num frames: 733429760. Throughput: 0: 42745.4. Samples: 615629660. 
Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 17:02:53,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 17:02:57,029][00497] Updated weights for policy 0, policy_version 44775 (0.0028) [2024-03-29 17:02:58,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42598.5, 300 sec: 40987.8). Total num frames: 733642752. Throughput: 0: 42804.1. Samples: 615886000. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 17:02:58,840][00126] Avg episode reward: [(0, '0.592')] [2024-03-29 17:03:00,846][00476] Signal inference workers to stop experience collection... (21850 times) [2024-03-29 17:03:00,923][00497] InferenceWorker_p0-w0: stopping experience collection (21850 times) [2024-03-29 17:03:00,931][00476] Signal inference workers to resume experience collection... (21850 times) [2024-03-29 17:03:00,953][00497] InferenceWorker_p0-w0: resuming experience collection (21850 times) [2024-03-29 17:03:01,264][00497] Updated weights for policy 0, policy_version 44785 (0.0023) [2024-03-29 17:03:03,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 41154.4). Total num frames: 733888512. Throughput: 0: 42989.8. Samples: 616018920. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 17:03:03,840][00126] Avg episode reward: [(0, '0.502')] [2024-03-29 17:03:04,117][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000044794_733904896.pth... [2024-03-29 17:03:04,465][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000044186_723943424.pth [2024-03-29 17:03:04,731][00497] Updated weights for policy 0, policy_version 44795 (0.0020) [2024-03-29 17:03:08,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42871.4, 300 sec: 41209.9). Total num frames: 734068736. Throughput: 0: 42663.9. Samples: 616264740. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 17:03:08,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 17:03:09,075][00497] Updated weights for policy 0, policy_version 44805 (0.0023) [2024-03-29 17:03:12,614][00497] Updated weights for policy 0, policy_version 44815 (0.0023) [2024-03-29 17:03:13,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42598.5, 300 sec: 41209.9). Total num frames: 734281728. Throughput: 0: 42502.8. Samples: 616514860. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 17:03:13,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 17:03:16,622][00497] Updated weights for policy 0, policy_version 44825 (0.0025) [2024-03-29 17:03:18,839][00126] Fps is (10 sec: 45875.9, 60 sec: 42871.5, 300 sec: 41376.5). Total num frames: 734527488. Throughput: 0: 42496.5. Samples: 616646620. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 17:03:18,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:03:19,956][00497] Updated weights for policy 0, policy_version 44835 (0.0028) [2024-03-29 17:03:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42871.5, 300 sec: 41265.5). Total num frames: 734707712. Throughput: 0: 42381.8. Samples: 616897660. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 17:03:23,841][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 17:03:24,606][00497] Updated weights for policy 0, policy_version 44845 (0.0018) [2024-03-29 17:03:28,008][00497] Updated weights for policy 0, policy_version 44855 (0.0023) [2024-03-29 17:03:28,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42598.5, 300 sec: 41432.1). Total num frames: 734920704. Throughput: 0: 42335.7. Samples: 617156340. 
Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 17:03:28,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 17:03:31,857][00497] Updated weights for policy 0, policy_version 44865 (0.0028) [2024-03-29 17:03:32,557][00476] Signal inference workers to stop experience collection... (21900 times) [2024-03-29 17:03:32,593][00497] InferenceWorker_p0-w0: stopping experience collection (21900 times) [2024-03-29 17:03:32,748][00476] Signal inference workers to resume experience collection... (21900 times) [2024-03-29 17:03:32,748][00497] InferenceWorker_p0-w0: resuming experience collection (21900 times) [2024-03-29 17:03:33,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41376.5). Total num frames: 735150080. Throughput: 0: 42415.7. Samples: 617286940. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 17:03:33,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 17:03:35,212][00497] Updated weights for policy 0, policy_version 44875 (0.0025) [2024-03-29 17:03:38,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42871.5, 300 sec: 41432.1). Total num frames: 735363072. Throughput: 0: 42578.6. Samples: 617545700. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 17:03:38,840][00126] Avg episode reward: [(0, '0.459')] [2024-03-29 17:03:39,699][00497] Updated weights for policy 0, policy_version 44885 (0.0017) [2024-03-29 17:03:43,179][00497] Updated weights for policy 0, policy_version 44895 (0.0028) [2024-03-29 17:03:43,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42871.6, 300 sec: 41598.7). Total num frames: 735576064. Throughput: 0: 42450.7. Samples: 617796280. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 17:03:43,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 17:03:47,261][00497] Updated weights for policy 0, policy_version 44905 (0.0018) [2024-03-29 17:03:48,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42871.6, 300 sec: 41598.7). Total num frames: 735805440. Throughput: 0: 42552.9. Samples: 617933800. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 17:03:48,840][00126] Avg episode reward: [(0, '0.517')] [2024-03-29 17:03:50,775][00497] Updated weights for policy 0, policy_version 44915 (0.0018) [2024-03-29 17:03:53,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42871.4, 300 sec: 41598.7). Total num frames: 736002048. Throughput: 0: 42528.8. Samples: 618178540. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 17:03:53,842][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 17:03:55,079][00497] Updated weights for policy 0, policy_version 44925 (0.0025) [2024-03-29 17:03:58,515][00497] Updated weights for policy 0, policy_version 44935 (0.0025) [2024-03-29 17:03:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 43144.5, 300 sec: 41765.3). Total num frames: 736231424. Throughput: 0: 42849.8. Samples: 618443100. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 17:03:58,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 17:04:02,596][00497] Updated weights for policy 0, policy_version 44945 (0.0024) [2024-03-29 17:04:03,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.3, 300 sec: 41654.2). Total num frames: 736428032. Throughput: 0: 42899.1. Samples: 618577080. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 17:04:03,840][00126] Avg episode reward: [(0, '0.492')] [2024-03-29 17:04:05,959][00497] Updated weights for policy 0, policy_version 44955 (0.0037) [2024-03-29 17:04:07,931][00476] Signal inference workers to stop experience collection... 
(21950 times) [2024-03-29 17:04:08,002][00497] InferenceWorker_p0-w0: stopping experience collection (21950 times) [2024-03-29 17:04:08,007][00476] Signal inference workers to resume experience collection... (21950 times) [2024-03-29 17:04:08,028][00497] InferenceWorker_p0-w0: resuming experience collection (21950 times) [2024-03-29 17:04:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42871.5, 300 sec: 41709.8). Total num frames: 736641024. Throughput: 0: 42629.3. Samples: 618815980. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 17:04:08,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 17:04:10,515][00497] Updated weights for policy 0, policy_version 44965 (0.0018) [2024-03-29 17:04:13,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42871.4, 300 sec: 41820.8). Total num frames: 736854016. Throughput: 0: 42905.7. Samples: 619087100. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 17:04:13,840][00126] Avg episode reward: [(0, '0.503')] [2024-03-29 17:04:13,943][00497] Updated weights for policy 0, policy_version 44975 (0.0021) [2024-03-29 17:04:17,905][00497] Updated weights for policy 0, policy_version 44985 (0.0017) [2024-03-29 17:04:18,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42325.2, 300 sec: 41820.8). Total num frames: 737067008. Throughput: 0: 42791.8. Samples: 619212580. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 17:04:18,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 17:04:21,114][00497] Updated weights for policy 0, policy_version 44995 (0.0028) [2024-03-29 17:04:23,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42598.4, 300 sec: 41765.3). Total num frames: 737263616. Throughput: 0: 42436.9. Samples: 619455360. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 17:04:23,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 17:04:25,982][00497] Updated weights for policy 0, policy_version 45005 (0.0025) [2024-03-29 17:04:28,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42871.4, 300 sec: 41820.9). Total num frames: 737492992. Throughput: 0: 42865.7. Samples: 619725240. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 17:04:28,841][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 17:04:29,338][00497] Updated weights for policy 0, policy_version 45015 (0.0035) [2024-03-29 17:04:33,072][00497] Updated weights for policy 0, policy_version 45025 (0.0023) [2024-03-29 17:04:33,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42871.4, 300 sec: 41987.5). Total num frames: 737722368. Throughput: 0: 42743.9. Samples: 619857280. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 17:04:33,840][00126] Avg episode reward: [(0, '0.619')] [2024-03-29 17:04:36,468][00497] Updated weights for policy 0, policy_version 45035 (0.0023) [2024-03-29 17:04:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 737918976. Throughput: 0: 42724.1. Samples: 620101120. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 17:04:38,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 17:04:40,752][00476] Signal inference workers to stop experience collection... (22000 times) [2024-03-29 17:04:40,774][00497] InferenceWorker_p0-w0: stopping experience collection (22000 times) [2024-03-29 17:04:40,944][00476] Signal inference workers to resume experience collection... 
(22000 times) [2024-03-29 17:04:40,945][00497] InferenceWorker_p0-w0: resuming experience collection (22000 times) [2024-03-29 17:04:41,238][00497] Updated weights for policy 0, policy_version 45045 (0.0028) [2024-03-29 17:04:43,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42325.3, 300 sec: 41932.3). Total num frames: 738115584. Throughput: 0: 42710.2. Samples: 620365060. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 17:04:43,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 17:04:44,749][00497] Updated weights for policy 0, policy_version 45055 (0.0022) [2024-03-29 17:04:48,710][00497] Updated weights for policy 0, policy_version 45065 (0.0039) [2024-03-29 17:04:48,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 738344960. Throughput: 0: 42593.3. Samples: 620493780. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 17:04:48,840][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 17:04:52,291][00497] Updated weights for policy 0, policy_version 45075 (0.0019) [2024-03-29 17:04:53,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 738557952. Throughput: 0: 42652.4. Samples: 620735340. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 17:04:53,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 17:04:56,987][00497] Updated weights for policy 0, policy_version 45085 (0.0032) [2024-03-29 17:04:58,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 738754560. Throughput: 0: 42586.3. Samples: 621003480. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 17:04:58,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 17:05:00,166][00497] Updated weights for policy 0, policy_version 45095 (0.0025) [2024-03-29 17:05:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 738967552. Throughput: 0: 42516.2. Samples: 621125800. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 17:05:03,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 17:05:04,139][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000045105_739000320.pth... [2024-03-29 17:05:04,140][00497] Updated weights for policy 0, policy_version 45105 (0.0021) [2024-03-29 17:05:04,459][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000044483_728809472.pth [2024-03-29 17:05:07,776][00497] Updated weights for policy 0, policy_version 45115 (0.0019) [2024-03-29 17:05:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 739196928. Throughput: 0: 42347.1. Samples: 621360980. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 17:05:08,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 17:05:12,617][00497] Updated weights for policy 0, policy_version 45125 (0.0032) [2024-03-29 17:05:12,952][00476] Signal inference workers to stop experience collection... (22050 times) [2024-03-29 17:05:13,012][00497] InferenceWorker_p0-w0: stopping experience collection (22050 times) [2024-03-29 17:05:13,048][00476] Signal inference workers to resume experience collection... (22050 times) [2024-03-29 17:05:13,050][00497] InferenceWorker_p0-w0: resuming experience collection (22050 times) [2024-03-29 17:05:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.4, 300 sec: 42154.1). Total num frames: 739377152. Throughput: 0: 42521.9. Samples: 621638720. 
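
The "Signal inference workers to stop experience collection... / resume experience collection..." pairs above show the learner periodically pausing rollout collection while it catches up, then releasing the workers. A conceptual sketch of that gate, assuming a simple in-process threading.Event in place of the trainer's real inter-process signalling:

import threading

class CollectionGate:
    def __init__(self):
        self._collect = threading.Event()
        self._collect.set()          # collection enabled by default
        self.pause_count = 0

    # learner side
    def stop_collection(self):
        self.pause_count += 1
        print(f"Signal inference workers to stop experience collection... ({self.pause_count} times)")
        self._collect.clear()

    def resume_collection(self):
        print(f"Signal inference workers to resume experience collection... ({self.pause_count} times)")
        self._collect.set()

    # worker side
    def wait_until_allowed(self):
        self._collect.wait()         # block while the learner is catching up
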
Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 17:05:13,840][00126] Avg episode reward: [(0, '0.610')] [2024-03-29 17:05:15,886][00497] Updated weights for policy 0, policy_version 45135 (0.0028) [2024-03-29 17:05:18,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 739606528. Throughput: 0: 42094.6. Samples: 621751540. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 17:05:18,840][00126] Avg episode reward: [(0, '0.489')] [2024-03-29 17:05:19,645][00497] Updated weights for policy 0, policy_version 45145 (0.0024) [2024-03-29 17:05:23,356][00497] Updated weights for policy 0, policy_version 45155 (0.0022) [2024-03-29 17:05:23,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42871.4, 300 sec: 42265.2). Total num frames: 739835904. Throughput: 0: 42165.8. Samples: 621998580. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 17:05:23,840][00126] Avg episode reward: [(0, '0.620')] [2024-03-29 17:05:27,854][00497] Updated weights for policy 0, policy_version 45165 (0.0020) [2024-03-29 17:05:28,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 740016128. Throughput: 0: 42256.9. Samples: 622266620. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 17:05:28,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 17:05:31,549][00497] Updated weights for policy 0, policy_version 45175 (0.0030) [2024-03-29 17:05:33,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.3, 300 sec: 42265.3). Total num frames: 740229120. Throughput: 0: 42110.2. Samples: 622388740. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 17:05:33,840][00126] Avg episode reward: [(0, '0.475')] [2024-03-29 17:05:35,079][00497] Updated weights for policy 0, policy_version 45185 (0.0021) [2024-03-29 17:05:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.4, 300 sec: 42376.2). Total num frames: 740458496. Throughput: 0: 42098.7. Samples: 622629780. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 17:05:38,840][00126] Avg episode reward: [(0, '0.483')] [2024-03-29 17:05:38,944][00497] Updated weights for policy 0, policy_version 45195 (0.0019) [2024-03-29 17:05:43,724][00497] Updated weights for policy 0, policy_version 45205 (0.0028) [2024-03-29 17:05:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 740638720. Throughput: 0: 41960.4. Samples: 622891700. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 17:05:43,840][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 17:05:45,273][00476] Signal inference workers to stop experience collection... (22100 times) [2024-03-29 17:05:45,307][00497] InferenceWorker_p0-w0: stopping experience collection (22100 times) [2024-03-29 17:05:45,489][00476] Signal inference workers to resume experience collection... (22100 times) [2024-03-29 17:05:45,490][00497] InferenceWorker_p0-w0: resuming experience collection (22100 times) [2024-03-29 17:05:47,086][00497] Updated weights for policy 0, policy_version 45215 (0.0022) [2024-03-29 17:05:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 42320.7). Total num frames: 740851712. Throughput: 0: 42009.4. Samples: 623016220. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 17:05:48,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 17:05:50,810][00497] Updated weights for policy 0, policy_version 45225 (0.0025) [2024-03-29 17:05:53,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42325.2, 300 sec: 42431.8). 
Total num frames: 741097472. Throughput: 0: 42118.5. Samples: 623256320. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 17:05:53,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 17:05:54,544][00497] Updated weights for policy 0, policy_version 45235 (0.0023) [2024-03-29 17:05:58,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 741277696. Throughput: 0: 41848.3. Samples: 623521900. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 17:05:58,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 17:05:59,393][00497] Updated weights for policy 0, policy_version 45245 (0.0024) [2024-03-29 17:06:02,612][00497] Updated weights for policy 0, policy_version 45255 (0.0023) [2024-03-29 17:06:03,839][00126] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 42376.2). Total num frames: 741490688. Throughput: 0: 42228.5. Samples: 623651820. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 17:06:03,840][00126] Avg episode reward: [(0, '0.517')] [2024-03-29 17:06:06,331][00497] Updated weights for policy 0, policy_version 45265 (0.0023) [2024-03-29 17:06:08,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42325.3, 300 sec: 42487.3). Total num frames: 741736448. Throughput: 0: 42189.8. Samples: 623897120. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 17:06:08,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 17:06:09,884][00497] Updated weights for policy 0, policy_version 45275 (0.0024) [2024-03-29 17:06:13,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42325.1, 300 sec: 42376.2). Total num frames: 741916672. Throughput: 0: 42047.4. Samples: 624158760. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 17:06:13,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:06:14,680][00497] Updated weights for policy 0, policy_version 45285 (0.0023) [2024-03-29 17:06:17,370][00476] Signal inference workers to stop experience collection... (22150 times) [2024-03-29 17:06:17,408][00497] InferenceWorker_p0-w0: stopping experience collection (22150 times) [2024-03-29 17:06:17,588][00476] Signal inference workers to resume experience collection... (22150 times) [2024-03-29 17:06:17,589][00497] InferenceWorker_p0-w0: resuming experience collection (22150 times) [2024-03-29 17:06:17,852][00497] Updated weights for policy 0, policy_version 45295 (0.0020) [2024-03-29 17:06:18,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 742129664. Throughput: 0: 42379.6. Samples: 624295820. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 17:06:18,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 17:06:21,559][00497] Updated weights for policy 0, policy_version 45305 (0.0024) [2024-03-29 17:06:23,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 742359040. Throughput: 0: 42313.2. Samples: 624533880. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 17:06:23,840][00126] Avg episode reward: [(0, '0.569')] [2024-03-29 17:06:25,308][00497] Updated weights for policy 0, policy_version 45315 (0.0020) [2024-03-29 17:06:28,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 42376.3). Total num frames: 742555648. Throughput: 0: 42309.8. Samples: 624795640. 
Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 17:06:28,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 17:06:29,960][00497] Updated weights for policy 0, policy_version 45325 (0.0023) [2024-03-29 17:06:33,517][00497] Updated weights for policy 0, policy_version 45335 (0.0030) [2024-03-29 17:06:33,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.3, 300 sec: 42542.8). Total num frames: 742785024. Throughput: 0: 42447.8. Samples: 624926380. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 17:06:33,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 17:06:37,080][00497] Updated weights for policy 0, policy_version 45345 (0.0027) [2024-03-29 17:06:38,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 42487.3). Total num frames: 742998016. Throughput: 0: 42371.3. Samples: 625163020. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 17:06:38,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 17:06:41,055][00497] Updated weights for policy 0, policy_version 45355 (0.0027) [2024-03-29 17:06:43,839][00126] Fps is (10 sec: 40960.7, 60 sec: 42598.4, 300 sec: 42376.2). Total num frames: 743194624. Throughput: 0: 42374.8. Samples: 625428760. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 17:06:43,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 17:06:45,687][00497] Updated weights for policy 0, policy_version 45365 (0.0023) [2024-03-29 17:06:48,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42325.3, 300 sec: 42487.3). Total num frames: 743391232. Throughput: 0: 42414.3. Samples: 625560460. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 17:06:48,841][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 17:06:49,283][00497] Updated weights for policy 0, policy_version 45375 (0.0025) [2024-03-29 17:06:52,844][00497] Updated weights for policy 0, policy_version 45385 (0.0021) [2024-03-29 17:06:53,059][00476] Signal inference workers to stop experience collection... (22200 times) [2024-03-29 17:06:53,093][00497] InferenceWorker_p0-w0: stopping experience collection (22200 times) [2024-03-29 17:06:53,266][00476] Signal inference workers to resume experience collection... (22200 times) [2024-03-29 17:06:53,267][00497] InferenceWorker_p0-w0: resuming experience collection (22200 times) [2024-03-29 17:06:53,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42487.3). Total num frames: 743620608. Throughput: 0: 42386.7. Samples: 625804520. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 17:06:53,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:06:56,429][00497] Updated weights for policy 0, policy_version 45395 (0.0020) [2024-03-29 17:06:58,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.5, 300 sec: 42376.3). Total num frames: 743833600. Throughput: 0: 42154.0. Samples: 626055680. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 17:06:58,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 17:07:01,205][00497] Updated weights for policy 0, policy_version 45405 (0.0029) [2024-03-29 17:07:03,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42325.3, 300 sec: 42487.3). Total num frames: 744030208. Throughput: 0: 42156.9. Samples: 626192880. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 17:07:03,840][00126] Avg episode reward: [(0, '0.533')] [2024-03-29 17:07:03,857][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000045412_744030208.pth... 
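
Each periodic checkpoint save above is paired with removal of the oldest checkpoint (the entry immediately after it), i.e. a keep-last-N rotation over files named checkpoint_<version>_<frames>.pth. A minimal sketch of that pattern; CheckpointRotator and keep_last=2 are assumptions inferred from the surrounding log, not the actual implementation:

import os
from collections import deque

class CheckpointRotator:
    def __init__(self, ckpt_dir: str, keep_last: int = 2):
        self.ckpt_dir = ckpt_dir
        self.keep_last = keep_last
        self.saved = deque()

    def save(self, policy_version: int, env_frames: int, state_bytes: bytes):
        # File name mirrors the pattern seen in the log, e.g. checkpoint_000045412_744030208.pth
        name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
        path = os.path.join(self.ckpt_dir, name)
        with open(path, "wb") as f:      # stand-in for torch.save(...)
            f.write(state_bytes)
        print(f"Saving {path}...")
        self.saved.append(path)
        # prune the oldest rotating checkpoints, matching the "Removing ..." entries
        while len(self.saved) > self.keep_last:
            oldest = self.saved.popleft()
            print(f"Removing {oldest}")
            os.remove(oldest)
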
[2024-03-29 17:07:04,256][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000044794_733904896.pth [2024-03-29 17:07:05,045][00497] Updated weights for policy 0, policy_version 45415 (0.0022) [2024-03-29 17:07:08,501][00497] Updated weights for policy 0, policy_version 45425 (0.0017) [2024-03-29 17:07:08,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42487.3). Total num frames: 744259584. Throughput: 0: 42209.8. Samples: 626433320. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 17:07:08,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 17:07:12,087][00497] Updated weights for policy 0, policy_version 45435 (0.0021) [2024-03-29 17:07:13,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42598.6, 300 sec: 42431.8). Total num frames: 744472576. Throughput: 0: 41915.2. Samples: 626681820. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 17:07:13,840][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 17:07:16,932][00497] Updated weights for policy 0, policy_version 45445 (0.0017) [2024-03-29 17:07:18,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 744652800. Throughput: 0: 42026.4. Samples: 626817560. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 17:07:18,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 17:07:20,520][00497] Updated weights for policy 0, policy_version 45455 (0.0030) [2024-03-29 17:07:23,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 744882176. Throughput: 0: 42490.2. Samples: 627075080. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 17:07:23,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 17:07:23,843][00497] Updated weights for policy 0, policy_version 45465 (0.0019) [2024-03-29 17:07:26,298][00476] Signal inference workers to stop experience collection... (22250 times) [2024-03-29 17:07:26,318][00497] InferenceWorker_p0-w0: stopping experience collection (22250 times) [2024-03-29 17:07:26,508][00476] Signal inference workers to resume experience collection... (22250 times) [2024-03-29 17:07:26,508][00497] InferenceWorker_p0-w0: resuming experience collection (22250 times) [2024-03-29 17:07:27,536][00497] Updated weights for policy 0, policy_version 45475 (0.0023) [2024-03-29 17:07:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 745095168. Throughput: 0: 41955.1. Samples: 627316740. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 17:07:28,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 17:07:32,450][00497] Updated weights for policy 0, policy_version 45485 (0.0021) [2024-03-29 17:07:33,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.2, 300 sec: 42376.2). Total num frames: 745291776. Throughput: 0: 42020.7. Samples: 627451400. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 17:07:33,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 17:07:35,984][00497] Updated weights for policy 0, policy_version 45495 (0.0021) [2024-03-29 17:07:38,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 745521152. Throughput: 0: 42503.7. Samples: 627717180. 
Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 17:07:38,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 17:07:39,150][00497] Updated weights for policy 0, policy_version 45505 (0.0025) [2024-03-29 17:07:42,872][00497] Updated weights for policy 0, policy_version 45515 (0.0026) [2024-03-29 17:07:43,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 745750528. Throughput: 0: 42240.4. Samples: 627956500. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 17:07:43,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 17:07:47,662][00497] Updated weights for policy 0, policy_version 45525 (0.0020) [2024-03-29 17:07:48,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 745930752. Throughput: 0: 42184.4. Samples: 628091180. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 17:07:48,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 17:07:51,202][00497] Updated weights for policy 0, policy_version 45535 (0.0026) [2024-03-29 17:07:53,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 42431.8). Total num frames: 746160128. Throughput: 0: 42644.5. Samples: 628352320. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 17:07:53,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 17:07:54,527][00497] Updated weights for policy 0, policy_version 45545 (0.0024) [2024-03-29 17:07:58,316][00497] Updated weights for policy 0, policy_version 45555 (0.0018) [2024-03-29 17:07:58,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 746389504. Throughput: 0: 42559.9. Samples: 628597020. Policy #0 lag: (min: 1.0, avg: 22.2, max: 42.0) [2024-03-29 17:07:58,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 17:08:03,204][00497] Updated weights for policy 0, policy_version 45565 (0.0024) [2024-03-29 17:08:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.3, 300 sec: 42376.3). Total num frames: 746569728. Throughput: 0: 42587.5. Samples: 628734000. Policy #0 lag: (min: 1.0, avg: 22.2, max: 42.0) [2024-03-29 17:08:03,840][00126] Avg episode reward: [(0, '0.484')] [2024-03-29 17:08:06,379][00476] Signal inference workers to stop experience collection... (22300 times) [2024-03-29 17:08:06,413][00497] InferenceWorker_p0-w0: stopping experience collection (22300 times) [2024-03-29 17:08:06,559][00476] Signal inference workers to resume experience collection... (22300 times) [2024-03-29 17:08:06,560][00497] InferenceWorker_p0-w0: resuming experience collection (22300 times) [2024-03-29 17:08:06,819][00497] Updated weights for policy 0, policy_version 45575 (0.0026) [2024-03-29 17:08:08,839][00126] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 42376.2). Total num frames: 746782720. Throughput: 0: 42439.6. Samples: 628984860. Policy #0 lag: (min: 1.0, avg: 22.2, max: 42.0) [2024-03-29 17:08:08,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 17:08:10,144][00497] Updated weights for policy 0, policy_version 45585 (0.0023) [2024-03-29 17:08:13,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 747028480. Throughput: 0: 42288.3. Samples: 629219720. Policy #0 lag: (min: 1.0, avg: 22.2, max: 42.0) [2024-03-29 17:08:13,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 17:08:13,841][00497] Updated weights for policy 0, policy_version 45595 (0.0026) [2024-03-29 17:08:18,840][00126] Fps is (10 sec: 39318.8, 60 sec: 42051.8, 300 sec: 42265.1). 
Total num frames: 747175936. Throughput: 0: 42241.2. Samples: 629352280. Policy #0 lag: (min: 1.0, avg: 22.2, max: 42.0) [2024-03-29 17:08:18,840][00126] Avg episode reward: [(0, '0.596')] [2024-03-29 17:08:18,897][00497] Updated weights for policy 0, policy_version 45605 (0.0023) [2024-03-29 17:08:22,582][00497] Updated weights for policy 0, policy_version 45615 (0.0024) [2024-03-29 17:08:23,839][00126] Fps is (10 sec: 37683.5, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 747405312. Throughput: 0: 42354.2. Samples: 629623120. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 17:08:23,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 17:08:25,783][00497] Updated weights for policy 0, policy_version 45625 (0.0025) [2024-03-29 17:08:28,839][00126] Fps is (10 sec: 47516.5, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 747651072. Throughput: 0: 42152.8. Samples: 629853380. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 17:08:28,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 17:08:29,336][00497] Updated weights for policy 0, policy_version 45635 (0.0022) [2024-03-29 17:08:33,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 747814912. Throughput: 0: 42090.1. Samples: 629985240. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 17:08:33,840][00126] Avg episode reward: [(0, '0.456')] [2024-03-29 17:08:34,519][00497] Updated weights for policy 0, policy_version 45645 (0.0022) [2024-03-29 17:08:38,122][00497] Updated weights for policy 0, policy_version 45655 (0.0031) [2024-03-29 17:08:38,157][00476] Signal inference workers to stop experience collection... (22350 times) [2024-03-29 17:08:38,176][00497] InferenceWorker_p0-w0: stopping experience collection (22350 times) [2024-03-29 17:08:38,371][00476] Signal inference workers to resume experience collection... (22350 times) [2024-03-29 17:08:38,372][00497] InferenceWorker_p0-w0: resuming experience collection (22350 times) [2024-03-29 17:08:38,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 748044288. Throughput: 0: 42137.8. Samples: 630248520. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 17:08:38,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 17:08:41,511][00497] Updated weights for policy 0, policy_version 45665 (0.0028) [2024-03-29 17:08:43,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 748273664. Throughput: 0: 41958.8. Samples: 630485160. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 17:08:43,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 17:08:44,990][00497] Updated weights for policy 0, policy_version 45675 (0.0028) [2024-03-29 17:08:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 42209.7). Total num frames: 748453888. Throughput: 0: 41798.2. Samples: 630614920. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 17:08:48,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 17:08:50,303][00497] Updated weights for policy 0, policy_version 45685 (0.0017) [2024-03-29 17:08:53,711][00497] Updated weights for policy 0, policy_version 45695 (0.0028) [2024-03-29 17:08:53,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 748666880. Throughput: 0: 41941.3. Samples: 630872220. 
Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 17:08:53,840][00126] Avg episode reward: [(0, '0.610')] [2024-03-29 17:08:57,389][00497] Updated weights for policy 0, policy_version 45705 (0.0027) [2024-03-29 17:08:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 748879872. Throughput: 0: 41833.9. Samples: 631102240. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 17:08:58,840][00126] Avg episode reward: [(0, '0.464')] [2024-03-29 17:09:00,789][00497] Updated weights for policy 0, policy_version 45715 (0.0023) [2024-03-29 17:09:03,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 749076480. Throughput: 0: 41621.0. Samples: 631225200. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 17:09:03,841][00126] Avg episode reward: [(0, '0.601')] [2024-03-29 17:09:03,862][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000045721_749092864.pth... [2024-03-29 17:09:04,181][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000045105_739000320.pth [2024-03-29 17:09:05,891][00497] Updated weights for policy 0, policy_version 45725 (0.0031) [2024-03-29 17:09:08,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 749289472. Throughput: 0: 41713.4. Samples: 631500220. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 17:09:08,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 17:09:09,395][00497] Updated weights for policy 0, policy_version 45735 (0.0031) [2024-03-29 17:09:10,554][00476] Signal inference workers to stop experience collection... (22400 times) [2024-03-29 17:09:10,634][00497] InferenceWorker_p0-w0: stopping experience collection (22400 times) [2024-03-29 17:09:10,721][00476] Signal inference workers to resume experience collection... (22400 times) [2024-03-29 17:09:10,721][00497] InferenceWorker_p0-w0: resuming experience collection (22400 times) [2024-03-29 17:09:12,911][00497] Updated weights for policy 0, policy_version 45745 (0.0034) [2024-03-29 17:09:13,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 749518848. Throughput: 0: 41755.6. Samples: 631732380. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 17:09:13,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 17:09:16,395][00497] Updated weights for policy 0, policy_version 45755 (0.0025) [2024-03-29 17:09:18,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42598.9, 300 sec: 42265.2). Total num frames: 749731840. Throughput: 0: 41515.3. Samples: 631853420. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 17:09:18,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 17:09:21,674][00497] Updated weights for policy 0, policy_version 45765 (0.0024) [2024-03-29 17:09:23,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.1, 300 sec: 42098.5). Total num frames: 749912064. Throughput: 0: 41708.3. Samples: 632125400. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 17:09:23,840][00126] Avg episode reward: [(0, '0.481')] [2024-03-29 17:09:25,374][00497] Updated weights for policy 0, policy_version 45775 (0.0029) [2024-03-29 17:09:28,595][00497] Updated weights for policy 0, policy_version 45785 (0.0024) [2024-03-29 17:09:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 42098.6). Total num frames: 750141440. Throughput: 0: 41602.2. Samples: 632357260. 
Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 17:09:28,840][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 17:09:32,191][00497] Updated weights for policy 0, policy_version 45795 (0.0025) [2024-03-29 17:09:33,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 750370816. Throughput: 0: 41456.4. Samples: 632480460. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 17:09:33,840][00126] Avg episode reward: [(0, '0.506')] [2024-03-29 17:09:37,196][00497] Updated weights for policy 0, policy_version 45805 (0.0032) [2024-03-29 17:09:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42098.6). Total num frames: 750534656. Throughput: 0: 41741.0. Samples: 632750560. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 17:09:38,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 17:09:41,247][00497] Updated weights for policy 0, policy_version 45815 (0.0029) [2024-03-29 17:09:43,057][00476] Signal inference workers to stop experience collection... (22450 times) [2024-03-29 17:09:43,094][00497] InferenceWorker_p0-w0: stopping experience collection (22450 times) [2024-03-29 17:09:43,284][00476] Signal inference workers to resume experience collection... (22450 times) [2024-03-29 17:09:43,284][00497] InferenceWorker_p0-w0: resuming experience collection (22450 times) [2024-03-29 17:09:43,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.1, 300 sec: 42043.0). Total num frames: 750747648. Throughput: 0: 41864.4. Samples: 632986140. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 17:09:43,840][00126] Avg episode reward: [(0, '0.634')] [2024-03-29 17:09:44,580][00497] Updated weights for policy 0, policy_version 45825 (0.0032) [2024-03-29 17:09:48,338][00497] Updated weights for policy 0, policy_version 45835 (0.0030) [2024-03-29 17:09:48,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 750977024. Throughput: 0: 41573.4. Samples: 633096000. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 17:09:48,840][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 17:09:53,134][00497] Updated weights for policy 0, policy_version 45845 (0.0017) [2024-03-29 17:09:53,839][00126] Fps is (10 sec: 37683.1, 60 sec: 40960.0, 300 sec: 41931.9). Total num frames: 751124480. Throughput: 0: 41405.2. Samples: 633363460. Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 17:09:53,840][00126] Avg episode reward: [(0, '0.568')] [2024-03-29 17:09:57,087][00497] Updated weights for policy 0, policy_version 45855 (0.0020) [2024-03-29 17:09:58,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 751353856. Throughput: 0: 41617.0. Samples: 633605140. Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 17:09:58,840][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 17:10:00,501][00497] Updated weights for policy 0, policy_version 45865 (0.0017) [2024-03-29 17:10:03,839][00126] Fps is (10 sec: 47513.9, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 751599616. Throughput: 0: 41295.1. Samples: 633711700. Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 17:10:03,841][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 17:10:04,123][00497] Updated weights for policy 0, policy_version 45875 (0.0023) [2024-03-29 17:10:08,839][00126] Fps is (10 sec: 39320.9, 60 sec: 40959.9, 300 sec: 41931.9). Total num frames: 751747072. Throughput: 0: 40902.7. Samples: 633966020. 
Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 17:10:08,840][00126] Avg episode reward: [(0, '0.635')] [2024-03-29 17:10:09,297][00497] Updated weights for policy 0, policy_version 45885 (0.0023) [2024-03-29 17:10:13,307][00497] Updated weights for policy 0, policy_version 45895 (0.0030) [2024-03-29 17:10:13,839][00126] Fps is (10 sec: 36044.7, 60 sec: 40687.0, 300 sec: 41876.4). Total num frames: 751960064. Throughput: 0: 41484.0. Samples: 634224040. Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 17:10:13,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 17:10:16,210][00476] Signal inference workers to stop experience collection... (22500 times) [2024-03-29 17:10:16,238][00497] InferenceWorker_p0-w0: stopping experience collection (22500 times) [2024-03-29 17:10:16,384][00476] Signal inference workers to resume experience collection... (22500 times) [2024-03-29 17:10:16,385][00497] InferenceWorker_p0-w0: resuming experience collection (22500 times) [2024-03-29 17:10:16,640][00497] Updated weights for policy 0, policy_version 45905 (0.0023) [2024-03-29 17:10:18,839][00126] Fps is (10 sec: 44237.4, 60 sec: 40960.0, 300 sec: 41876.4). Total num frames: 752189440. Throughput: 0: 41054.2. Samples: 634327900. Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 17:10:18,840][00126] Avg episode reward: [(0, '0.541')] [2024-03-29 17:10:20,334][00497] Updated weights for policy 0, policy_version 45915 (0.0027) [2024-03-29 17:10:23,839][00126] Fps is (10 sec: 39320.9, 60 sec: 40686.9, 300 sec: 41820.8). Total num frames: 752353280. Throughput: 0: 40300.3. Samples: 634564080. Policy #0 lag: (min: 0.0, avg: 22.5, max: 40.0) [2024-03-29 17:10:23,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 17:10:25,420][00497] Updated weights for policy 0, policy_version 45925 (0.0023) [2024-03-29 17:10:28,839][00126] Fps is (10 sec: 37683.1, 60 sec: 40413.8, 300 sec: 41820.9). Total num frames: 752566272. Throughput: 0: 41022.7. Samples: 634832160. Policy #0 lag: (min: 0.0, avg: 22.5, max: 40.0) [2024-03-29 17:10:28,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 17:10:29,437][00497] Updated weights for policy 0, policy_version 45935 (0.0021) [2024-03-29 17:10:32,561][00497] Updated weights for policy 0, policy_version 45945 (0.0025) [2024-03-29 17:10:33,839][00126] Fps is (10 sec: 44237.3, 60 sec: 40413.8, 300 sec: 41820.8). Total num frames: 752795648. Throughput: 0: 41141.8. Samples: 634947380. Policy #0 lag: (min: 0.0, avg: 22.5, max: 40.0) [2024-03-29 17:10:33,840][00126] Avg episode reward: [(0, '0.606')] [2024-03-29 17:10:36,183][00497] Updated weights for policy 0, policy_version 45955 (0.0029) [2024-03-29 17:10:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 753008640. Throughput: 0: 40363.6. Samples: 635179820. Policy #0 lag: (min: 0.0, avg: 22.5, max: 40.0) [2024-03-29 17:10:38,840][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 17:10:41,262][00497] Updated weights for policy 0, policy_version 45965 (0.0018) [2024-03-29 17:10:43,839][00126] Fps is (10 sec: 39321.7, 60 sec: 40686.9, 300 sec: 41820.8). Total num frames: 753188864. Throughput: 0: 40781.7. Samples: 635440320. Policy #0 lag: (min: 0.0, avg: 22.5, max: 40.0) [2024-03-29 17:10:43,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 17:10:45,404][00497] Updated weights for policy 0, policy_version 45975 (0.0019) [2024-03-29 17:10:48,360][00476] Signal inference workers to stop experience collection... 
(22550 times) [2024-03-29 17:10:48,436][00476] Signal inference workers to resume experience collection... (22550 times) [2024-03-29 17:10:48,438][00497] InferenceWorker_p0-w0: stopping experience collection (22550 times) [2024-03-29 17:10:48,441][00497] Updated weights for policy 0, policy_version 45985 (0.0020) [2024-03-29 17:10:48,467][00497] InferenceWorker_p0-w0: resuming experience collection (22550 times) [2024-03-29 17:10:48,839][00126] Fps is (10 sec: 42597.9, 60 sec: 40960.0, 300 sec: 41820.9). Total num frames: 753434624. Throughput: 0: 41199.0. Samples: 635565660. Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 17:10:48,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 17:10:52,061][00497] Updated weights for policy 0, policy_version 45995 (0.0023) [2024-03-29 17:10:53,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 753631232. Throughput: 0: 40711.2. Samples: 635798020. Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 17:10:53,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 17:10:57,028][00497] Updated weights for policy 0, policy_version 46005 (0.0022) [2024-03-29 17:10:58,839][00126] Fps is (10 sec: 37683.6, 60 sec: 40960.0, 300 sec: 41765.3). Total num frames: 753811456. Throughput: 0: 41191.5. Samples: 636077660. Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 17:10:58,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 17:11:00,929][00497] Updated weights for policy 0, policy_version 46015 (0.0024) [2024-03-29 17:11:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 40686.9, 300 sec: 41709.8). Total num frames: 754040832. Throughput: 0: 41525.7. Samples: 636196560. Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 17:11:03,840][00126] Avg episode reward: [(0, '0.533')] [2024-03-29 17:11:04,256][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000046025_754073600.pth... [2024-03-29 17:11:04,257][00497] Updated weights for policy 0, policy_version 46025 (0.0023) [2024-03-29 17:11:04,582][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000045412_744030208.pth [2024-03-29 17:11:07,879][00497] Updated weights for policy 0, policy_version 46035 (0.0031) [2024-03-29 17:11:08,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 754270208. Throughput: 0: 41230.3. Samples: 636419440. Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 17:11:08,840][00126] Avg episode reward: [(0, '0.533')] [2024-03-29 17:11:12,716][00497] Updated weights for policy 0, policy_version 46045 (0.0022) [2024-03-29 17:11:13,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 754434048. Throughput: 0: 41335.1. Samples: 636692240. Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 17:11:13,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 17:11:16,891][00497] Updated weights for policy 0, policy_version 46055 (0.0022) [2024-03-29 17:11:18,840][00126] Fps is (10 sec: 39318.2, 60 sec: 41232.4, 300 sec: 41709.7). Total num frames: 754663424. Throughput: 0: 41514.3. Samples: 636815560. Policy #0 lag: (min: 1.0, avg: 21.0, max: 44.0) [2024-03-29 17:11:18,841][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 17:11:19,714][00476] Signal inference workers to stop experience collection... 
(22600 times) [2024-03-29 17:11:19,755][00497] InferenceWorker_p0-w0: stopping experience collection (22600 times) [2024-03-29 17:11:19,873][00476] Signal inference workers to resume experience collection... (22600 times) [2024-03-29 17:11:19,874][00497] InferenceWorker_p0-w0: resuming experience collection (22600 times) [2024-03-29 17:11:20,182][00497] Updated weights for policy 0, policy_version 46065 (0.0033) [2024-03-29 17:11:23,580][00497] Updated weights for policy 0, policy_version 46075 (0.0018) [2024-03-29 17:11:23,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.4, 300 sec: 41820.8). Total num frames: 754892800. Throughput: 0: 41547.9. Samples: 637049480. Policy #0 lag: (min: 1.0, avg: 21.0, max: 44.0) [2024-03-29 17:11:23,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 17:11:28,397][00497] Updated weights for policy 0, policy_version 46085 (0.0022) [2024-03-29 17:11:28,839][00126] Fps is (10 sec: 39325.4, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 755056640. Throughput: 0: 41588.5. Samples: 637311800. Policy #0 lag: (min: 1.0, avg: 21.0, max: 44.0) [2024-03-29 17:11:28,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 17:11:32,574][00497] Updated weights for policy 0, policy_version 46095 (0.0026) [2024-03-29 17:11:33,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 755286016. Throughput: 0: 41550.4. Samples: 637435420. Policy #0 lag: (min: 1.0, avg: 21.0, max: 44.0) [2024-03-29 17:11:33,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 17:11:35,890][00497] Updated weights for policy 0, policy_version 46105 (0.0020) [2024-03-29 17:11:38,839][00126] Fps is (10 sec: 45875.2, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 755515392. Throughput: 0: 41682.7. Samples: 637673740. Policy #0 lag: (min: 1.0, avg: 21.0, max: 44.0) [2024-03-29 17:11:38,840][00126] Avg episode reward: [(0, '0.502')] [2024-03-29 17:11:39,272][00497] Updated weights for policy 0, policy_version 46115 (0.0025) [2024-03-29 17:11:43,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 755679232. Throughput: 0: 41408.4. Samples: 637941040. Policy #0 lag: (min: 1.0, avg: 21.0, max: 44.0) [2024-03-29 17:11:43,840][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 17:11:44,254][00497] Updated weights for policy 0, policy_version 46125 (0.0023) [2024-03-29 17:11:48,402][00497] Updated weights for policy 0, policy_version 46135 (0.0024) [2024-03-29 17:11:48,839][00126] Fps is (10 sec: 37683.2, 60 sec: 40960.1, 300 sec: 41598.7). Total num frames: 755892224. Throughput: 0: 41478.3. Samples: 638063080. Policy #0 lag: (min: 1.0, avg: 19.5, max: 43.0) [2024-03-29 17:11:48,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 17:11:51,270][00476] Signal inference workers to stop experience collection... (22650 times) [2024-03-29 17:11:51,341][00497] InferenceWorker_p0-w0: stopping experience collection (22650 times) [2024-03-29 17:11:51,349][00476] Signal inference workers to resume experience collection... (22650 times) [2024-03-29 17:11:51,368][00497] InferenceWorker_p0-w0: resuming experience collection (22650 times) [2024-03-29 17:11:51,657][00497] Updated weights for policy 0, policy_version 46145 (0.0031) [2024-03-29 17:11:53,839][00126] Fps is (10 sec: 45875.2, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 756137984. Throughput: 0: 41844.1. Samples: 638302420. 
Policy #0 lag: (min: 1.0, avg: 19.5, max: 43.0) [2024-03-29 17:11:53,840][00126] Avg episode reward: [(0, '0.565')] [2024-03-29 17:11:54,953][00497] Updated weights for policy 0, policy_version 46155 (0.0025) [2024-03-29 17:11:58,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 756318208. Throughput: 0: 41590.1. Samples: 638563800. Policy #0 lag: (min: 1.0, avg: 19.5, max: 43.0) [2024-03-29 17:11:58,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 17:11:59,803][00497] Updated weights for policy 0, policy_version 46165 (0.0020) [2024-03-29 17:12:03,765][00497] Updated weights for policy 0, policy_version 46175 (0.0028) [2024-03-29 17:12:03,839][00126] Fps is (10 sec: 39320.8, 60 sec: 41506.0, 300 sec: 41598.7). Total num frames: 756531200. Throughput: 0: 41927.8. Samples: 638702280. Policy #0 lag: (min: 1.0, avg: 19.5, max: 43.0) [2024-03-29 17:12:03,840][00126] Avg episode reward: [(0, '0.537')] [2024-03-29 17:12:07,184][00497] Updated weights for policy 0, policy_version 46185 (0.0022) [2024-03-29 17:12:08,839][00126] Fps is (10 sec: 44237.2, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 756760576. Throughput: 0: 41934.8. Samples: 638936540. Policy #0 lag: (min: 1.0, avg: 19.5, max: 43.0) [2024-03-29 17:12:08,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 17:12:10,489][00497] Updated weights for policy 0, policy_version 46195 (0.0019) [2024-03-29 17:12:13,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 756940800. Throughput: 0: 41684.0. Samples: 639187580. Policy #0 lag: (min: 1.0, avg: 19.5, max: 43.0) [2024-03-29 17:12:13,840][00126] Avg episode reward: [(0, '0.615')] [2024-03-29 17:12:15,320][00497] Updated weights for policy 0, policy_version 46205 (0.0019) [2024-03-29 17:12:18,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41506.7, 300 sec: 41598.7). Total num frames: 757153792. Throughput: 0: 41954.5. Samples: 639323380. Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 17:12:18,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 17:12:19,539][00497] Updated weights for policy 0, policy_version 46215 (0.0031) [2024-03-29 17:12:22,941][00497] Updated weights for policy 0, policy_version 46225 (0.0027) [2024-03-29 17:12:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 757383168. Throughput: 0: 41951.1. Samples: 639561540. Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 17:12:23,840][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 17:12:25,281][00476] Signal inference workers to stop experience collection... (22700 times) [2024-03-29 17:12:25,303][00497] InferenceWorker_p0-w0: stopping experience collection (22700 times) [2024-03-29 17:12:25,462][00476] Signal inference workers to resume experience collection... (22700 times) [2024-03-29 17:12:25,463][00497] InferenceWorker_p0-w0: resuming experience collection (22700 times) [2024-03-29 17:12:26,330][00497] Updated weights for policy 0, policy_version 46235 (0.0025) [2024-03-29 17:12:28,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 757596160. Throughput: 0: 41394.2. Samples: 639803780. Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 17:12:28,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 17:12:31,096][00497] Updated weights for policy 0, policy_version 46245 (0.0018) [2024-03-29 17:12:33,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.0, 300 sec: 41487.6). 
Total num frames: 757760000. Throughput: 0: 41738.7. Samples: 639941320. Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 17:12:33,840][00126] Avg episode reward: [(0, '0.568')] [2024-03-29 17:12:35,372][00497] Updated weights for policy 0, policy_version 46255 (0.0021) [2024-03-29 17:12:38,564][00497] Updated weights for policy 0, policy_version 46265 (0.0019) [2024-03-29 17:12:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 758005760. Throughput: 0: 42013.8. Samples: 640193040. Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 17:12:38,840][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 17:12:42,103][00497] Updated weights for policy 0, policy_version 46275 (0.0025) [2024-03-29 17:12:43,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42325.2, 300 sec: 41654.2). Total num frames: 758218752. Throughput: 0: 41245.7. Samples: 640419860. Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 17:12:43,841][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 17:12:46,905][00497] Updated weights for policy 0, policy_version 46285 (0.0018) [2024-03-29 17:12:48,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 41487.6). Total num frames: 758398976. Throughput: 0: 41417.4. Samples: 640566060. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 17:12:48,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 17:12:51,001][00497] Updated weights for policy 0, policy_version 46295 (0.0025) [2024-03-29 17:12:53,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 758628352. Throughput: 0: 41867.1. Samples: 640820560. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 17:12:53,841][00126] Avg episode reward: [(0, '0.662')] [2024-03-29 17:12:54,263][00497] Updated weights for policy 0, policy_version 46305 (0.0024) [2024-03-29 17:12:56,583][00476] Signal inference workers to stop experience collection... (22750 times) [2024-03-29 17:12:56,655][00476] Signal inference workers to resume experience collection... (22750 times) [2024-03-29 17:12:56,658][00497] InferenceWorker_p0-w0: stopping experience collection (22750 times) [2024-03-29 17:12:56,686][00497] InferenceWorker_p0-w0: resuming experience collection (22750 times) [2024-03-29 17:12:57,592][00497] Updated weights for policy 0, policy_version 46315 (0.0023) [2024-03-29 17:12:58,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42325.3, 300 sec: 41654.2). Total num frames: 758857728. Throughput: 0: 41619.5. Samples: 641060460. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 17:12:58,840][00126] Avg episode reward: [(0, '0.602')] [2024-03-29 17:13:02,240][00497] Updated weights for policy 0, policy_version 46325 (0.0022) [2024-03-29 17:13:03,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41779.2, 300 sec: 41543.1). Total num frames: 759037952. Throughput: 0: 41705.3. Samples: 641200120. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 17:13:03,840][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 17:13:04,095][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000046329_759054336.pth... [2024-03-29 17:13:04,412][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000045721_749092864.pth [2024-03-29 17:13:06,336][00497] Updated weights for policy 0, policy_version 46335 (0.0021) [2024-03-29 17:13:08,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41506.1, 300 sec: 41432.1). Total num frames: 759250944. 
Throughput: 0: 42237.8. Samples: 641462240. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 17:13:08,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 17:13:09,692][00497] Updated weights for policy 0, policy_version 46345 (0.0023) [2024-03-29 17:13:13,147][00497] Updated weights for policy 0, policy_version 46355 (0.0021) [2024-03-29 17:13:13,839][00126] Fps is (10 sec: 47513.7, 60 sec: 42871.4, 300 sec: 41820.9). Total num frames: 759513088. Throughput: 0: 42087.0. Samples: 641697700. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 17:13:13,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 17:13:17,700][00497] Updated weights for policy 0, policy_version 46365 (0.0018) [2024-03-29 17:13:18,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 759676928. Throughput: 0: 41941.3. Samples: 641828680. Policy #0 lag: (min: 0.0, avg: 23.7, max: 43.0) [2024-03-29 17:13:18,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 17:13:21,796][00497] Updated weights for policy 0, policy_version 46375 (0.0023) [2024-03-29 17:13:23,839][00126] Fps is (10 sec: 37683.8, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 759889920. Throughput: 0: 42379.1. Samples: 642100100. Policy #0 lag: (min: 0.0, avg: 23.7, max: 43.0) [2024-03-29 17:13:23,840][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 17:13:25,096][00497] Updated weights for policy 0, policy_version 46385 (0.0022) [2024-03-29 17:13:28,541][00497] Updated weights for policy 0, policy_version 46395 (0.0020) [2024-03-29 17:13:28,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 760135680. Throughput: 0: 42662.8. Samples: 642339680. Policy #0 lag: (min: 0.0, avg: 23.7, max: 43.0) [2024-03-29 17:13:28,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 17:13:30,424][00476] Signal inference workers to stop experience collection... (22800 times) [2024-03-29 17:13:30,425][00476] Signal inference workers to resume experience collection... (22800 times) [2024-03-29 17:13:30,464][00497] InferenceWorker_p0-w0: stopping experience collection (22800 times) [2024-03-29 17:13:30,464][00497] InferenceWorker_p0-w0: resuming experience collection (22800 times) [2024-03-29 17:13:32,913][00497] Updated weights for policy 0, policy_version 46405 (0.0022) [2024-03-29 17:13:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 41598.7). Total num frames: 760315904. Throughput: 0: 42380.6. Samples: 642473180. Policy #0 lag: (min: 0.0, avg: 23.7, max: 43.0) [2024-03-29 17:13:33,840][00126] Avg episode reward: [(0, '0.456')] [2024-03-29 17:13:37,138][00497] Updated weights for policy 0, policy_version 46415 (0.0019) [2024-03-29 17:13:38,839][00126] Fps is (10 sec: 39321.0, 60 sec: 42052.1, 300 sec: 41543.1). Total num frames: 760528896. Throughput: 0: 42526.0. Samples: 642734240. Policy #0 lag: (min: 0.0, avg: 23.7, max: 43.0) [2024-03-29 17:13:38,842][00126] Avg episode reward: [(0, '0.534')] [2024-03-29 17:13:40,544][00497] Updated weights for policy 0, policy_version 46425 (0.0023) [2024-03-29 17:13:43,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.5, 300 sec: 41765.3). Total num frames: 760774656. Throughput: 0: 42677.0. Samples: 642980920. 
Policy #0 lag: (min: 0.0, avg: 23.7, max: 43.0) [2024-03-29 17:13:43,840][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 17:13:43,928][00497] Updated weights for policy 0, policy_version 46435 (0.0020) [2024-03-29 17:13:48,419][00497] Updated weights for policy 0, policy_version 46445 (0.0023) [2024-03-29 17:13:48,839][00126] Fps is (10 sec: 42599.3, 60 sec: 42598.5, 300 sec: 41654.3). Total num frames: 760954880. Throughput: 0: 42359.7. Samples: 643106300. Policy #0 lag: (min: 1.0, avg: 23.5, max: 42.0) [2024-03-29 17:13:48,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 17:13:52,590][00497] Updated weights for policy 0, policy_version 46455 (0.0020) [2024-03-29 17:13:53,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42325.3, 300 sec: 41654.2). Total num frames: 761167872. Throughput: 0: 42642.6. Samples: 643381160. Policy #0 lag: (min: 1.0, avg: 23.5, max: 42.0) [2024-03-29 17:13:53,840][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 17:13:56,035][00497] Updated weights for policy 0, policy_version 46465 (0.0025) [2024-03-29 17:13:58,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.5, 300 sec: 41820.9). Total num frames: 761413632. Throughput: 0: 42733.9. Samples: 643620720. Policy #0 lag: (min: 1.0, avg: 23.5, max: 42.0) [2024-03-29 17:13:58,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 17:13:59,261][00497] Updated weights for policy 0, policy_version 46475 (0.0024) [2024-03-29 17:14:03,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.5, 300 sec: 41709.8). Total num frames: 761593856. Throughput: 0: 42591.1. Samples: 643745280. Policy #0 lag: (min: 1.0, avg: 23.5, max: 42.0) [2024-03-29 17:14:03,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 17:14:03,861][00497] Updated weights for policy 0, policy_version 46485 (0.0019) [2024-03-29 17:14:06,341][00476] Signal inference workers to stop experience collection... (22850 times) [2024-03-29 17:14:06,382][00497] InferenceWorker_p0-w0: stopping experience collection (22850 times) [2024-03-29 17:14:06,564][00476] Signal inference workers to resume experience collection... (22850 times) [2024-03-29 17:14:06,565][00497] InferenceWorker_p0-w0: resuming experience collection (22850 times) [2024-03-29 17:14:08,037][00497] Updated weights for policy 0, policy_version 46495 (0.0019) [2024-03-29 17:14:08,839][00126] Fps is (10 sec: 39321.1, 60 sec: 42598.3, 300 sec: 41654.2). Total num frames: 761806848. Throughput: 0: 42499.0. Samples: 644012560. Policy #0 lag: (min: 1.0, avg: 23.5, max: 42.0) [2024-03-29 17:14:08,840][00126] Avg episode reward: [(0, '0.457')] [2024-03-29 17:14:11,488][00497] Updated weights for policy 0, policy_version 46505 (0.0018) [2024-03-29 17:14:13,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.4, 300 sec: 41709.8). Total num frames: 762036224. Throughput: 0: 42524.5. Samples: 644253280. Policy #0 lag: (min: 2.0, avg: 21.8, max: 42.0) [2024-03-29 17:14:13,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 17:14:14,827][00497] Updated weights for policy 0, policy_version 46515 (0.0022) [2024-03-29 17:14:18,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42598.4, 300 sec: 41765.3). Total num frames: 762232832. Throughput: 0: 42197.8. Samples: 644372080. Policy #0 lag: (min: 2.0, avg: 21.8, max: 42.0) [2024-03-29 17:14:18,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 17:14:19,424][00497] Updated weights for policy 0, policy_version 46525 (0.0027) [2024-03-29 17:14:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.3, 300 sec: 41654.2). 
Total num frames: 762429440. Throughput: 0: 42555.7. Samples: 644649240. Policy #0 lag: (min: 2.0, avg: 21.8, max: 42.0) [2024-03-29 17:14:23,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 17:14:23,854][00497] Updated weights for policy 0, policy_version 46536 (0.0031) [2024-03-29 17:14:27,222][00497] Updated weights for policy 0, policy_version 46546 (0.0023) [2024-03-29 17:14:28,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 762658816. Throughput: 0: 42327.5. Samples: 644885660. Policy #0 lag: (min: 2.0, avg: 21.8, max: 42.0) [2024-03-29 17:14:28,840][00126] Avg episode reward: [(0, '0.617')] [2024-03-29 17:14:30,767][00497] Updated weights for policy 0, policy_version 46556 (0.0026) [2024-03-29 17:14:33,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.4, 300 sec: 41820.9). Total num frames: 762871808. Throughput: 0: 41992.4. Samples: 644995960. Policy #0 lag: (min: 2.0, avg: 21.8, max: 42.0) [2024-03-29 17:14:33,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 17:14:35,662][00497] Updated weights for policy 0, policy_version 46566 (0.0022) [2024-03-29 17:14:38,756][00476] Signal inference workers to stop experience collection... (22900 times) [2024-03-29 17:14:38,806][00497] InferenceWorker_p0-w0: stopping experience collection (22900 times) [2024-03-29 17:14:38,839][00126] Fps is (10 sec: 39322.2, 60 sec: 42052.4, 300 sec: 41709.8). Total num frames: 763052032. Throughput: 0: 42261.9. Samples: 645282940. Policy #0 lag: (min: 2.0, avg: 21.8, max: 42.0) [2024-03-29 17:14:38,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 17:14:38,842][00476] Signal inference workers to resume experience collection... (22900 times) [2024-03-29 17:14:38,844][00497] InferenceWorker_p0-w0: resuming experience collection (22900 times) [2024-03-29 17:14:39,355][00497] Updated weights for policy 0, policy_version 46576 (0.0030) [2024-03-29 17:14:42,846][00497] Updated weights for policy 0, policy_version 46586 (0.0025) [2024-03-29 17:14:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 763297792. Throughput: 0: 42072.4. Samples: 645513980. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 17:14:43,840][00126] Avg episode reward: [(0, '0.513')] [2024-03-29 17:14:46,360][00497] Updated weights for policy 0, policy_version 46596 (0.0027) [2024-03-29 17:14:48,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 763494400. Throughput: 0: 42048.9. Samples: 645637480. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 17:14:48,840][00126] Avg episode reward: [(0, '0.517')] [2024-03-29 17:14:50,965][00497] Updated weights for policy 0, policy_version 46606 (0.0023) [2024-03-29 17:14:53,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 763707392. Throughput: 0: 42103.6. Samples: 645907220. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 17:14:53,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 17:14:54,767][00497] Updated weights for policy 0, policy_version 46616 (0.0023) [2024-03-29 17:14:58,136][00497] Updated weights for policy 0, policy_version 46626 (0.0023) [2024-03-29 17:14:58,840][00126] Fps is (10 sec: 44231.3, 60 sec: 42051.4, 300 sec: 41820.7). Total num frames: 763936768. Throughput: 0: 42429.5. Samples: 646162660. 
Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 17:14:58,841][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 17:15:01,510][00497] Updated weights for policy 0, policy_version 46636 (0.0028) [2024-03-29 17:15:03,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 764149760. Throughput: 0: 42627.0. Samples: 646290300. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 17:15:03,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 17:15:04,035][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000046641_764166144.pth... [2024-03-29 17:15:04,347][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000046025_754073600.pth [2024-03-29 17:15:05,825][00476] Signal inference workers to stop experience collection... (22950 times) [2024-03-29 17:15:05,825][00476] Signal inference workers to resume experience collection... (22950 times) [2024-03-29 17:15:05,862][00497] InferenceWorker_p0-w0: stopping experience collection (22950 times) [2024-03-29 17:15:05,863][00497] InferenceWorker_p0-w0: resuming experience collection (22950 times) [2024-03-29 17:15:06,147][00497] Updated weights for policy 0, policy_version 46646 (0.0019) [2024-03-29 17:15:08,839][00126] Fps is (10 sec: 40964.9, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 764346368. Throughput: 0: 42300.0. Samples: 646552740. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 17:15:08,840][00126] Avg episode reward: [(0, '0.610')] [2024-03-29 17:15:09,995][00497] Updated weights for policy 0, policy_version 46656 (0.0028) [2024-03-29 17:15:13,574][00497] Updated weights for policy 0, policy_version 46666 (0.0024) [2024-03-29 17:15:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 764575744. Throughput: 0: 42397.0. Samples: 646793520. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 17:15:13,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 17:15:17,123][00497] Updated weights for policy 0, policy_version 46676 (0.0029) [2024-03-29 17:15:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 764788736. Throughput: 0: 42792.5. Samples: 646921620. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 17:15:18,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 17:15:21,815][00497] Updated weights for policy 0, policy_version 46686 (0.0020) [2024-03-29 17:15:23,839][00126] Fps is (10 sec: 39321.0, 60 sec: 42325.2, 300 sec: 42043.0). Total num frames: 764968960. Throughput: 0: 42226.5. Samples: 647183140. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 17:15:23,840][00126] Avg episode reward: [(0, '0.476')] [2024-03-29 17:15:25,559][00497] Updated weights for policy 0, policy_version 46696 (0.0021) [2024-03-29 17:15:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 765198336. Throughput: 0: 42608.9. Samples: 647431380. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 17:15:28,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 17:15:29,345][00497] Updated weights for policy 0, policy_version 46706 (0.0024) [2024-03-29 17:15:33,029][00497] Updated weights for policy 0, policy_version 46716 (0.0033) [2024-03-29 17:15:33,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 765427712. Throughput: 0: 42492.3. Samples: 647549640. 
Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 17:15:33,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 17:15:37,826][00497] Updated weights for policy 0, policy_version 46726 (0.0023) [2024-03-29 17:15:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 765591552. Throughput: 0: 42011.2. Samples: 647797720. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 17:15:38,840][00126] Avg episode reward: [(0, '0.598')] [2024-03-29 17:15:40,257][00476] Signal inference workers to stop experience collection... (23000 times) [2024-03-29 17:15:40,258][00476] Signal inference workers to resume experience collection... (23000 times) [2024-03-29 17:15:40,321][00497] InferenceWorker_p0-w0: stopping experience collection (23000 times) [2024-03-29 17:15:40,321][00497] InferenceWorker_p0-w0: resuming experience collection (23000 times) [2024-03-29 17:15:41,514][00497] Updated weights for policy 0, policy_version 46736 (0.0021) [2024-03-29 17:15:43,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 765804544. Throughput: 0: 41953.1. Samples: 648050500. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 17:15:43,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 17:15:45,197][00497] Updated weights for policy 0, policy_version 46746 (0.0023) [2024-03-29 17:15:48,757][00497] Updated weights for policy 0, policy_version 46756 (0.0036) [2024-03-29 17:15:48,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 766050304. Throughput: 0: 41830.2. Samples: 648172660. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 17:15:48,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 17:15:53,260][00497] Updated weights for policy 0, policy_version 46766 (0.0018) [2024-03-29 17:15:53,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 766230528. Throughput: 0: 41638.1. Samples: 648426460. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 17:15:53,840][00126] Avg episode reward: [(0, '0.533')] [2024-03-29 17:15:57,089][00497] Updated weights for policy 0, policy_version 46776 (0.0023) [2024-03-29 17:15:58,842][00126] Fps is (10 sec: 39310.4, 60 sec: 41778.0, 300 sec: 42042.6). Total num frames: 766443520. Throughput: 0: 41921.8. Samples: 648680120. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 17:15:58,844][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 17:16:00,932][00497] Updated weights for policy 0, policy_version 46786 (0.0023) [2024-03-29 17:16:03,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 766672896. Throughput: 0: 41733.3. Samples: 648799620. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 17:16:03,840][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 17:16:04,242][00497] Updated weights for policy 0, policy_version 46796 (0.0031) [2024-03-29 17:16:08,839][00126] Fps is (10 sec: 40971.8, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 766853120. Throughput: 0: 41440.2. Samples: 649047940. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 17:16:08,841][00126] Avg episode reward: [(0, '0.537')] [2024-03-29 17:16:08,878][00497] Updated weights for policy 0, policy_version 46806 (0.0027) [2024-03-29 17:16:12,662][00497] Updated weights for policy 0, policy_version 46816 (0.0022) [2024-03-29 17:16:13,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 42043.1). 
Total num frames: 767066112. Throughput: 0: 41875.1. Samples: 649315760. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 17:16:13,840][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 17:16:14,796][00476] Signal inference workers to stop experience collection... (23050 times) [2024-03-29 17:16:14,829][00497] InferenceWorker_p0-w0: stopping experience collection (23050 times) [2024-03-29 17:16:15,008][00476] Signal inference workers to resume experience collection... (23050 times) [2024-03-29 17:16:15,009][00497] InferenceWorker_p0-w0: resuming experience collection (23050 times) [2024-03-29 17:16:16,363][00497] Updated weights for policy 0, policy_version 46826 (0.0029) [2024-03-29 17:16:18,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 767311872. Throughput: 0: 41837.4. Samples: 649432320. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 17:16:18,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:16:19,654][00497] Updated weights for policy 0, policy_version 46836 (0.0025) [2024-03-29 17:16:23,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 767492096. Throughput: 0: 41895.9. Samples: 649683040. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 17:16:23,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 17:16:24,667][00497] Updated weights for policy 0, policy_version 46846 (0.0020) [2024-03-29 17:16:28,295][00497] Updated weights for policy 0, policy_version 46856 (0.0025) [2024-03-29 17:16:28,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 767688704. Throughput: 0: 41983.1. Samples: 649939740. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 17:16:28,840][00126] Avg episode reward: [(0, '0.496')] [2024-03-29 17:16:32,179][00497] Updated weights for policy 0, policy_version 46866 (0.0023) [2024-03-29 17:16:33,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 767901696. Throughput: 0: 41898.3. Samples: 650058080. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 17:16:33,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 17:16:35,687][00497] Updated weights for policy 0, policy_version 46876 (0.0027) [2024-03-29 17:16:38,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 768114688. Throughput: 0: 41582.6. Samples: 650297680. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 17:16:38,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 17:16:40,620][00497] Updated weights for policy 0, policy_version 46886 (0.0024) [2024-03-29 17:16:43,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 768311296. Throughput: 0: 41705.7. Samples: 650556760. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 17:16:43,841][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 17:16:44,396][00497] Updated weights for policy 0, policy_version 46896 (0.0018) [2024-03-29 17:16:44,406][00476] Signal inference workers to stop experience collection... (23100 times) [2024-03-29 17:16:44,406][00476] Signal inference workers to resume experience collection... 
(23100 times) [2024-03-29 17:16:44,446][00497] InferenceWorker_p0-w0: stopping experience collection (23100 times) [2024-03-29 17:16:44,446][00497] InferenceWorker_p0-w0: resuming experience collection (23100 times) [2024-03-29 17:16:47,874][00497] Updated weights for policy 0, policy_version 46906 (0.0026) [2024-03-29 17:16:48,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.0, 300 sec: 42043.0). Total num frames: 768540672. Throughput: 0: 41880.7. Samples: 650684260. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 17:16:48,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 17:16:51,338][00497] Updated weights for policy 0, policy_version 46916 (0.0023) [2024-03-29 17:16:53,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 768753664. Throughput: 0: 41491.9. Samples: 650915080. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 17:16:53,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 17:16:55,927][00497] Updated weights for policy 0, policy_version 46926 (0.0028) [2024-03-29 17:16:58,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41781.2, 300 sec: 42098.6). Total num frames: 768950272. Throughput: 0: 42004.4. Samples: 651205960. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 17:16:58,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 17:16:59,593][00497] Updated weights for policy 0, policy_version 46936 (0.0028) [2024-03-29 17:17:03,528][00497] Updated weights for policy 0, policy_version 46946 (0.0028) [2024-03-29 17:17:03,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 42098.5). Total num frames: 769179648. Throughput: 0: 41897.7. Samples: 651317720. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 17:17:03,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 17:17:04,123][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000046948_769196032.pth... [2024-03-29 17:17:04,459][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000046329_759054336.pth [2024-03-29 17:17:06,987][00497] Updated weights for policy 0, policy_version 46956 (0.0028) [2024-03-29 17:17:08,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.2, 300 sec: 42209.6). Total num frames: 769392640. Throughput: 0: 41603.5. Samples: 651555200. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 17:17:08,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 17:17:11,755][00497] Updated weights for policy 0, policy_version 46966 (0.0023) [2024-03-29 17:17:13,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 769572864. Throughput: 0: 41849.8. Samples: 651822980. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 17:17:13,840][00126] Avg episode reward: [(0, '0.575')] [2024-03-29 17:17:15,344][00497] Updated weights for policy 0, policy_version 46976 (0.0025) [2024-03-29 17:17:17,673][00476] Signal inference workers to stop experience collection... (23150 times) [2024-03-29 17:17:17,728][00497] InferenceWorker_p0-w0: stopping experience collection (23150 times) [2024-03-29 17:17:17,841][00476] Signal inference workers to resume experience collection... (23150 times) [2024-03-29 17:17:17,841][00497] InferenceWorker_p0-w0: resuming experience collection (23150 times) [2024-03-29 17:17:18,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41506.1, 300 sec: 42098.6). Total num frames: 769802240. Throughput: 0: 41916.4. Samples: 651944320. 
Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 17:17:18,841][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 17:17:19,390][00497] Updated weights for policy 0, policy_version 46986 (0.0021) [2024-03-29 17:17:22,619][00497] Updated weights for policy 0, policy_version 46996 (0.0023) [2024-03-29 17:17:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 770015232. Throughput: 0: 41925.9. Samples: 652184340. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 17:17:23,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 17:17:27,294][00497] Updated weights for policy 0, policy_version 47006 (0.0018) [2024-03-29 17:17:28,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 770195456. Throughput: 0: 42129.3. Samples: 652452580. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 17:17:28,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 17:17:31,038][00497] Updated weights for policy 0, policy_version 47016 (0.0022) [2024-03-29 17:17:33,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 770424832. Throughput: 0: 41936.1. Samples: 652571380. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 17:17:33,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 17:17:35,026][00497] Updated weights for policy 0, policy_version 47026 (0.0022) [2024-03-29 17:17:38,366][00497] Updated weights for policy 0, policy_version 47036 (0.0022) [2024-03-29 17:17:38,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42325.5, 300 sec: 42154.1). Total num frames: 770654208. Throughput: 0: 42230.3. Samples: 652815440. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 17:17:38,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 17:17:42,979][00497] Updated weights for policy 0, policy_version 47046 (0.0026) [2024-03-29 17:17:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 770834432. Throughput: 0: 41628.4. Samples: 653079240. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 17:17:43,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 17:17:46,579][00497] Updated weights for policy 0, policy_version 47056 (0.0023) [2024-03-29 17:17:48,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.3, 300 sec: 42098.5). Total num frames: 771047424. Throughput: 0: 41922.7. Samples: 653204240. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 17:17:48,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 17:17:50,652][00497] Updated weights for policy 0, policy_version 47066 (0.0026) [2024-03-29 17:17:52,137][00476] Signal inference workers to stop experience collection... (23200 times) [2024-03-29 17:17:52,215][00497] InferenceWorker_p0-w0: stopping experience collection (23200 times) [2024-03-29 17:17:52,222][00476] Signal inference workers to resume experience collection... (23200 times) [2024-03-29 17:17:52,241][00497] InferenceWorker_p0-w0: resuming experience collection (23200 times) [2024-03-29 17:17:53,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 771276800. Throughput: 0: 42166.2. Samples: 653452680. 
Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 17:17:53,840][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 17:17:54,056][00497] Updated weights for policy 0, policy_version 47076 (0.0021) [2024-03-29 17:17:58,732][00497] Updated weights for policy 0, policy_version 47086 (0.0018) [2024-03-29 17:17:58,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 771457024. Throughput: 0: 41991.1. Samples: 653712580. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 17:17:58,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 17:18:02,081][00497] Updated weights for policy 0, policy_version 47096 (0.0026) [2024-03-29 17:18:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 771686400. Throughput: 0: 42014.9. Samples: 653835000. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 17:18:03,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 17:18:06,161][00497] Updated weights for policy 0, policy_version 47106 (0.0023) [2024-03-29 17:18:08,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 771899392. Throughput: 0: 42173.4. Samples: 654082140. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 17:18:08,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:18:09,497][00497] Updated weights for policy 0, policy_version 47116 (0.0023) [2024-03-29 17:18:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 772096000. Throughput: 0: 41912.8. Samples: 654338660. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 17:18:13,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 17:18:14,137][00497] Updated weights for policy 0, policy_version 47126 (0.0026) [2024-03-29 17:18:17,630][00497] Updated weights for policy 0, policy_version 47136 (0.0018) [2024-03-29 17:18:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 772325376. Throughput: 0: 42172.1. Samples: 654469120. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 17:18:18,840][00126] Avg episode reward: [(0, '0.537')] [2024-03-29 17:18:21,747][00497] Updated weights for policy 0, policy_version 47146 (0.0024) [2024-03-29 17:18:23,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 772521984. Throughput: 0: 42169.3. Samples: 654713060. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 17:18:23,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 17:18:24,425][00476] Signal inference workers to stop experience collection... (23250 times) [2024-03-29 17:18:24,506][00476] Signal inference workers to resume experience collection... (23250 times) [2024-03-29 17:18:24,507][00497] InferenceWorker_p0-w0: stopping experience collection (23250 times) [2024-03-29 17:18:24,533][00497] InferenceWorker_p0-w0: resuming experience collection (23250 times) [2024-03-29 17:18:25,100][00497] Updated weights for policy 0, policy_version 47156 (0.0027) [2024-03-29 17:18:28,839][00126] Fps is (10 sec: 39320.8, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 772718592. Throughput: 0: 41811.0. Samples: 654960740. 
Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 17:18:28,840][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 17:18:29,869][00497] Updated weights for policy 0, policy_version 47166 (0.0018) [2024-03-29 17:18:33,427][00497] Updated weights for policy 0, policy_version 47176 (0.0019) [2024-03-29 17:18:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 772947968. Throughput: 0: 42154.3. Samples: 655101180. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 17:18:33,840][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 17:18:37,383][00497] Updated weights for policy 0, policy_version 47186 (0.0028) [2024-03-29 17:18:38,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 773144576. Throughput: 0: 41902.9. Samples: 655338300. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 17:18:38,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 17:18:40,986][00497] Updated weights for policy 0, policy_version 47196 (0.0028) [2024-03-29 17:18:43,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 773357568. Throughput: 0: 41411.5. Samples: 655576100. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 17:18:43,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 17:18:45,627][00497] Updated weights for policy 0, policy_version 47206 (0.0025) [2024-03-29 17:18:48,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 773570560. Throughput: 0: 41890.8. Samples: 655720080. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 17:18:48,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 17:18:49,027][00497] Updated weights for policy 0, policy_version 47216 (0.0026) [2024-03-29 17:18:52,979][00497] Updated weights for policy 0, policy_version 47226 (0.0024) [2024-03-29 17:18:53,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 773783552. Throughput: 0: 41937.3. Samples: 655969320. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 17:18:53,840][00126] Avg episode reward: [(0, '0.653')] [2024-03-29 17:18:56,315][00497] Updated weights for policy 0, policy_version 47236 (0.0021) [2024-03-29 17:18:58,674][00476] Signal inference workers to stop experience collection... (23300 times) [2024-03-29 17:18:58,717][00497] InferenceWorker_p0-w0: stopping experience collection (23300 times) [2024-03-29 17:18:58,835][00476] Signal inference workers to resume experience collection... (23300 times) [2024-03-29 17:18:58,835][00497] InferenceWorker_p0-w0: resuming experience collection (23300 times) [2024-03-29 17:18:58,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 773996544. Throughput: 0: 41729.5. Samples: 656216480. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 17:18:58,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 17:19:01,004][00497] Updated weights for policy 0, policy_version 47246 (0.0029) [2024-03-29 17:19:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 774193152. Throughput: 0: 41832.4. Samples: 656351580. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 17:19:03,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 17:19:03,859][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000047253_774193152.pth... 
[2024-03-29 17:19:04,170][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000046641_764166144.pth [2024-03-29 17:19:04,950][00497] Updated weights for policy 0, policy_version 47256 (0.0019) [2024-03-29 17:19:08,606][00497] Updated weights for policy 0, policy_version 47266 (0.0018) [2024-03-29 17:19:08,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 774406144. Throughput: 0: 42005.7. Samples: 656603320. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 17:19:08,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 17:19:12,141][00497] Updated weights for policy 0, policy_version 47276 (0.0028) [2024-03-29 17:19:13,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 774635520. Throughput: 0: 41645.9. Samples: 656834800. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 17:19:13,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 17:19:16,775][00497] Updated weights for policy 0, policy_version 47286 (0.0020) [2024-03-29 17:19:18,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 774815744. Throughput: 0: 41767.1. Samples: 656980700. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 17:19:18,840][00126] Avg episode reward: [(0, '0.476')] [2024-03-29 17:19:20,481][00497] Updated weights for policy 0, policy_version 47296 (0.0032) [2024-03-29 17:19:23,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 775028736. Throughput: 0: 41906.9. Samples: 657224120. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 17:19:23,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 17:19:24,284][00497] Updated weights for policy 0, policy_version 47306 (0.0029) [2024-03-29 17:19:27,751][00497] Updated weights for policy 0, policy_version 47316 (0.0033) [2024-03-29 17:19:28,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 775258112. Throughput: 0: 41970.3. Samples: 657464760. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 17:19:28,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 17:19:32,367][00497] Updated weights for policy 0, policy_version 47326 (0.0019) [2024-03-29 17:19:33,211][00476] Signal inference workers to stop experience collection... (23350 times) [2024-03-29 17:19:33,246][00497] InferenceWorker_p0-w0: stopping experience collection (23350 times) [2024-03-29 17:19:33,392][00476] Signal inference workers to resume experience collection... (23350 times) [2024-03-29 17:19:33,393][00497] InferenceWorker_p0-w0: resuming experience collection (23350 times) [2024-03-29 17:19:33,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.0, 300 sec: 41987.4). Total num frames: 775438336. Throughput: 0: 41844.7. Samples: 657603100. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 17:19:33,840][00126] Avg episode reward: [(0, '0.454')] [2024-03-29 17:19:36,069][00497] Updated weights for policy 0, policy_version 47336 (0.0023) [2024-03-29 17:19:38,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 775651328. Throughput: 0: 41799.4. Samples: 657850300. 
Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 17:19:38,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 17:19:40,197][00497] Updated weights for policy 0, policy_version 47346 (0.0023) [2024-03-29 17:19:43,550][00497] Updated weights for policy 0, policy_version 47356 (0.0022) [2024-03-29 17:19:43,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 775897088. Throughput: 0: 41602.0. Samples: 658088580. Policy #0 lag: (min: 0.0, avg: 21.3, max: 44.0) [2024-03-29 17:19:43,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:19:48,146][00497] Updated weights for policy 0, policy_version 47366 (0.0034) [2024-03-29 17:19:48,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 776060928. Throughput: 0: 41665.3. Samples: 658226520. Policy #0 lag: (min: 0.0, avg: 21.3, max: 44.0) [2024-03-29 17:19:48,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 17:19:51,936][00497] Updated weights for policy 0, policy_version 47376 (0.0025) [2024-03-29 17:19:53,839][00126] Fps is (10 sec: 37683.9, 60 sec: 41506.1, 300 sec: 41821.0). Total num frames: 776273920. Throughput: 0: 41682.3. Samples: 658479020. Policy #0 lag: (min: 0.0, avg: 21.3, max: 44.0) [2024-03-29 17:19:53,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 17:19:55,708][00497] Updated weights for policy 0, policy_version 47386 (0.0022) [2024-03-29 17:19:58,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 776519680. Throughput: 0: 42211.1. Samples: 658734300. Policy #0 lag: (min: 0.0, avg: 21.3, max: 44.0) [2024-03-29 17:19:58,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 17:19:58,965][00497] Updated weights for policy 0, policy_version 47396 (0.0023) [2024-03-29 17:20:03,631][00497] Updated weights for policy 0, policy_version 47406 (0.0023) [2024-03-29 17:20:03,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 776699904. Throughput: 0: 41793.6. Samples: 658861420. Policy #0 lag: (min: 0.0, avg: 21.3, max: 44.0) [2024-03-29 17:20:03,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 17:20:04,623][00476] Signal inference workers to stop experience collection... (23400 times) [2024-03-29 17:20:04,625][00476] Signal inference workers to resume experience collection... (23400 times) [2024-03-29 17:20:04,667][00497] InferenceWorker_p0-w0: stopping experience collection (23400 times) [2024-03-29 17:20:04,667][00497] InferenceWorker_p0-w0: resuming experience collection (23400 times) [2024-03-29 17:20:07,540][00497] Updated weights for policy 0, policy_version 47416 (0.0022) [2024-03-29 17:20:08,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 776896512. Throughput: 0: 42125.0. Samples: 659119740. Policy #0 lag: (min: 0.0, avg: 21.3, max: 44.0) [2024-03-29 17:20:08,840][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 17:20:11,445][00497] Updated weights for policy 0, policy_version 47426 (0.0024) [2024-03-29 17:20:13,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 777125888. Throughput: 0: 41972.0. Samples: 659353500. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 17:20:13,840][00126] Avg episode reward: [(0, '0.598')] [2024-03-29 17:20:14,781][00497] Updated weights for policy 0, policy_version 47436 (0.0027) [2024-03-29 17:20:18,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.1, 300 sec: 41820.9). 
Total num frames: 777306112. Throughput: 0: 41797.8. Samples: 659484000. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 17:20:18,842][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 17:20:19,725][00497] Updated weights for policy 0, policy_version 47446 (0.0024) [2024-03-29 17:20:23,496][00497] Updated weights for policy 0, policy_version 47456 (0.0022) [2024-03-29 17:20:23,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 777535488. Throughput: 0: 41737.0. Samples: 659728460. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 17:20:23,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 17:20:27,325][00497] Updated weights for policy 0, policy_version 47466 (0.0024) [2024-03-29 17:20:28,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41506.0, 300 sec: 41765.3). Total num frames: 777748480. Throughput: 0: 41753.8. Samples: 659967500. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 17:20:28,840][00126] Avg episode reward: [(0, '0.492')] [2024-03-29 17:20:30,673][00497] Updated weights for policy 0, policy_version 47476 (0.0024) [2024-03-29 17:20:33,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 777961472. Throughput: 0: 41404.4. Samples: 660089720. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 17:20:33,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 17:20:35,038][00476] Signal inference workers to stop experience collection... (23450 times) [2024-03-29 17:20:35,076][00497] InferenceWorker_p0-w0: stopping experience collection (23450 times) [2024-03-29 17:20:35,252][00476] Signal inference workers to resume experience collection... (23450 times) [2024-03-29 17:20:35,253][00497] InferenceWorker_p0-w0: resuming experience collection (23450 times) [2024-03-29 17:20:35,507][00497] Updated weights for policy 0, policy_version 47486 (0.0025) [2024-03-29 17:20:38,839][00126] Fps is (10 sec: 39322.4, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 778141696. Throughput: 0: 41644.0. Samples: 660353000. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 17:20:38,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 17:20:39,467][00497] Updated weights for policy 0, policy_version 47496 (0.0026) [2024-03-29 17:20:43,290][00497] Updated weights for policy 0, policy_version 47506 (0.0025) [2024-03-29 17:20:43,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 778371072. Throughput: 0: 41252.8. Samples: 660590680. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:20:43,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 17:20:46,707][00497] Updated weights for policy 0, policy_version 47516 (0.0020) [2024-03-29 17:20:48,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.1, 300 sec: 41820.9). Total num frames: 778567680. Throughput: 0: 40929.4. Samples: 660703240. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:20:48,842][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 17:20:51,559][00497] Updated weights for policy 0, policy_version 47526 (0.0032) [2024-03-29 17:20:53,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41765.7). Total num frames: 778764288. Throughput: 0: 41144.8. Samples: 660971260. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:20:53,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 17:20:55,337][00497] Updated weights for policy 0, policy_version 47536 (0.0028) [2024-03-29 17:20:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40959.9, 300 sec: 41709.8). Total num frames: 778977280. Throughput: 0: 41209.7. Samples: 661207940. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:20:58,840][00126] Avg episode reward: [(0, '0.667')] [2024-03-29 17:20:59,039][00497] Updated weights for policy 0, policy_version 47546 (0.0020) [2024-03-29 17:21:02,421][00497] Updated weights for policy 0, policy_version 47556 (0.0022) [2024-03-29 17:21:03,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.2, 300 sec: 41820.8). Total num frames: 779190272. Throughput: 0: 41008.4. Samples: 661329380. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:21:03,840][00126] Avg episode reward: [(0, '0.601')] [2024-03-29 17:21:03,864][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000047558_779190272.pth... [2024-03-29 17:21:04,216][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000046948_769196032.pth [2024-03-29 17:21:07,038][00476] Signal inference workers to stop experience collection... (23500 times) [2024-03-29 17:21:07,114][00476] Signal inference workers to resume experience collection... (23500 times) [2024-03-29 17:21:07,117][00497] InferenceWorker_p0-w0: stopping experience collection (23500 times) [2024-03-29 17:21:07,144][00497] InferenceWorker_p0-w0: resuming experience collection (23500 times) [2024-03-29 17:21:07,375][00497] Updated weights for policy 0, policy_version 47566 (0.0034) [2024-03-29 17:21:08,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 779386880. Throughput: 0: 41680.8. Samples: 661604100. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:21:08,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 17:21:11,280][00497] Updated weights for policy 0, policy_version 47576 (0.0021) [2024-03-29 17:21:13,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 779616256. Throughput: 0: 41450.4. Samples: 661832760. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 17:21:13,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 17:21:14,797][00497] Updated weights for policy 0, policy_version 47586 (0.0025) [2024-03-29 17:21:18,310][00497] Updated weights for policy 0, policy_version 47596 (0.0024) [2024-03-29 17:21:18,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 779829248. Throughput: 0: 41304.9. Samples: 661948440. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 17:21:18,840][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 17:21:23,219][00497] Updated weights for policy 0, policy_version 47606 (0.0021) [2024-03-29 17:21:23,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 780009472. Throughput: 0: 41470.1. Samples: 662219160. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 17:21:23,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 17:21:27,041][00497] Updated weights for policy 0, policy_version 47616 (0.0027) [2024-03-29 17:21:28,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 780222464. Throughput: 0: 41484.4. Samples: 662457480. 
Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 17:21:28,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 17:21:30,683][00497] Updated weights for policy 0, policy_version 47626 (0.0023) [2024-03-29 17:21:33,839][00126] Fps is (10 sec: 44237.0, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 780451840. Throughput: 0: 41660.0. Samples: 662577940. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 17:21:33,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 17:21:33,957][00497] Updated weights for policy 0, policy_version 47636 (0.0023) [2024-03-29 17:21:38,242][00476] Signal inference workers to stop experience collection... (23550 times) [2024-03-29 17:21:38,310][00497] InferenceWorker_p0-w0: stopping experience collection (23550 times) [2024-03-29 17:21:38,313][00476] Signal inference workers to resume experience collection... (23550 times) [2024-03-29 17:21:38,335][00497] InferenceWorker_p0-w0: resuming experience collection (23550 times) [2024-03-29 17:21:38,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 780615680. Throughput: 0: 41461.4. Samples: 662837020. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 17:21:38,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 17:21:38,926][00497] Updated weights for policy 0, policy_version 47646 (0.0018) [2024-03-29 17:21:42,431][00497] Updated weights for policy 0, policy_version 47656 (0.0023) [2024-03-29 17:21:43,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41506.3, 300 sec: 41765.3). Total num frames: 780861440. Throughput: 0: 41996.6. Samples: 663097780. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 17:21:43,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 17:21:45,950][00497] Updated weights for policy 0, policy_version 47666 (0.0025) [2024-03-29 17:21:48,839][00126] Fps is (10 sec: 47513.4, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 781090816. Throughput: 0: 41991.2. Samples: 663218980. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 17:21:48,840][00126] Avg episode reward: [(0, '0.602')] [2024-03-29 17:21:49,506][00497] Updated weights for policy 0, policy_version 47676 (0.0019) [2024-03-29 17:21:53,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 781254656. Throughput: 0: 41523.2. Samples: 663472640. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 17:21:53,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 17:21:54,339][00497] Updated weights for policy 0, policy_version 47686 (0.0026) [2024-03-29 17:21:58,238][00497] Updated weights for policy 0, policy_version 47696 (0.0022) [2024-03-29 17:21:58,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 781484032. Throughput: 0: 42119.1. Samples: 663728120. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 17:21:58,840][00126] Avg episode reward: [(0, '0.598')] [2024-03-29 17:22:01,426][00497] Updated weights for policy 0, policy_version 47706 (0.0025) [2024-03-29 17:22:03,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 781713408. Throughput: 0: 42151.6. Samples: 663845260. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 17:22:03,840][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 17:22:04,990][00497] Updated weights for policy 0, policy_version 47716 (0.0024) [2024-03-29 17:22:08,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41765.3). 
Total num frames: 781893632. Throughput: 0: 41809.8. Samples: 664100600. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 17:22:08,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 17:22:09,992][00497] Updated weights for policy 0, policy_version 47726 (0.0028) [2024-03-29 17:22:10,987][00476] Signal inference workers to stop experience collection... (23600 times) [2024-03-29 17:22:11,010][00497] InferenceWorker_p0-w0: stopping experience collection (23600 times) [2024-03-29 17:22:11,203][00476] Signal inference workers to resume experience collection... (23600 times) [2024-03-29 17:22:11,204][00497] InferenceWorker_p0-w0: resuming experience collection (23600 times) [2024-03-29 17:22:13,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 782090240. Throughput: 0: 42097.8. Samples: 664351880. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:22:13,840][00126] Avg episode reward: [(0, '0.587')] [2024-03-29 17:22:14,050][00497] Updated weights for policy 0, policy_version 47736 (0.0024) [2024-03-29 17:22:17,407][00497] Updated weights for policy 0, policy_version 47746 (0.0021) [2024-03-29 17:22:18,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 782336000. Throughput: 0: 41824.4. Samples: 664460040. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:22:18,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 17:22:20,927][00497] Updated weights for policy 0, policy_version 47756 (0.0020) [2024-03-29 17:22:23,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 782532608. Throughput: 0: 41686.6. Samples: 664712920. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:22:23,840][00126] Avg episode reward: [(0, '0.599')] [2024-03-29 17:22:26,064][00497] Updated weights for policy 0, policy_version 47766 (0.0028) [2024-03-29 17:22:28,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 782712832. Throughput: 0: 41586.6. Samples: 664969180. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:22:28,841][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 17:22:29,794][00497] Updated weights for policy 0, policy_version 47776 (0.0034) [2024-03-29 17:22:33,077][00497] Updated weights for policy 0, policy_version 47786 (0.0019) [2024-03-29 17:22:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 782942208. Throughput: 0: 41599.6. Samples: 665090960. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:22:33,840][00126] Avg episode reward: [(0, '0.587')] [2024-03-29 17:22:36,606][00497] Updated weights for policy 0, policy_version 47796 (0.0029) [2024-03-29 17:22:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.2, 300 sec: 41765.3). Total num frames: 783155200. Throughput: 0: 41271.9. Samples: 665329880. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:22:38,840][00126] Avg episode reward: [(0, '0.541')] [2024-03-29 17:22:42,005][00497] Updated weights for policy 0, policy_version 47806 (0.0020) [2024-03-29 17:22:43,222][00476] Signal inference workers to stop experience collection... (23650 times) [2024-03-29 17:22:43,251][00497] InferenceWorker_p0-w0: stopping experience collection (23650 times) [2024-03-29 17:22:43,414][00476] Signal inference workers to resume experience collection... 
(23650 times) [2024-03-29 17:22:43,415][00497] InferenceWorker_p0-w0: resuming experience collection (23650 times) [2024-03-29 17:22:43,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 783351808. Throughput: 0: 41444.5. Samples: 665593120. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 17:22:43,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 17:22:45,593][00497] Updated weights for policy 0, policy_version 47816 (0.0022) [2024-03-29 17:22:48,665][00497] Updated weights for policy 0, policy_version 47826 (0.0022) [2024-03-29 17:22:48,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 783581184. Throughput: 0: 41636.8. Samples: 665718920. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 17:22:48,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 17:22:52,501][00497] Updated weights for policy 0, policy_version 47836 (0.0018) [2024-03-29 17:22:53,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 783777792. Throughput: 0: 41117.0. Samples: 665950860. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 17:22:53,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 17:22:57,575][00497] Updated weights for policy 0, policy_version 47846 (0.0018) [2024-03-29 17:22:58,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 783974400. Throughput: 0: 41775.7. Samples: 666231780. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 17:22:58,840][00126] Avg episode reward: [(0, '0.517')] [2024-03-29 17:23:01,209][00497] Updated weights for policy 0, policy_version 47856 (0.0023) [2024-03-29 17:23:03,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 784203776. Throughput: 0: 42132.1. Samples: 666355980. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 17:23:03,841][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 17:23:04,016][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000047865_784220160.pth... [2024-03-29 17:23:04,388][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000047253_774193152.pth [2024-03-29 17:23:04,729][00497] Updated weights for policy 0, policy_version 47866 (0.0026) [2024-03-29 17:23:08,117][00497] Updated weights for policy 0, policy_version 47876 (0.0025) [2024-03-29 17:23:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.4, 300 sec: 41765.3). Total num frames: 784416768. Throughput: 0: 41422.2. Samples: 666576920. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 17:23:08,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 17:23:13,301][00497] Updated weights for policy 0, policy_version 47886 (0.0019) [2024-03-29 17:23:13,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.2, 300 sec: 41543.1). Total num frames: 784580608. Throughput: 0: 41927.1. Samples: 666855900. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:23:13,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 17:23:14,528][00476] Signal inference workers to stop experience collection... (23700 times) [2024-03-29 17:23:14,600][00497] InferenceWorker_p0-w0: stopping experience collection (23700 times) [2024-03-29 17:23:14,694][00476] Signal inference workers to resume experience collection... 
(23700 times) [2024-03-29 17:23:14,695][00497] InferenceWorker_p0-w0: resuming experience collection (23700 times) [2024-03-29 17:23:17,137][00497] Updated weights for policy 0, policy_version 47896 (0.0022) [2024-03-29 17:23:18,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 784809984. Throughput: 0: 41845.7. Samples: 666974020. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:23:18,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 17:23:20,154][00497] Updated weights for policy 0, policy_version 47906 (0.0024) [2024-03-29 17:23:23,839][00126] Fps is (10 sec: 45875.4, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 785039360. Throughput: 0: 41495.2. Samples: 667197160. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:23:23,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 17:23:24,007][00497] Updated weights for policy 0, policy_version 47916 (0.0033) [2024-03-29 17:23:28,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.0, 300 sec: 41543.1). Total num frames: 785203200. Throughput: 0: 41821.2. Samples: 667475080. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:23:28,841][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 17:23:29,093][00497] Updated weights for policy 0, policy_version 47926 (0.0018) [2024-03-29 17:23:33,079][00497] Updated weights for policy 0, policy_version 47936 (0.0019) [2024-03-29 17:23:33,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 785416192. Throughput: 0: 41446.6. Samples: 667584020. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:23:33,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 17:23:36,073][00497] Updated weights for policy 0, policy_version 47946 (0.0035) [2024-03-29 17:23:38,839][00126] Fps is (10 sec: 45876.0, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 785661952. Throughput: 0: 41732.1. Samples: 667828800. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:23:38,841][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 17:23:39,804][00497] Updated weights for policy 0, policy_version 47956 (0.0019) [2024-03-29 17:23:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.0, 300 sec: 41543.1). Total num frames: 785825792. Throughput: 0: 41108.3. Samples: 668081660. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 17:23:43,840][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 17:23:44,833][00497] Updated weights for policy 0, policy_version 47966 (0.0027) [2024-03-29 17:23:46,309][00476] Signal inference workers to stop experience collection... (23750 times) [2024-03-29 17:23:46,340][00497] InferenceWorker_p0-w0: stopping experience collection (23750 times) [2024-03-29 17:23:46,509][00476] Signal inference workers to resume experience collection... (23750 times) [2024-03-29 17:23:46,509][00497] InferenceWorker_p0-w0: resuming experience collection (23750 times) [2024-03-29 17:23:48,708][00497] Updated weights for policy 0, policy_version 47976 (0.0030) [2024-03-29 17:23:48,839][00126] Fps is (10 sec: 37683.0, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 786038784. Throughput: 0: 41285.3. Samples: 668213820. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 17:23:48,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 17:23:51,965][00497] Updated weights for policy 0, policy_version 47986 (0.0031) [2024-03-29 17:23:53,839][00126] Fps is (10 sec: 47514.2, 60 sec: 42052.3, 300 sec: 41709.8). 
Total num frames: 786300928. Throughput: 0: 41618.7. Samples: 668449760. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 17:23:53,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 17:23:55,491][00497] Updated weights for policy 0, policy_version 47996 (0.0033) [2024-03-29 17:23:58,839][00126] Fps is (10 sec: 44237.0, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 786481152. Throughput: 0: 41160.9. Samples: 668708140. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 17:23:58,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 17:24:00,269][00497] Updated weights for policy 0, policy_version 48006 (0.0026) [2024-03-29 17:24:03,839][00126] Fps is (10 sec: 34405.8, 60 sec: 40686.8, 300 sec: 41487.6). Total num frames: 786644992. Throughput: 0: 41583.9. Samples: 668845300. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 17:24:03,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 17:24:04,513][00497] Updated weights for policy 0, policy_version 48016 (0.0028) [2024-03-29 17:24:07,699][00497] Updated weights for policy 0, policy_version 48026 (0.0027) [2024-03-29 17:24:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 786907136. Throughput: 0: 41776.4. Samples: 669077100. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 17:24:08,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 17:24:11,551][00497] Updated weights for policy 0, policy_version 48036 (0.0019) [2024-03-29 17:24:13,839][00126] Fps is (10 sec: 44237.6, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 787087360. Throughput: 0: 40806.8. Samples: 669311380. Policy #0 lag: (min: 1.0, avg: 22.8, max: 40.0) [2024-03-29 17:24:13,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 17:24:16,246][00497] Updated weights for policy 0, policy_version 48046 (0.0030) [2024-03-29 17:24:18,299][00476] Signal inference workers to stop experience collection... (23800 times) [2024-03-29 17:24:18,362][00497] InferenceWorker_p0-w0: stopping experience collection (23800 times) [2024-03-29 17:24:18,464][00476] Signal inference workers to resume experience collection... (23800 times) [2024-03-29 17:24:18,464][00497] InferenceWorker_p0-w0: resuming experience collection (23800 times) [2024-03-29 17:24:18,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 787300352. Throughput: 0: 41565.0. Samples: 669454440. Policy #0 lag: (min: 1.0, avg: 22.8, max: 40.0) [2024-03-29 17:24:18,840][00126] Avg episode reward: [(0, '0.505')] [2024-03-29 17:24:20,530][00497] Updated weights for policy 0, policy_version 48056 (0.0023) [2024-03-29 17:24:23,431][00497] Updated weights for policy 0, policy_version 48066 (0.0020) [2024-03-29 17:24:23,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41506.0, 300 sec: 41598.7). Total num frames: 787529728. Throughput: 0: 41619.4. Samples: 669701680. Policy #0 lag: (min: 1.0, avg: 22.8, max: 40.0) [2024-03-29 17:24:23,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 17:24:27,175][00497] Updated weights for policy 0, policy_version 48076 (0.0030) [2024-03-29 17:24:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.4, 300 sec: 41654.3). Total num frames: 787726336. Throughput: 0: 41214.3. Samples: 669936300. 
Policy #0 lag: (min: 1.0, avg: 22.8, max: 40.0) [2024-03-29 17:24:28,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 17:24:31,780][00497] Updated weights for policy 0, policy_version 48086 (0.0021) [2024-03-29 17:24:33,839][00126] Fps is (10 sec: 37683.8, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 787906560. Throughput: 0: 41643.6. Samples: 670087780. Policy #0 lag: (min: 1.0, avg: 22.8, max: 40.0) [2024-03-29 17:24:33,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 17:24:36,259][00497] Updated weights for policy 0, policy_version 48096 (0.0023) [2024-03-29 17:24:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.0, 300 sec: 41487.6). Total num frames: 788135936. Throughput: 0: 41960.4. Samples: 670337980. Policy #0 lag: (min: 1.0, avg: 22.8, max: 40.0) [2024-03-29 17:24:38,840][00126] Avg episode reward: [(0, '0.517')] [2024-03-29 17:24:39,338][00497] Updated weights for policy 0, policy_version 48106 (0.0024) [2024-03-29 17:24:43,173][00497] Updated weights for policy 0, policy_version 48116 (0.0020) [2024-03-29 17:24:43,839][00126] Fps is (10 sec: 45874.1, 60 sec: 42325.3, 300 sec: 41709.7). Total num frames: 788365312. Throughput: 0: 41092.7. Samples: 670557320. Policy #0 lag: (min: 0.0, avg: 22.2, max: 40.0) [2024-03-29 17:24:43,840][00126] Avg episode reward: [(0, '0.624')] [2024-03-29 17:24:47,691][00497] Updated weights for policy 0, policy_version 48126 (0.0024) [2024-03-29 17:24:48,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41506.1, 300 sec: 41543.1). Total num frames: 788529152. Throughput: 0: 41082.2. Samples: 670694000. Policy #0 lag: (min: 0.0, avg: 22.2, max: 40.0) [2024-03-29 17:24:48,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 17:24:51,122][00476] Signal inference workers to stop experience collection... (23850 times) [2024-03-29 17:24:51,153][00497] InferenceWorker_p0-w0: stopping experience collection (23850 times) [2024-03-29 17:24:51,309][00476] Signal inference workers to resume experience collection... (23850 times) [2024-03-29 17:24:51,310][00497] InferenceWorker_p0-w0: resuming experience collection (23850 times) [2024-03-29 17:24:52,095][00497] Updated weights for policy 0, policy_version 48136 (0.0027) [2024-03-29 17:24:53,839][00126] Fps is (10 sec: 37684.0, 60 sec: 40686.9, 300 sec: 41432.1). Total num frames: 788742144. Throughput: 0: 41912.0. Samples: 670963140. Policy #0 lag: (min: 0.0, avg: 22.2, max: 40.0) [2024-03-29 17:24:53,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 17:24:55,218][00497] Updated weights for policy 0, policy_version 48146 (0.0028) [2024-03-29 17:24:58,787][00497] Updated weights for policy 0, policy_version 48156 (0.0022) [2024-03-29 17:24:58,839][00126] Fps is (10 sec: 45875.2, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 788987904. Throughput: 0: 41706.1. Samples: 671188160. Policy #0 lag: (min: 0.0, avg: 22.2, max: 40.0) [2024-03-29 17:24:58,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 17:25:03,287][00497] Updated weights for policy 0, policy_version 48166 (0.0026) [2024-03-29 17:25:03,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 789168128. Throughput: 0: 41264.4. Samples: 671311340. Policy #0 lag: (min: 0.0, avg: 22.2, max: 40.0) [2024-03-29 17:25:03,840][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 17:25:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000048167_789168128.pth... 
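The checkpoint bookkeeping in these entries follows a fixed pattern: each save writes a file named checkpoint_<policy_version>_<env_frames>.pth and is followed shortly after by removal of an older checkpoint, so the run appears to keep only a small number of recent checkpoints on disk. For every checkpoint shown in this log the two fields satisfy env_frames = policy_version * 16384 (for example 48167 * 16384 = 789,168,128), consistent with one policy version per 16,384-frame training batch in this particular run. The minimal Python sketch below is an illustration for post-processing the log, not part of the training code; the parse_checkpoint_name helper and the 16384 factor are assumptions drawn only from the filenames visible here.

import re

# Hypothetical helper for decoding checkpoint filenames as they appear in this log,
# e.g. "checkpoint_000048167_789168128.pth" -> (48167, 789168128).
_CKPT = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

def parse_checkpoint_name(path):
    m = _CKPT.search(path)
    if m is None:
        raise ValueError("not a checkpoint path: %s" % path)
    policy_version, env_frames = int(m.group(1)), int(m.group(2))
    return policy_version, env_frames

version, frames = parse_checkpoint_name(
    "checkpoint_p0/checkpoint_000048167_789168128.pth")
# Observation from this run only: frames advance by 16384 per policy version.
assert frames == version * 16384

Decoding the filenames this way makes it easy to pick the checkpoint closest to a target frame count when resuming or evaluating, without relying on file timestamps.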
[2024-03-29 17:25:04,190][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000047558_779190272.pth [2024-03-29 17:25:07,876][00497] Updated weights for policy 0, policy_version 48176 (0.0026) [2024-03-29 17:25:08,839][00126] Fps is (10 sec: 37683.7, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 789364736. Throughput: 0: 41847.7. Samples: 671584820. Policy #0 lag: (min: 0.0, avg: 22.2, max: 40.0) [2024-03-29 17:25:08,840][00126] Avg episode reward: [(0, '0.602')] [2024-03-29 17:25:10,922][00497] Updated weights for policy 0, policy_version 48186 (0.0028) [2024-03-29 17:25:13,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.2, 300 sec: 41654.3). Total num frames: 789594112. Throughput: 0: 41502.2. Samples: 671803900. Policy #0 lag: (min: 1.0, avg: 23.4, max: 45.0) [2024-03-29 17:25:13,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 17:25:14,570][00497] Updated weights for policy 0, policy_version 48196 (0.0024) [2024-03-29 17:25:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 789790720. Throughput: 0: 41071.1. Samples: 671935980. Policy #0 lag: (min: 1.0, avg: 23.4, max: 45.0) [2024-03-29 17:25:18,840][00126] Avg episode reward: [(0, '0.628')] [2024-03-29 17:25:19,055][00497] Updated weights for policy 0, policy_version 48206 (0.0029) [2024-03-29 17:25:23,657][00497] Updated weights for policy 0, policy_version 48216 (0.0021) [2024-03-29 17:25:23,839][00126] Fps is (10 sec: 37683.2, 60 sec: 40687.0, 300 sec: 41432.1). Total num frames: 789970944. Throughput: 0: 41264.0. Samples: 672194860. Policy #0 lag: (min: 1.0, avg: 23.4, max: 45.0) [2024-03-29 17:25:23,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 17:25:24,709][00476] Signal inference workers to stop experience collection... (23900 times) [2024-03-29 17:25:24,732][00497] InferenceWorker_p0-w0: stopping experience collection (23900 times) [2024-03-29 17:25:24,923][00476] Signal inference workers to resume experience collection... (23900 times) [2024-03-29 17:25:24,924][00497] InferenceWorker_p0-w0: resuming experience collection (23900 times) [2024-03-29 17:25:26,570][00497] Updated weights for policy 0, policy_version 48226 (0.0025) [2024-03-29 17:25:28,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 790233088. Throughput: 0: 41488.7. Samples: 672424300. Policy #0 lag: (min: 1.0, avg: 23.4, max: 45.0) [2024-03-29 17:25:28,840][00126] Avg episode reward: [(0, '0.632')] [2024-03-29 17:25:30,308][00497] Updated weights for policy 0, policy_version 48236 (0.0024) [2024-03-29 17:25:33,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.1, 300 sec: 41598.7). Total num frames: 790413312. Throughput: 0: 41357.4. Samples: 672555080. Policy #0 lag: (min: 1.0, avg: 23.4, max: 45.0) [2024-03-29 17:25:33,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 17:25:34,739][00497] Updated weights for policy 0, policy_version 48246 (0.0021) [2024-03-29 17:25:38,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 790609920. Throughput: 0: 41316.0. Samples: 672822360. 
Policy #0 lag: (min: 1.0, avg: 23.4, max: 45.0) [2024-03-29 17:25:38,841][00126] Avg episode reward: [(0, '0.609')] [2024-03-29 17:25:39,388][00497] Updated weights for policy 0, policy_version 48256 (0.0023) [2024-03-29 17:25:42,401][00497] Updated weights for policy 0, policy_version 48266 (0.0020) [2024-03-29 17:25:43,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 790855680. Throughput: 0: 41540.4. Samples: 673057480. Policy #0 lag: (min: 2.0, avg: 21.6, max: 42.0) [2024-03-29 17:25:43,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 17:25:45,879][00497] Updated weights for policy 0, policy_version 48276 (0.0018) [2024-03-29 17:25:48,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 791052288. Throughput: 0: 41491.5. Samples: 673178460. Policy #0 lag: (min: 2.0, avg: 21.6, max: 42.0) [2024-03-29 17:25:48,840][00126] Avg episode reward: [(0, '0.604')] [2024-03-29 17:25:50,636][00497] Updated weights for policy 0, policy_version 48286 (0.0019) [2024-03-29 17:25:53,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 791232512. Throughput: 0: 41628.0. Samples: 673458080. Policy #0 lag: (min: 2.0, avg: 21.6, max: 42.0) [2024-03-29 17:25:53,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 17:25:55,115][00497] Updated weights for policy 0, policy_version 48296 (0.0025) [2024-03-29 17:25:55,545][00476] Signal inference workers to stop experience collection... (23950 times) [2024-03-29 17:25:55,589][00497] InferenceWorker_p0-w0: stopping experience collection (23950 times) [2024-03-29 17:25:55,743][00476] Signal inference workers to resume experience collection... (23950 times) [2024-03-29 17:25:55,744][00497] InferenceWorker_p0-w0: resuming experience collection (23950 times) [2024-03-29 17:25:58,045][00497] Updated weights for policy 0, policy_version 48306 (0.0021) [2024-03-29 17:25:58,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 791478272. Throughput: 0: 41939.9. Samples: 673691200. Policy #0 lag: (min: 2.0, avg: 21.6, max: 42.0) [2024-03-29 17:25:58,840][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 17:26:01,587][00497] Updated weights for policy 0, policy_version 48316 (0.0020) [2024-03-29 17:26:03,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 791691264. Throughput: 0: 41715.9. Samples: 673813200. Policy #0 lag: (min: 2.0, avg: 21.6, max: 42.0) [2024-03-29 17:26:03,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 17:26:06,297][00497] Updated weights for policy 0, policy_version 48326 (0.0018) [2024-03-29 17:26:08,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 791887872. Throughput: 0: 41810.7. Samples: 674076340. Policy #0 lag: (min: 2.0, avg: 21.6, max: 42.0) [2024-03-29 17:26:08,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 17:26:10,812][00497] Updated weights for policy 0, policy_version 48336 (0.0026) [2024-03-29 17:26:13,788][00497] Updated weights for policy 0, policy_version 48346 (0.0032) [2024-03-29 17:26:13,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.1, 300 sec: 41598.7). Total num frames: 792100864. Throughput: 0: 41942.9. Samples: 674311740. 
Policy #0 lag: (min: 2.0, avg: 21.1, max: 43.0) [2024-03-29 17:26:13,840][00126] Avg episode reward: [(0, '0.617')] [2024-03-29 17:26:17,561][00497] Updated weights for policy 0, policy_version 48356 (0.0026) [2024-03-29 17:26:18,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 792297472. Throughput: 0: 41408.0. Samples: 674418440. Policy #0 lag: (min: 2.0, avg: 21.1, max: 43.0) [2024-03-29 17:26:18,840][00126] Avg episode reward: [(0, '0.615')] [2024-03-29 17:26:22,227][00497] Updated weights for policy 0, policy_version 48366 (0.0028) [2024-03-29 17:26:23,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42052.1, 300 sec: 41598.7). Total num frames: 792494080. Throughput: 0: 41474.4. Samples: 674688720. Policy #0 lag: (min: 2.0, avg: 21.1, max: 43.0) [2024-03-29 17:26:23,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 17:26:27,004][00497] Updated weights for policy 0, policy_version 48376 (0.0020) [2024-03-29 17:26:27,370][00476] Signal inference workers to stop experience collection... (24000 times) [2024-03-29 17:26:27,407][00497] InferenceWorker_p0-w0: stopping experience collection (24000 times) [2024-03-29 17:26:27,595][00476] Signal inference workers to resume experience collection... (24000 times) [2024-03-29 17:26:27,596][00497] InferenceWorker_p0-w0: resuming experience collection (24000 times) [2024-03-29 17:26:28,839][00126] Fps is (10 sec: 39322.2, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 792690688. Throughput: 0: 41826.8. Samples: 674939680. Policy #0 lag: (min: 2.0, avg: 21.1, max: 43.0) [2024-03-29 17:26:28,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 17:26:29,830][00497] Updated weights for policy 0, policy_version 48386 (0.0026) [2024-03-29 17:26:33,519][00497] Updated weights for policy 0, policy_version 48396 (0.0024) [2024-03-29 17:26:33,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 792936448. Throughput: 0: 41492.8. Samples: 675045640. Policy #0 lag: (min: 2.0, avg: 21.1, max: 43.0) [2024-03-29 17:26:33,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 17:26:38,192][00497] Updated weights for policy 0, policy_version 48406 (0.0026) [2024-03-29 17:26:38,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 793100288. Throughput: 0: 41072.5. Samples: 675306340. Policy #0 lag: (min: 2.0, avg: 21.1, max: 43.0) [2024-03-29 17:26:38,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 17:26:42,839][00497] Updated weights for policy 0, policy_version 48416 (0.0028) [2024-03-29 17:26:43,839][00126] Fps is (10 sec: 36045.4, 60 sec: 40687.0, 300 sec: 41376.5). Total num frames: 793296896. Throughput: 0: 41861.4. Samples: 675574960. Policy #0 lag: (min: 2.0, avg: 18.2, max: 42.0) [2024-03-29 17:26:43,840][00126] Avg episode reward: [(0, '0.503')] [2024-03-29 17:26:45,705][00497] Updated weights for policy 0, policy_version 48426 (0.0028) [2024-03-29 17:26:48,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 793542656. Throughput: 0: 41194.3. Samples: 675666940. Policy #0 lag: (min: 2.0, avg: 18.2, max: 42.0) [2024-03-29 17:26:48,842][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 17:26:49,265][00497] Updated weights for policy 0, policy_version 48436 (0.0026) [2024-03-29 17:26:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 793722880. Throughput: 0: 40989.3. Samples: 675920860. 
Policy #0 lag: (min: 2.0, avg: 18.2, max: 42.0) [2024-03-29 17:26:53,840][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 17:26:54,029][00497] Updated weights for policy 0, policy_version 48446 (0.0020) [2024-03-29 17:26:57,442][00476] Signal inference workers to stop experience collection... (24050 times) [2024-03-29 17:26:57,495][00497] InferenceWorker_p0-w0: stopping experience collection (24050 times) [2024-03-29 17:26:57,531][00476] Signal inference workers to resume experience collection... (24050 times) [2024-03-29 17:26:57,533][00497] InferenceWorker_p0-w0: resuming experience collection (24050 times) [2024-03-29 17:26:58,477][00497] Updated weights for policy 0, policy_version 48456 (0.0030) [2024-03-29 17:26:58,839][00126] Fps is (10 sec: 37683.1, 60 sec: 40686.9, 300 sec: 41376.5). Total num frames: 793919488. Throughput: 0: 41933.9. Samples: 676198760. Policy #0 lag: (min: 2.0, avg: 18.2, max: 42.0) [2024-03-29 17:26:58,840][00126] Avg episode reward: [(0, '0.500')] [2024-03-29 17:27:01,608][00497] Updated weights for policy 0, policy_version 48466 (0.0029) [2024-03-29 17:27:03,839][00126] Fps is (10 sec: 45875.0, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 794181632. Throughput: 0: 41652.1. Samples: 676292780. Policy #0 lag: (min: 2.0, avg: 18.2, max: 42.0) [2024-03-29 17:27:03,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 17:27:03,858][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000048473_794181632.pth... [2024-03-29 17:27:04,195][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000047865_784220160.pth [2024-03-29 17:27:05,120][00497] Updated weights for policy 0, policy_version 48476 (0.0025) [2024-03-29 17:27:08,839][00126] Fps is (10 sec: 42598.8, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 794345472. Throughput: 0: 41252.2. Samples: 676545060. Policy #0 lag: (min: 2.0, avg: 18.2, max: 42.0) [2024-03-29 17:27:08,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 17:27:09,572][00497] Updated weights for policy 0, policy_version 48486 (0.0022) [2024-03-29 17:27:13,839][00126] Fps is (10 sec: 36044.9, 60 sec: 40687.1, 300 sec: 41376.6). Total num frames: 794542080. Throughput: 0: 41803.5. Samples: 676820840. Policy #0 lag: (min: 2.0, avg: 18.2, max: 42.0) [2024-03-29 17:27:13,840][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 17:27:14,156][00497] Updated weights for policy 0, policy_version 48496 (0.0022) [2024-03-29 17:27:17,284][00497] Updated weights for policy 0, policy_version 48506 (0.0028) [2024-03-29 17:27:18,839][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 794804224. Throughput: 0: 41745.9. Samples: 676924200. Policy #0 lag: (min: 0.0, avg: 17.0, max: 42.0) [2024-03-29 17:27:18,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 17:27:20,897][00497] Updated weights for policy 0, policy_version 48516 (0.0025) [2024-03-29 17:27:23,839][00126] Fps is (10 sec: 44236.1, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 794984448. Throughput: 0: 41339.8. Samples: 677166640. Policy #0 lag: (min: 0.0, avg: 17.0, max: 42.0) [2024-03-29 17:27:23,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 17:27:25,483][00497] Updated weights for policy 0, policy_version 48526 (0.0022) [2024-03-29 17:27:25,709][00476] Signal inference workers to stop experience collection... (24100 times) [2024-03-29 17:27:25,786][00476] Signal inference workers to resume experience collection... 
(24100 times) [2024-03-29 17:27:25,787][00497] InferenceWorker_p0-w0: stopping experience collection (24100 times) [2024-03-29 17:27:25,814][00497] InferenceWorker_p0-w0: resuming experience collection (24100 times) [2024-03-29 17:27:28,839][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.0, 300 sec: 41432.1). Total num frames: 795164672. Throughput: 0: 41205.7. Samples: 677429220. Policy #0 lag: (min: 0.0, avg: 17.0, max: 42.0) [2024-03-29 17:27:28,840][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 17:27:29,957][00497] Updated weights for policy 0, policy_version 48536 (0.0024) [2024-03-29 17:27:32,992][00497] Updated weights for policy 0, policy_version 48546 (0.0034) [2024-03-29 17:27:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 795410432. Throughput: 0: 42126.1. Samples: 677562620. Policy #0 lag: (min: 0.0, avg: 17.0, max: 42.0) [2024-03-29 17:27:33,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 17:27:36,523][00497] Updated weights for policy 0, policy_version 48556 (0.0023) [2024-03-29 17:27:38,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 795623424. Throughput: 0: 41505.3. Samples: 677788600. Policy #0 lag: (min: 0.0, avg: 17.0, max: 42.0) [2024-03-29 17:27:38,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 17:27:41,243][00497] Updated weights for policy 0, policy_version 48566 (0.0023) [2024-03-29 17:27:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.2, 300 sec: 41487.6). Total num frames: 795820032. Throughput: 0: 41102.2. Samples: 678048360. Policy #0 lag: (min: 0.0, avg: 17.0, max: 42.0) [2024-03-29 17:27:43,840][00126] Avg episode reward: [(0, '0.500')] [2024-03-29 17:27:45,818][00497] Updated weights for policy 0, policy_version 48576 (0.0024) [2024-03-29 17:27:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 796016640. Throughput: 0: 42185.4. Samples: 678191120. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 17:27:48,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 17:27:48,897][00497] Updated weights for policy 0, policy_version 48586 (0.0033) [2024-03-29 17:27:52,449][00497] Updated weights for policy 0, policy_version 48596 (0.0026) [2024-03-29 17:27:53,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.2, 300 sec: 41654.2). Total num frames: 796262400. Throughput: 0: 41597.6. Samples: 678416960. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 17:27:53,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 17:27:57,029][00497] Updated weights for policy 0, policy_version 48606 (0.0024) [2024-03-29 17:27:57,062][00476] Signal inference workers to stop experience collection... (24150 times) [2024-03-29 17:27:57,100][00497] InferenceWorker_p0-w0: stopping experience collection (24150 times) [2024-03-29 17:27:57,254][00476] Signal inference workers to resume experience collection... (24150 times) [2024-03-29 17:27:57,254][00497] InferenceWorker_p0-w0: resuming experience collection (24150 times) [2024-03-29 17:27:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 41543.2). Total num frames: 796459008. Throughput: 0: 41267.6. Samples: 678677880. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 17:27:58,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 17:28:01,210][00497] Updated weights for policy 0, policy_version 48616 (0.0018) [2024-03-29 17:28:03,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41233.1, 300 sec: 41487.6). 
Total num frames: 796655616. Throughput: 0: 42070.2. Samples: 678817360. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 17:28:03,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 17:28:04,669][00497] Updated weights for policy 0, policy_version 48626 (0.0034) [2024-03-29 17:28:08,395][00497] Updated weights for policy 0, policy_version 48636 (0.0024) [2024-03-29 17:28:08,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 796868608. Throughput: 0: 41255.6. Samples: 679023140. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 17:28:08,840][00126] Avg episode reward: [(0, '0.644')] [2024-03-29 17:28:12,943][00497] Updated weights for policy 0, policy_version 48646 (0.0025) [2024-03-29 17:28:13,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 41487.6). Total num frames: 797048832. Throughput: 0: 41409.7. Samples: 679292660. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 17:28:13,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 17:28:17,236][00497] Updated weights for policy 0, policy_version 48656 (0.0025) [2024-03-29 17:28:18,839][00126] Fps is (10 sec: 37683.6, 60 sec: 40687.0, 300 sec: 41376.5). Total num frames: 797245440. Throughput: 0: 41345.5. Samples: 679423160. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 17:28:18,840][00126] Avg episode reward: [(0, '0.617')] [2024-03-29 17:28:20,647][00497] Updated weights for policy 0, policy_version 48666 (0.0022) [2024-03-29 17:28:23,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 797474816. Throughput: 0: 41267.5. Samples: 679645640. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 17:28:23,840][00126] Avg episode reward: [(0, '0.600')] [2024-03-29 17:28:24,380][00497] Updated weights for policy 0, policy_version 48676 (0.0031) [2024-03-29 17:28:25,340][00476] Signal inference workers to stop experience collection... (24200 times) [2024-03-29 17:28:25,365][00497] InferenceWorker_p0-w0: stopping experience collection (24200 times) [2024-03-29 17:28:25,526][00476] Signal inference workers to resume experience collection... (24200 times) [2024-03-29 17:28:25,527][00497] InferenceWorker_p0-w0: resuming experience collection (24200 times) [2024-03-29 17:28:28,727][00497] Updated weights for policy 0, policy_version 48686 (0.0017) [2024-03-29 17:28:28,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 797671424. Throughput: 0: 41529.0. Samples: 679917160. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 17:28:28,840][00126] Avg episode reward: [(0, '0.615')] [2024-03-29 17:28:33,056][00497] Updated weights for policy 0, policy_version 48696 (0.0024) [2024-03-29 17:28:33,839][00126] Fps is (10 sec: 37683.2, 60 sec: 40687.0, 300 sec: 41321.0). Total num frames: 797851648. Throughput: 0: 40959.5. Samples: 680034300. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 17:28:33,840][00126] Avg episode reward: [(0, '0.622')] [2024-03-29 17:28:36,452][00497] Updated weights for policy 0, policy_version 48706 (0.0024) [2024-03-29 17:28:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.1, 300 sec: 41654.3). Total num frames: 798113792. Throughput: 0: 40983.7. Samples: 680261220. 
Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 17:28:38,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 17:28:40,185][00497] Updated weights for policy 0, policy_version 48716 (0.0026) [2024-03-29 17:28:43,839][00126] Fps is (10 sec: 42597.7, 60 sec: 40959.9, 300 sec: 41487.6). Total num frames: 798277632. Throughput: 0: 40945.2. Samples: 680520420. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 17:28:43,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 17:28:44,695][00497] Updated weights for policy 0, policy_version 48726 (0.0028) [2024-03-29 17:28:48,839][00126] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 41265.5). Total num frames: 798474240. Throughput: 0: 40732.9. Samples: 680650340. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:28:48,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 17:28:48,932][00497] Updated weights for policy 0, policy_version 48736 (0.0018) [2024-03-29 17:28:52,222][00497] Updated weights for policy 0, policy_version 48746 (0.0021) [2024-03-29 17:28:53,839][00126] Fps is (10 sec: 44237.6, 60 sec: 40960.1, 300 sec: 41487.6). Total num frames: 798720000. Throughput: 0: 41715.6. Samples: 680900340. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:28:53,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 17:28:56,088][00497] Updated weights for policy 0, policy_version 48756 (0.0026) [2024-03-29 17:28:56,930][00476] Signal inference workers to stop experience collection... (24250 times) [2024-03-29 17:28:56,934][00476] Signal inference workers to resume experience collection... (24250 times) [2024-03-29 17:28:56,982][00497] InferenceWorker_p0-w0: stopping experience collection (24250 times) [2024-03-29 17:28:56,982][00497] InferenceWorker_p0-w0: resuming experience collection (24250 times) [2024-03-29 17:28:58,839][00126] Fps is (10 sec: 45875.2, 60 sec: 41233.1, 300 sec: 41654.3). Total num frames: 798932992. Throughput: 0: 41125.0. Samples: 681143280. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:28:58,841][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 17:29:00,376][00497] Updated weights for policy 0, policy_version 48766 (0.0019) [2024-03-29 17:29:03,839][00126] Fps is (10 sec: 37682.4, 60 sec: 40686.8, 300 sec: 41321.0). Total num frames: 799096832. Throughput: 0: 41291.8. Samples: 681281300. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:29:03,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 17:29:03,873][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000048774_799113216.pth... [2024-03-29 17:29:04,184][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000048167_789168128.pth [2024-03-29 17:29:04,720][00497] Updated weights for policy 0, policy_version 48776 (0.0031) [2024-03-29 17:29:08,063][00497] Updated weights for policy 0, policy_version 48786 (0.0026) [2024-03-29 17:29:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 799342592. Throughput: 0: 41612.5. Samples: 681518200. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:29:08,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 17:29:12,012][00497] Updated weights for policy 0, policy_version 48796 (0.0018) [2024-03-29 17:29:13,839][00126] Fps is (10 sec: 44237.7, 60 sec: 41506.2, 300 sec: 41487.6). Total num frames: 799539200. Throughput: 0: 41002.7. Samples: 681762280. 
Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:29:13,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 17:29:16,127][00497] Updated weights for policy 0, policy_version 48806 (0.0032) [2024-03-29 17:29:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 41376.6). Total num frames: 799735808. Throughput: 0: 41544.9. Samples: 681903820. Policy #0 lag: (min: 1.0, avg: 20.8, max: 43.0) [2024-03-29 17:29:18,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 17:29:20,411][00497] Updated weights for policy 0, policy_version 48816 (0.0023) [2024-03-29 17:29:23,553][00497] Updated weights for policy 0, policy_version 48826 (0.0030) [2024-03-29 17:29:23,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 799965184. Throughput: 0: 42152.4. Samples: 682158080. Policy #0 lag: (min: 1.0, avg: 20.8, max: 43.0) [2024-03-29 17:29:23,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 17:29:27,536][00497] Updated weights for policy 0, policy_version 48836 (0.0020) [2024-03-29 17:29:28,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 800194560. Throughput: 0: 41801.4. Samples: 682401480. Policy #0 lag: (min: 1.0, avg: 20.8, max: 43.0) [2024-03-29 17:29:28,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 17:29:31,528][00497] Updated weights for policy 0, policy_version 48846 (0.0019) [2024-03-29 17:29:32,342][00476] Signal inference workers to stop experience collection... (24300 times) [2024-03-29 17:29:32,342][00476] Signal inference workers to resume experience collection... (24300 times) [2024-03-29 17:29:32,391][00497] InferenceWorker_p0-w0: stopping experience collection (24300 times) [2024-03-29 17:29:32,391][00497] InferenceWorker_p0-w0: resuming experience collection (24300 times) [2024-03-29 17:29:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 41543.2). Total num frames: 800391168. Throughput: 0: 41884.0. Samples: 682535120. Policy #0 lag: (min: 1.0, avg: 20.8, max: 43.0) [2024-03-29 17:29:33,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 17:29:35,934][00497] Updated weights for policy 0, policy_version 48856 (0.0023) [2024-03-29 17:29:38,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.0, 300 sec: 41487.6). Total num frames: 800604160. Throughput: 0: 42046.9. Samples: 682792460. Policy #0 lag: (min: 1.0, avg: 20.8, max: 43.0) [2024-03-29 17:29:38,841][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 17:29:39,054][00497] Updated weights for policy 0, policy_version 48866 (0.0022) [2024-03-29 17:29:43,112][00497] Updated weights for policy 0, policy_version 48876 (0.0023) [2024-03-29 17:29:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.4, 300 sec: 41598.7). Total num frames: 800800768. Throughput: 0: 41883.1. Samples: 683028020. Policy #0 lag: (min: 1.0, avg: 20.8, max: 43.0) [2024-03-29 17:29:43,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 17:29:47,310][00497] Updated weights for policy 0, policy_version 48886 (0.0030) [2024-03-29 17:29:48,839][00126] Fps is (10 sec: 40960.9, 60 sec: 42325.3, 300 sec: 41598.7). Total num frames: 801013760. Throughput: 0: 41637.1. Samples: 683154960. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 17:29:48,840][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 17:29:51,675][00497] Updated weights for policy 0, policy_version 48896 (0.0020) [2024-03-29 17:29:53,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41432.1). 
Total num frames: 801210368. Throughput: 0: 42122.6. Samples: 683413720. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 17:29:53,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 17:29:54,941][00497] Updated weights for policy 0, policy_version 48906 (0.0019) [2024-03-29 17:29:58,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 801423360. Throughput: 0: 41961.3. Samples: 683650540. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 17:29:58,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 17:29:58,910][00497] Updated weights for policy 0, policy_version 48916 (0.0020) [2024-03-29 17:30:03,041][00497] Updated weights for policy 0, policy_version 48926 (0.0032) [2024-03-29 17:30:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.4, 300 sec: 41543.2). Total num frames: 801619968. Throughput: 0: 41531.5. Samples: 683772740. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 17:30:03,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 17:30:04,412][00476] Signal inference workers to stop experience collection... (24350 times) [2024-03-29 17:30:04,454][00497] InferenceWorker_p0-w0: stopping experience collection (24350 times) [2024-03-29 17:30:04,580][00476] Signal inference workers to resume experience collection... (24350 times) [2024-03-29 17:30:04,580][00497] InferenceWorker_p0-w0: resuming experience collection (24350 times) [2024-03-29 17:30:07,430][00497] Updated weights for policy 0, policy_version 48936 (0.0033) [2024-03-29 17:30:08,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 801832960. Throughput: 0: 41896.0. Samples: 684043400. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 17:30:08,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 17:30:10,617][00497] Updated weights for policy 0, policy_version 48946 (0.0026) [2024-03-29 17:30:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41543.1). Total num frames: 802045952. Throughput: 0: 41450.7. Samples: 684266760. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 17:30:13,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 17:30:14,756][00497] Updated weights for policy 0, policy_version 48956 (0.0032) [2024-03-29 17:30:18,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.1, 300 sec: 41598.7). Total num frames: 802242560. Throughput: 0: 41371.5. Samples: 684396840. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 17:30:18,840][00126] Avg episode reward: [(0, '0.631')] [2024-03-29 17:30:18,881][00497] Updated weights for policy 0, policy_version 48966 (0.0023) [2024-03-29 17:30:23,136][00497] Updated weights for policy 0, policy_version 48976 (0.0019) [2024-03-29 17:30:23,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.1, 300 sec: 41376.5). Total num frames: 802439168. Throughput: 0: 41561.9. Samples: 684662740. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 17:30:23,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 17:30:26,574][00497] Updated weights for policy 0, policy_version 48986 (0.0037) [2024-03-29 17:30:28,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 802684928. Throughput: 0: 41433.7. Samples: 684892540. 
Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 17:30:28,840][00126] Avg episode reward: [(0, '0.471')] [2024-03-29 17:30:30,709][00497] Updated weights for policy 0, policy_version 48996 (0.0022) [2024-03-29 17:30:33,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41233.0, 300 sec: 41543.1). Total num frames: 802865152. Throughput: 0: 41361.1. Samples: 685016220. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 17:30:33,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 17:30:34,549][00476] Signal inference workers to stop experience collection... (24400 times) [2024-03-29 17:30:34,588][00497] InferenceWorker_p0-w0: stopping experience collection (24400 times) [2024-03-29 17:30:34,777][00476] Signal inference workers to resume experience collection... (24400 times) [2024-03-29 17:30:34,777][00497] InferenceWorker_p0-w0: resuming experience collection (24400 times) [2024-03-29 17:30:34,780][00497] Updated weights for policy 0, policy_version 49006 (0.0029) [2024-03-29 17:30:38,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40960.1, 300 sec: 41376.6). Total num frames: 803061760. Throughput: 0: 41121.3. Samples: 685264180. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 17:30:38,841][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 17:30:38,998][00497] Updated weights for policy 0, policy_version 49016 (0.0028) [2024-03-29 17:30:42,498][00497] Updated weights for policy 0, policy_version 49026 (0.0027) [2024-03-29 17:30:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.0, 300 sec: 41487.6). Total num frames: 803291136. Throughput: 0: 41205.7. Samples: 685504800. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 17:30:43,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 17:30:46,430][00497] Updated weights for policy 0, policy_version 49036 (0.0019) [2024-03-29 17:30:48,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41506.0, 300 sec: 41598.7). Total num frames: 803504128. Throughput: 0: 41528.3. Samples: 685641520. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 17:30:48,840][00126] Avg episode reward: [(0, '0.613')] [2024-03-29 17:30:50,366][00497] Updated weights for policy 0, policy_version 49046 (0.0018) [2024-03-29 17:30:53,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41233.0, 300 sec: 41376.5). Total num frames: 803684352. Throughput: 0: 41252.7. Samples: 685899780. Policy #0 lag: (min: 1.0, avg: 19.8, max: 41.0) [2024-03-29 17:30:53,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 17:30:54,552][00497] Updated weights for policy 0, policy_version 49056 (0.0019) [2024-03-29 17:30:58,326][00497] Updated weights for policy 0, policy_version 49066 (0.0031) [2024-03-29 17:30:58,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41506.1, 300 sec: 41432.1). Total num frames: 803913728. Throughput: 0: 41578.3. Samples: 686137780. Policy #0 lag: (min: 1.0, avg: 19.8, max: 41.0) [2024-03-29 17:30:58,840][00126] Avg episode reward: [(0, '0.601')] [2024-03-29 17:31:02,272][00497] Updated weights for policy 0, policy_version 49076 (0.0032) [2024-03-29 17:31:03,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41506.1, 300 sec: 41432.1). Total num frames: 804110336. Throughput: 0: 41328.5. Samples: 686256620. Policy #0 lag: (min: 1.0, avg: 19.8, max: 41.0) [2024-03-29 17:31:03,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 17:31:04,244][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000049081_804143104.pth... 
[2024-03-29 17:31:04,575][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000048473_794181632.pth [2024-03-29 17:31:05,722][00476] Signal inference workers to stop experience collection... (24450 times) [2024-03-29 17:31:05,822][00497] InferenceWorker_p0-w0: stopping experience collection (24450 times) [2024-03-29 17:31:05,960][00476] Signal inference workers to resume experience collection... (24450 times) [2024-03-29 17:31:05,960][00497] InferenceWorker_p0-w0: resuming experience collection (24450 times) [2024-03-29 17:31:06,265][00497] Updated weights for policy 0, policy_version 49086 (0.0019) [2024-03-29 17:31:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41432.1). Total num frames: 804323328. Throughput: 0: 41121.4. Samples: 686513200. Policy #0 lag: (min: 1.0, avg: 19.8, max: 41.0) [2024-03-29 17:31:08,841][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 17:31:10,444][00497] Updated weights for policy 0, policy_version 49096 (0.0033) [2024-03-29 17:31:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 804519936. Throughput: 0: 41375.6. Samples: 686754440. Policy #0 lag: (min: 1.0, avg: 19.8, max: 41.0) [2024-03-29 17:31:13,840][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 17:31:14,187][00497] Updated weights for policy 0, policy_version 49106 (0.0023) [2024-03-29 17:31:18,343][00497] Updated weights for policy 0, policy_version 49116 (0.0028) [2024-03-29 17:31:18,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 804732928. Throughput: 0: 41274.2. Samples: 686873560. Policy #0 lag: (min: 1.0, avg: 19.8, max: 41.0) [2024-03-29 17:31:18,840][00126] Avg episode reward: [(0, '0.501')] [2024-03-29 17:31:22,202][00497] Updated weights for policy 0, policy_version 49126 (0.0021) [2024-03-29 17:31:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 804929536. Throughput: 0: 41428.0. Samples: 687128440. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 17:31:23,840][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 17:31:26,319][00497] Updated weights for policy 0, policy_version 49136 (0.0020) [2024-03-29 17:31:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40960.0, 300 sec: 41376.5). Total num frames: 805142528. Throughput: 0: 41803.5. Samples: 687385960. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 17:31:28,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 17:31:29,776][00497] Updated weights for policy 0, policy_version 49146 (0.0026) [2024-03-29 17:31:33,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 805355520. Throughput: 0: 41413.5. Samples: 687505120. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 17:31:33,840][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 17:31:33,957][00497] Updated weights for policy 0, policy_version 49156 (0.0020) [2024-03-29 17:31:37,783][00497] Updated weights for policy 0, policy_version 49166 (0.0021) [2024-03-29 17:31:38,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 805568512. Throughput: 0: 41355.3. Samples: 687760760. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 17:31:38,841][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 17:31:41,740][00476] Signal inference workers to stop experience collection... 
(24500 times) [2024-03-29 17:31:41,742][00476] Signal inference workers to resume experience collection... (24500 times) [2024-03-29 17:31:41,767][00497] InferenceWorker_p0-w0: stopping experience collection (24500 times) [2024-03-29 17:31:41,767][00497] InferenceWorker_p0-w0: resuming experience collection (24500 times) [2024-03-29 17:31:41,999][00497] Updated weights for policy 0, policy_version 49176 (0.0023) [2024-03-29 17:31:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 805765120. Throughput: 0: 41802.6. Samples: 688018900. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 17:31:43,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 17:31:45,731][00497] Updated weights for policy 0, policy_version 49186 (0.0034) [2024-03-29 17:31:48,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 805994496. Throughput: 0: 41394.2. Samples: 688119360. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 17:31:48,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 17:31:49,823][00497] Updated weights for policy 0, policy_version 49196 (0.0025) [2024-03-29 17:31:53,581][00497] Updated weights for policy 0, policy_version 49206 (0.0019) [2024-03-29 17:31:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 806191104. Throughput: 0: 41507.1. Samples: 688381020. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 17:31:53,840][00126] Avg episode reward: [(0, '0.624')] [2024-03-29 17:31:57,496][00497] Updated weights for policy 0, policy_version 49216 (0.0028) [2024-03-29 17:31:58,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41432.1). Total num frames: 806404096. Throughput: 0: 42103.9. Samples: 688649120. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 17:31:58,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 17:32:01,269][00497] Updated weights for policy 0, policy_version 49226 (0.0030) [2024-03-29 17:32:03,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 806633472. Throughput: 0: 42008.5. Samples: 688763940. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 17:32:03,840][00126] Avg episode reward: [(0, '0.622')] [2024-03-29 17:32:05,293][00497] Updated weights for policy 0, policy_version 49236 (0.0026) [2024-03-29 17:32:08,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 806813696. Throughput: 0: 41847.6. Samples: 689011580. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 17:32:08,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 17:32:09,177][00497] Updated weights for policy 0, policy_version 49246 (0.0022) [2024-03-29 17:32:13,398][00497] Updated weights for policy 0, policy_version 49256 (0.0023) [2024-03-29 17:32:13,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.2, 300 sec: 41432.1). Total num frames: 807026688. Throughput: 0: 41907.2. Samples: 689271780. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 17:32:13,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 17:32:15,096][00476] Signal inference workers to stop experience collection... (24550 times) [2024-03-29 17:32:15,137][00497] InferenceWorker_p0-w0: stopping experience collection (24550 times) [2024-03-29 17:32:15,256][00476] Signal inference workers to resume experience collection... 
(24550 times) [2024-03-29 17:32:15,256][00497] InferenceWorker_p0-w0: resuming experience collection (24550 times) [2024-03-29 17:32:16,914][00497] Updated weights for policy 0, policy_version 49266 (0.0026) [2024-03-29 17:32:18,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 41543.2). Total num frames: 807239680. Throughput: 0: 42023.6. Samples: 689396180. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 17:32:18,840][00126] Avg episode reward: [(0, '0.569')] [2024-03-29 17:32:21,016][00497] Updated weights for policy 0, policy_version 49276 (0.0020) [2024-03-29 17:32:23,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 807452672. Throughput: 0: 41871.0. Samples: 689644960. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 17:32:23,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 17:32:24,672][00497] Updated weights for policy 0, policy_version 49286 (0.0026) [2024-03-29 17:32:28,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 807649280. Throughput: 0: 41830.6. Samples: 689901280. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 17:32:28,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 17:32:28,939][00497] Updated weights for policy 0, policy_version 49296 (0.0024) [2024-03-29 17:32:32,386][00497] Updated weights for policy 0, policy_version 49306 (0.0026) [2024-03-29 17:32:33,840][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.1, 300 sec: 41543.1). Total num frames: 807878656. Throughput: 0: 42423.4. Samples: 690028420. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 17:32:33,841][00126] Avg episode reward: [(0, '0.592')] [2024-03-29 17:32:36,669][00497] Updated weights for policy 0, policy_version 49316 (0.0029) [2024-03-29 17:32:38,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 808075264. Throughput: 0: 41964.0. Samples: 690269400. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 17:32:38,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 17:32:40,579][00497] Updated weights for policy 0, policy_version 49326 (0.0033) [2024-03-29 17:32:43,839][00126] Fps is (10 sec: 39322.8, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 808271872. Throughput: 0: 41655.6. Samples: 690523620. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 17:32:43,841][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 17:32:44,856][00497] Updated weights for policy 0, policy_version 49336 (0.0019) [2024-03-29 17:32:48,374][00497] Updated weights for policy 0, policy_version 49346 (0.0021) [2024-03-29 17:32:48,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 808501248. Throughput: 0: 41883.5. Samples: 690648700. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 17:32:48,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 17:32:50,193][00476] Signal inference workers to stop experience collection... (24600 times) [2024-03-29 17:32:50,233][00497] InferenceWorker_p0-w0: stopping experience collection (24600 times) [2024-03-29 17:32:50,415][00476] Signal inference workers to resume experience collection... (24600 times) [2024-03-29 17:32:50,415][00497] InferenceWorker_p0-w0: resuming experience collection (24600 times) [2024-03-29 17:32:52,405][00497] Updated weights for policy 0, policy_version 49356 (0.0036) [2024-03-29 17:32:53,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41487.6). 
Total num frames: 808697856. Throughput: 0: 41633.8. Samples: 690885100. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:32:53,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 17:32:56,333][00497] Updated weights for policy 0, policy_version 49366 (0.0023) [2024-03-29 17:32:58,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 808910848. Throughput: 0: 41744.9. Samples: 691150300. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:32:58,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 17:33:00,245][00497] Updated weights for policy 0, policy_version 49376 (0.0019) [2024-03-29 17:33:03,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 809123840. Throughput: 0: 41944.8. Samples: 691283700. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:33:03,840][00126] Avg episode reward: [(0, '0.616')] [2024-03-29 17:33:03,907][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000049386_809140224.pth... [2024-03-29 17:33:03,926][00497] Updated weights for policy 0, policy_version 49386 (0.0022) [2024-03-29 17:33:04,235][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000048774_799113216.pth [2024-03-29 17:33:07,755][00497] Updated weights for policy 0, policy_version 49396 (0.0029) [2024-03-29 17:33:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41654.3). Total num frames: 809336832. Throughput: 0: 41817.9. Samples: 691526760. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:33:08,840][00126] Avg episode reward: [(0, '0.587')] [2024-03-29 17:33:11,798][00497] Updated weights for policy 0, policy_version 49406 (0.0022) [2024-03-29 17:33:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 809549824. Throughput: 0: 41771.6. Samples: 691781000. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:33:13,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 17:33:15,583][00497] Updated weights for policy 0, policy_version 49416 (0.0018) [2024-03-29 17:33:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 809762816. Throughput: 0: 42017.2. Samples: 691919180. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:33:18,841][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 17:33:19,355][00497] Updated weights for policy 0, policy_version 49426 (0.0020) [2024-03-29 17:33:23,046][00497] Updated weights for policy 0, policy_version 49436 (0.0020) [2024-03-29 17:33:23,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 809992192. Throughput: 0: 42116.8. Samples: 692164660. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 17:33:23,840][00126] Avg episode reward: [(0, '0.569')] [2024-03-29 17:33:26,999][00476] Signal inference workers to stop experience collection... (24650 times) [2024-03-29 17:33:27,001][00476] Signal inference workers to resume experience collection... (24650 times) [2024-03-29 17:33:27,020][00497] Updated weights for policy 0, policy_version 49446 (0.0021) [2024-03-29 17:33:27,041][00497] InferenceWorker_p0-w0: stopping experience collection (24650 times) [2024-03-29 17:33:27,041][00497] InferenceWorker_p0-w0: resuming experience collection (24650 times) [2024-03-29 17:33:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 810188800. 
Throughput: 0: 42240.5. Samples: 692424440. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 17:33:28,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 17:33:31,051][00497] Updated weights for policy 0, policy_version 49456 (0.0023) [2024-03-29 17:33:33,839][00126] Fps is (10 sec: 42599.2, 60 sec: 42325.6, 300 sec: 41709.8). Total num frames: 810418176. Throughput: 0: 42493.1. Samples: 692560880. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 17:33:33,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 17:33:34,440][00497] Updated weights for policy 0, policy_version 49466 (0.0019) [2024-03-29 17:33:38,236][00497] Updated weights for policy 0, policy_version 49476 (0.0027) [2024-03-29 17:33:38,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42871.5, 300 sec: 41932.0). Total num frames: 810647552. Throughput: 0: 42841.8. Samples: 692812980. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 17:33:38,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 17:33:42,237][00497] Updated weights for policy 0, policy_version 49486 (0.0020) [2024-03-29 17:33:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 810827776. Throughput: 0: 42579.1. Samples: 693066360. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 17:33:43,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 17:33:46,562][00497] Updated weights for policy 0, policy_version 49496 (0.0023) [2024-03-29 17:33:48,839][00126] Fps is (10 sec: 39321.0, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 811040768. Throughput: 0: 42520.4. Samples: 693197120. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 17:33:48,842][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 17:33:50,030][00497] Updated weights for policy 0, policy_version 49506 (0.0021) [2024-03-29 17:33:53,449][00497] Updated weights for policy 0, policy_version 49516 (0.0027) [2024-03-29 17:33:53,839][00126] Fps is (10 sec: 45875.0, 60 sec: 43144.5, 300 sec: 41876.4). Total num frames: 811286528. Throughput: 0: 42792.9. Samples: 693452440. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 17:33:53,840][00126] Avg episode reward: [(0, '0.615')] [2024-03-29 17:33:57,603][00497] Updated weights for policy 0, policy_version 49526 (0.0018) [2024-03-29 17:33:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42871.4, 300 sec: 41987.5). Total num frames: 811483136. Throughput: 0: 42861.7. Samples: 693709780. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 17:33:58,840][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 17:34:01,657][00497] Updated weights for policy 0, policy_version 49536 (0.0022) [2024-03-29 17:34:03,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42598.4, 300 sec: 41820.9). Total num frames: 811679744. Throughput: 0: 42724.9. Samples: 693841800. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 17:34:03,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 17:34:04,919][00476] Signal inference workers to stop experience collection... (24700 times) [2024-03-29 17:34:04,955][00497] InferenceWorker_p0-w0: stopping experience collection (24700 times) [2024-03-29 17:34:05,144][00476] Signal inference workers to resume experience collection... 
(24700 times) [2024-03-29 17:34:05,144][00497] InferenceWorker_p0-w0: resuming experience collection (24700 times) [2024-03-29 17:34:05,398][00497] Updated weights for policy 0, policy_version 49546 (0.0022) [2024-03-29 17:34:08,839][00126] Fps is (10 sec: 44237.1, 60 sec: 43144.5, 300 sec: 41987.5). Total num frames: 811925504. Throughput: 0: 42751.7. Samples: 694088480. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 17:34:08,840][00126] Avg episode reward: [(0, '0.600')] [2024-03-29 17:34:08,845][00497] Updated weights for policy 0, policy_version 49556 (0.0021) [2024-03-29 17:34:12,950][00497] Updated weights for policy 0, policy_version 49566 (0.0022) [2024-03-29 17:34:13,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42598.3, 300 sec: 41931.9). Total num frames: 812105728. Throughput: 0: 42619.8. Samples: 694342340. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 17:34:13,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 17:34:16,904][00497] Updated weights for policy 0, policy_version 49576 (0.0023) [2024-03-29 17:34:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 812318720. Throughput: 0: 42637.7. Samples: 694479580. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 17:34:18,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 17:34:20,615][00497] Updated weights for policy 0, policy_version 49586 (0.0020) [2024-03-29 17:34:23,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42598.5, 300 sec: 41876.4). Total num frames: 812548096. Throughput: 0: 42674.2. Samples: 694733320. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 17:34:23,840][00126] Avg episode reward: [(0, '0.587')] [2024-03-29 17:34:24,297][00497] Updated weights for policy 0, policy_version 49596 (0.0023) [2024-03-29 17:34:28,200][00497] Updated weights for policy 0, policy_version 49606 (0.0023) [2024-03-29 17:34:28,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42871.4, 300 sec: 41931.9). Total num frames: 812761088. Throughput: 0: 42495.5. Samples: 694978660. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 17:34:28,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 17:34:32,361][00497] Updated weights for policy 0, policy_version 49616 (0.0023) [2024-03-29 17:34:33,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 41820.9). Total num frames: 812941312. Throughput: 0: 42671.6. Samples: 695117340. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 17:34:33,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 17:34:35,921][00497] Updated weights for policy 0, policy_version 49626 (0.0022) [2024-03-29 17:34:38,404][00476] Signal inference workers to stop experience collection... (24750 times) [2024-03-29 17:34:38,476][00476] Signal inference workers to resume experience collection... (24750 times) [2024-03-29 17:34:38,476][00497] InferenceWorker_p0-w0: stopping experience collection (24750 times) [2024-03-29 17:34:38,506][00497] InferenceWorker_p0-w0: resuming experience collection (24750 times) [2024-03-29 17:34:38,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 813187072. Throughput: 0: 42710.1. Samples: 695374400. 
Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 17:34:38,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 17:34:39,671][00497] Updated weights for policy 0, policy_version 49636 (0.0024) [2024-03-29 17:34:43,724][00497] Updated weights for policy 0, policy_version 49646 (0.0025) [2024-03-29 17:34:43,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42871.5, 300 sec: 41987.5). Total num frames: 813400064. Throughput: 0: 42365.8. Samples: 695616240. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 17:34:43,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 17:34:48,032][00497] Updated weights for policy 0, policy_version 49656 (0.0027) [2024-03-29 17:34:48,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 813596672. Throughput: 0: 42398.2. Samples: 695749720. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 17:34:48,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 17:34:51,633][00497] Updated weights for policy 0, policy_version 49666 (0.0029) [2024-03-29 17:34:53,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 813809664. Throughput: 0: 42427.9. Samples: 695997740. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 17:34:53,840][00126] Avg episode reward: [(0, '0.603')] [2024-03-29 17:34:55,335][00497] Updated weights for policy 0, policy_version 49676 (0.0020) [2024-03-29 17:34:58,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 814039040. Throughput: 0: 42245.0. Samples: 696243360. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 17:34:58,842][00126] Avg episode reward: [(0, '0.604')] [2024-03-29 17:34:59,385][00497] Updated weights for policy 0, policy_version 49686 (0.0018) [2024-03-29 17:35:03,544][00497] Updated weights for policy 0, policy_version 49696 (0.0017) [2024-03-29 17:35:03,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 814219264. Throughput: 0: 42302.2. Samples: 696383180. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 17:35:03,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 17:35:04,191][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000049698_814252032.pth... [2024-03-29 17:35:04,504][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000049081_804143104.pth [2024-03-29 17:35:07,228][00497] Updated weights for policy 0, policy_version 49706 (0.0018) [2024-03-29 17:35:08,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 814448640. Throughput: 0: 42442.1. Samples: 696643220. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 17:35:08,840][00126] Avg episode reward: [(0, '0.592')] [2024-03-29 17:35:11,036][00497] Updated weights for policy 0, policy_version 49717 (0.0026) [2024-03-29 17:35:13,839][00126] Fps is (10 sec: 44236.1, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 814661632. Throughput: 0: 42356.3. Samples: 696884700. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 17:35:13,840][00126] Avg episode reward: [(0, '0.568')] [2024-03-29 17:35:15,272][00497] Updated weights for policy 0, policy_version 49727 (0.0020) [2024-03-29 17:35:18,810][00476] Signal inference workers to stop experience collection... (24800 times) [2024-03-29 17:35:18,811][00476] Signal inference workers to resume experience collection... 
(24800 times) [2024-03-29 17:35:18,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 814874624. Throughput: 0: 42260.8. Samples: 697019080. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 17:35:18,840][00126] Avg episode reward: [(0, '0.580')] [2024-03-29 17:35:18,846][00497] InferenceWorker_p0-w0: stopping experience collection (24800 times) [2024-03-29 17:35:18,846][00497] InferenceWorker_p0-w0: resuming experience collection (24800 times) [2024-03-29 17:35:19,123][00497] Updated weights for policy 0, policy_version 49737 (0.0017) [2024-03-29 17:35:22,621][00497] Updated weights for policy 0, policy_version 49747 (0.0024) [2024-03-29 17:35:23,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 815087616. Throughput: 0: 42574.3. Samples: 697290240. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 17:35:23,840][00126] Avg episode reward: [(0, '0.537')] [2024-03-29 17:35:26,237][00497] Updated weights for policy 0, policy_version 49757 (0.0024) [2024-03-29 17:35:28,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 815316992. Throughput: 0: 42502.2. Samples: 697528840. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 17:35:28,840][00126] Avg episode reward: [(0, '0.575')] [2024-03-29 17:35:30,637][00497] Updated weights for policy 0, policy_version 49767 (0.0021) [2024-03-29 17:35:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 815497216. Throughput: 0: 42436.0. Samples: 697659340. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 17:35:33,840][00126] Avg episode reward: [(0, '0.602')] [2024-03-29 17:35:34,600][00497] Updated weights for policy 0, policy_version 49777 (0.0025) [2024-03-29 17:35:38,191][00497] Updated weights for policy 0, policy_version 49787 (0.0019) [2024-03-29 17:35:38,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 815726592. Throughput: 0: 42658.2. Samples: 697917360. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 17:35:38,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 17:35:42,306][00497] Updated weights for policy 0, policy_version 49797 (0.0025) [2024-03-29 17:35:43,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 815939584. Throughput: 0: 42513.4. Samples: 698156460. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 17:35:43,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 17:35:46,366][00497] Updated weights for policy 0, policy_version 49807 (0.0023) [2024-03-29 17:35:48,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42325.3, 300 sec: 42209.7). Total num frames: 816136192. Throughput: 0: 42047.6. Samples: 698275320. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 17:35:48,840][00126] Avg episode reward: [(0, '0.575')] [2024-03-29 17:35:50,444][00497] Updated weights for policy 0, policy_version 49817 (0.0021) [2024-03-29 17:35:53,016][00476] Signal inference workers to stop experience collection... (24850 times) [2024-03-29 17:35:53,037][00497] InferenceWorker_p0-w0: stopping experience collection (24850 times) [2024-03-29 17:35:53,196][00476] Signal inference workers to resume experience collection... (24850 times) [2024-03-29 17:35:53,196][00497] InferenceWorker_p0-w0: resuming experience collection (24850 times) [2024-03-29 17:35:53,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42325.3, 300 sec: 42154.1). 
Total num frames: 816349184. Throughput: 0: 42063.5. Samples: 698536080. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 17:35:53,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 17:35:54,169][00497] Updated weights for policy 0, policy_version 49827 (0.0023) [2024-03-29 17:35:57,819][00497] Updated weights for policy 0, policy_version 49837 (0.0028) [2024-03-29 17:35:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 816562176. Throughput: 0: 42043.6. Samples: 698776660. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 17:35:58,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 17:36:01,921][00497] Updated weights for policy 0, policy_version 49847 (0.0024) [2024-03-29 17:36:03,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 816775168. Throughput: 0: 41786.7. Samples: 698899480. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 17:36:03,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 17:36:06,104][00497] Updated weights for policy 0, policy_version 49857 (0.0022) [2024-03-29 17:36:08,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.3, 300 sec: 42154.1). Total num frames: 816955392. Throughput: 0: 41705.4. Samples: 699166980. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 17:36:08,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 17:36:09,973][00497] Updated weights for policy 0, policy_version 49867 (0.0024) [2024-03-29 17:36:13,540][00497] Updated weights for policy 0, policy_version 49877 (0.0018) [2024-03-29 17:36:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.5, 300 sec: 42265.2). Total num frames: 817201152. Throughput: 0: 41885.4. Samples: 699413680. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 17:36:13,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 17:36:17,594][00497] Updated weights for policy 0, policy_version 49887 (0.0027) [2024-03-29 17:36:18,839][00126] Fps is (10 sec: 44235.9, 60 sec: 42052.2, 300 sec: 42265.1). Total num frames: 817397760. Throughput: 0: 41565.2. Samples: 699529780. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 17:36:18,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 17:36:21,551][00497] Updated weights for policy 0, policy_version 49897 (0.0027) [2024-03-29 17:36:23,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 42209.6). Total num frames: 817594368. Throughput: 0: 41844.5. Samples: 699800360. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 17:36:23,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 17:36:25,556][00497] Updated weights for policy 0, policy_version 49907 (0.0018) [2024-03-29 17:36:26,919][00476] Signal inference workers to stop experience collection... (24900 times) [2024-03-29 17:36:26,998][00476] Signal inference workers to resume experience collection... (24900 times) [2024-03-29 17:36:27,000][00497] InferenceWorker_p0-w0: stopping experience collection (24900 times) [2024-03-29 17:36:27,025][00497] InferenceWorker_p0-w0: resuming experience collection (24900 times) [2024-03-29 17:36:28,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41779.3, 300 sec: 42265.2). Total num frames: 817823744. Throughput: 0: 42144.5. Samples: 700052960. 
Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 17:36:28,840][00126] Avg episode reward: [(0, '0.575')] [2024-03-29 17:36:29,086][00497] Updated weights for policy 0, policy_version 49917 (0.0019) [2024-03-29 17:36:32,860][00497] Updated weights for policy 0, policy_version 49927 (0.0027) [2024-03-29 17:36:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.2, 300 sec: 42265.1). Total num frames: 818036736. Throughput: 0: 42079.4. Samples: 700168900. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 17:36:33,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 17:36:37,026][00497] Updated weights for policy 0, policy_version 49937 (0.0021) [2024-03-29 17:36:38,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.4, 300 sec: 42320.7). Total num frames: 818249728. Throughput: 0: 42301.5. Samples: 700439640. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 17:36:38,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 17:36:40,814][00497] Updated weights for policy 0, policy_version 49947 (0.0023) [2024-03-29 17:36:43,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42265.2). Total num frames: 818462720. Throughput: 0: 42531.5. Samples: 700690580. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 17:36:43,841][00126] Avg episode reward: [(0, '0.599')] [2024-03-29 17:36:44,656][00497] Updated weights for policy 0, policy_version 49957 (0.0021) [2024-03-29 17:36:48,555][00497] Updated weights for policy 0, policy_version 49967 (0.0029) [2024-03-29 17:36:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 818659328. Throughput: 0: 42557.4. Samples: 700814560. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 17:36:48,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 17:36:52,715][00497] Updated weights for policy 0, policy_version 49977 (0.0018) [2024-03-29 17:36:53,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.3, 300 sec: 42209.6). Total num frames: 818855936. Throughput: 0: 42294.6. Samples: 701070240. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 17:36:53,840][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 17:36:56,577][00497] Updated weights for policy 0, policy_version 49987 (0.0018) [2024-03-29 17:36:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 819085312. Throughput: 0: 42372.9. Samples: 701320460. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 17:36:58,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 17:37:00,203][00497] Updated weights for policy 0, policy_version 49997 (0.0020) [2024-03-29 17:37:02,091][00476] Signal inference workers to stop experience collection... (24950 times) [2024-03-29 17:37:02,129][00497] InferenceWorker_p0-w0: stopping experience collection (24950 times) [2024-03-29 17:37:02,320][00476] Signal inference workers to resume experience collection... (24950 times) [2024-03-29 17:37:02,320][00497] InferenceWorker_p0-w0: resuming experience collection (24950 times) [2024-03-29 17:37:03,839][00126] Fps is (10 sec: 44236.1, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 819298304. Throughput: 0: 42332.8. Samples: 701434760. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 17:37:03,840][00126] Avg episode reward: [(0, '0.602')] [2024-03-29 17:37:03,862][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000050006_819298304.pth... 
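The "Saving .../checkpoint_p0/checkpoint_000050006_819298304.pth" record above, followed by the "Removing .../checkpoint_000049386_809140224.pth" record below, shows the learner rotating checkpoints: it writes the newest file and deletes an older one so only a small number remain on disk, and the file name appears to encode the policy version together with the cumulative frame count at save time. A minimal Python sketch of such a keep-last-N rotation follows; the save_and_rotate function, its parameters and the keep_last value are hypothetical, not Sample Factory's actual code.

import os
import re
from glob import glob

import torch  # checkpoints in this log are .pth files

def save_and_rotate(state_dict, ckpt_dir, policy_version, total_frames, keep_last=3):
    """Write a checkpoint named like the ones in the log, then delete the
    oldest files so that at most keep_last checkpoints remain."""
    name = f"checkpoint_{policy_version:09d}_{total_frames}.pth"
    path = os.path.join(ckpt_dir, name)
    torch.save(state_dict, path)  # corresponds to the "Saving ..." records

    def version_of(p):
        # sort by the policy version embedded in the file name
        m = re.search(r"checkpoint_(\d+)_\d+\.pth$", p)
        return int(m.group(1)) if m else -1

    ckpts = sorted(glob(os.path.join(ckpt_dir, "checkpoint_*.pth")), key=version_of)
    for old in ckpts[:-keep_last]:  # corresponds to the "Removing ..." records
        os.remove(old)
    return path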
[2024-03-29 17:37:04,174][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000049386_809140224.pth [2024-03-29 17:37:04,541][00497] Updated weights for policy 0, policy_version 50007 (0.0022) [2024-03-29 17:37:08,419][00497] Updated weights for policy 0, policy_version 50017 (0.0026) [2024-03-29 17:37:08,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 819478528. Throughput: 0: 41802.8. Samples: 701681480. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 17:37:08,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 17:37:12,616][00497] Updated weights for policy 0, policy_version 50027 (0.0018) [2024-03-29 17:37:13,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 819691520. Throughput: 0: 41794.6. Samples: 701933720. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 17:37:13,840][00126] Avg episode reward: [(0, '0.604')] [2024-03-29 17:37:15,939][00497] Updated weights for policy 0, policy_version 50037 (0.0022) [2024-03-29 17:37:18,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 819937280. Throughput: 0: 41957.8. Samples: 702057000. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 17:37:18,840][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 17:37:20,009][00497] Updated weights for policy 0, policy_version 50047 (0.0027) [2024-03-29 17:37:23,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.4, 300 sec: 42265.2). Total num frames: 820117504. Throughput: 0: 41531.5. Samples: 702308560. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 17:37:23,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 17:37:24,039][00497] Updated weights for policy 0, policy_version 50057 (0.0032) [2024-03-29 17:37:28,041][00497] Updated weights for policy 0, policy_version 50067 (0.0018) [2024-03-29 17:37:28,839][00126] Fps is (10 sec: 39322.3, 60 sec: 41779.2, 300 sec: 42209.7). Total num frames: 820330496. Throughput: 0: 41866.9. Samples: 702574580. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 17:37:28,840][00126] Avg episode reward: [(0, '0.587')] [2024-03-29 17:37:31,454][00497] Updated weights for policy 0, policy_version 50077 (0.0024) [2024-03-29 17:37:33,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.4, 300 sec: 42320.7). Total num frames: 820559872. Throughput: 0: 41795.5. Samples: 702695360. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 17:37:33,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 17:37:35,473][00497] Updated weights for policy 0, policy_version 50087 (0.0021) [2024-03-29 17:37:38,839][00126] Fps is (10 sec: 42597.6, 60 sec: 41779.1, 300 sec: 42320.7). Total num frames: 820756480. Throughput: 0: 41584.8. Samples: 702941560. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 17:37:38,842][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 17:37:39,568][00497] Updated weights for policy 0, policy_version 50097 (0.0024) [2024-03-29 17:37:40,662][00476] Signal inference workers to stop experience collection... (25000 times) [2024-03-29 17:37:40,663][00476] Signal inference workers to resume experience collection... 
(25000 times) [2024-03-29 17:37:40,703][00497] InferenceWorker_p0-w0: stopping experience collection (25000 times) [2024-03-29 17:37:40,704][00497] InferenceWorker_p0-w0: resuming experience collection (25000 times) [2024-03-29 17:37:43,600][00497] Updated weights for policy 0, policy_version 50107 (0.0022) [2024-03-29 17:37:43,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 820953088. Throughput: 0: 41716.9. Samples: 703197720. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 17:37:43,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 17:37:47,162][00497] Updated weights for policy 0, policy_version 50117 (0.0027) [2024-03-29 17:37:48,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41779.2, 300 sec: 42265.2). Total num frames: 821166080. Throughput: 0: 41830.0. Samples: 703317100. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 17:37:48,840][00126] Avg episode reward: [(0, '0.626')] [2024-03-29 17:37:51,301][00497] Updated weights for policy 0, policy_version 50127 (0.0036) [2024-03-29 17:37:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 821379072. Throughput: 0: 41759.6. Samples: 703560660. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 17:37:53,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:37:55,540][00497] Updated weights for policy 0, policy_version 50137 (0.0019) [2024-03-29 17:37:58,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41233.0, 300 sec: 42154.1). Total num frames: 821559296. Throughput: 0: 41973.7. Samples: 703822540. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 17:37:58,840][00126] Avg episode reward: [(0, '0.630')] [2024-03-29 17:37:59,564][00497] Updated weights for policy 0, policy_version 50147 (0.0019) [2024-03-29 17:38:02,995][00497] Updated weights for policy 0, policy_version 50157 (0.0025) [2024-03-29 17:38:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.3, 300 sec: 42209.6). Total num frames: 821788672. Throughput: 0: 41761.9. Samples: 703936280. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 17:38:03,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:38:06,635][00497] Updated weights for policy 0, policy_version 50167 (0.0024) [2024-03-29 17:38:08,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 822001664. Throughput: 0: 41654.7. Samples: 704183020. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 17:38:08,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 17:38:11,312][00497] Updated weights for policy 0, policy_version 50177 (0.0022) [2024-03-29 17:38:13,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 822198272. Throughput: 0: 41500.3. Samples: 704442100. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 17:38:13,840][00126] Avg episode reward: [(0, '0.478')] [2024-03-29 17:38:15,349][00497] Updated weights for policy 0, policy_version 50187 (0.0026) [2024-03-29 17:38:17,649][00476] Signal inference workers to stop experience collection... (25050 times) [2024-03-29 17:38:17,701][00497] InferenceWorker_p0-w0: stopping experience collection (25050 times) [2024-03-29 17:38:17,734][00476] Signal inference workers to resume experience collection... 
(25050 times) [2024-03-29 17:38:17,737][00497] InferenceWorker_p0-w0: resuming experience collection (25050 times) [2024-03-29 17:38:18,652][00497] Updated weights for policy 0, policy_version 50197 (0.0020) [2024-03-29 17:38:18,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 822427648. Throughput: 0: 41638.1. Samples: 704569080. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 17:38:18,840][00126] Avg episode reward: [(0, '0.601')] [2024-03-29 17:38:22,273][00497] Updated weights for policy 0, policy_version 50207 (0.0030) [2024-03-29 17:38:23,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 822624256. Throughput: 0: 41488.1. Samples: 704808520. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 17:38:23,840][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 17:38:26,658][00497] Updated weights for policy 0, policy_version 50217 (0.0028) [2024-03-29 17:38:28,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.0, 300 sec: 42098.5). Total num frames: 822837248. Throughput: 0: 41700.3. Samples: 705074240. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 17:38:28,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 17:38:30,815][00497] Updated weights for policy 0, policy_version 50227 (0.0019) [2024-03-29 17:38:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 823050240. Throughput: 0: 41938.7. Samples: 705204340. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 17:38:33,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 17:38:34,284][00497] Updated weights for policy 0, policy_version 50237 (0.0021) [2024-03-29 17:38:37,880][00497] Updated weights for policy 0, policy_version 50247 (0.0032) [2024-03-29 17:38:38,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 823263232. Throughput: 0: 41714.6. Samples: 705437820. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 17:38:38,840][00126] Avg episode reward: [(0, '0.513')] [2024-03-29 17:38:42,273][00497] Updated weights for policy 0, policy_version 50257 (0.0028) [2024-03-29 17:38:43,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 823459840. Throughput: 0: 41875.1. Samples: 705706920. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 17:38:43,840][00126] Avg episode reward: [(0, '0.614')] [2024-03-29 17:38:46,224][00497] Updated weights for policy 0, policy_version 50267 (0.0021) [2024-03-29 17:38:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 823689216. Throughput: 0: 42272.0. Samples: 705838520. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 17:38:48,840][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 17:38:49,832][00497] Updated weights for policy 0, policy_version 50277 (0.0032) [2024-03-29 17:38:51,585][00476] Signal inference workers to stop experience collection... (25100 times) [2024-03-29 17:38:51,656][00497] InferenceWorker_p0-w0: stopping experience collection (25100 times) [2024-03-29 17:38:51,673][00476] Signal inference workers to resume experience collection... (25100 times) [2024-03-29 17:38:51,686][00497] InferenceWorker_p0-w0: resuming experience collection (25100 times) [2024-03-29 17:38:53,481][00497] Updated weights for policy 0, policy_version 50287 (0.0027) [2024-03-29 17:38:53,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.1, 300 sec: 42098.5). 
Total num frames: 823902208. Throughput: 0: 41954.1. Samples: 706070960. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 17:38:53,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 17:38:57,864][00497] Updated weights for policy 0, policy_version 50297 (0.0023) [2024-03-29 17:38:58,839][00126] Fps is (10 sec: 40959.4, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 824098816. Throughput: 0: 42207.2. Samples: 706341420. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 17:38:58,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 17:39:01,737][00497] Updated weights for policy 0, policy_version 50307 (0.0021) [2024-03-29 17:39:03,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 824328192. Throughput: 0: 42384.1. Samples: 706476360. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 17:39:03,840][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 17:39:03,857][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000050313_824328192.pth... [2024-03-29 17:39:04,165][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000049698_814252032.pth [2024-03-29 17:39:05,264][00497] Updated weights for policy 0, policy_version 50317 (0.0021) [2024-03-29 17:39:08,630][00497] Updated weights for policy 0, policy_version 50327 (0.0023) [2024-03-29 17:39:08,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 824557568. Throughput: 0: 42355.5. Samples: 706714520. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 17:39:08,840][00126] Avg episode reward: [(0, '0.569')] [2024-03-29 17:39:13,132][00497] Updated weights for policy 0, policy_version 50337 (0.0023) [2024-03-29 17:39:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.5, 300 sec: 42098.5). Total num frames: 824737792. Throughput: 0: 42307.3. Samples: 706978060. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 17:39:13,840][00126] Avg episode reward: [(0, '0.628')] [2024-03-29 17:39:17,284][00497] Updated weights for policy 0, policy_version 50347 (0.0025) [2024-03-29 17:39:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 824950784. Throughput: 0: 42423.6. Samples: 707113400. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 17:39:18,840][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 17:39:20,850][00497] Updated weights for policy 0, policy_version 50357 (0.0023) [2024-03-29 17:39:23,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42871.4, 300 sec: 42154.1). Total num frames: 825196544. Throughput: 0: 42597.7. Samples: 707354720. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 17:39:23,841][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 17:39:24,106][00497] Updated weights for policy 0, policy_version 50367 (0.0025) [2024-03-29 17:39:27,942][00476] Signal inference workers to stop experience collection... (25150 times) [2024-03-29 17:39:27,967][00497] InferenceWorker_p0-w0: stopping experience collection (25150 times) [2024-03-29 17:39:28,123][00476] Signal inference workers to resume experience collection... (25150 times) [2024-03-29 17:39:28,124][00497] InferenceWorker_p0-w0: resuming experience collection (25150 times) [2024-03-29 17:39:28,420][00497] Updated weights for policy 0, policy_version 50377 (0.0023) [2024-03-29 17:39:28,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 825376768. 
Throughput: 0: 42515.9. Samples: 707620140. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 17:39:28,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 17:39:32,524][00497] Updated weights for policy 0, policy_version 50387 (0.0029) [2024-03-29 17:39:33,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 825589760. Throughput: 0: 42487.9. Samples: 707750480. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 17:39:33,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 17:39:36,059][00497] Updated weights for policy 0, policy_version 50397 (0.0031) [2024-03-29 17:39:38,839][00126] Fps is (10 sec: 45876.1, 60 sec: 42871.6, 300 sec: 42154.1). Total num frames: 825835520. Throughput: 0: 42721.5. Samples: 707993420. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 17:39:38,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 17:39:39,489][00497] Updated weights for policy 0, policy_version 50407 (0.0023) [2024-03-29 17:39:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 825999360. Throughput: 0: 42229.5. Samples: 708241740. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:39:43,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 17:39:44,175][00497] Updated weights for policy 0, policy_version 50417 (0.0019) [2024-03-29 17:39:48,153][00497] Updated weights for policy 0, policy_version 50427 (0.0025) [2024-03-29 17:39:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 826228736. Throughput: 0: 42305.8. Samples: 708380120. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:39:48,840][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 17:39:51,699][00497] Updated weights for policy 0, policy_version 50437 (0.0036) [2024-03-29 17:39:53,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.5, 300 sec: 42098.6). Total num frames: 826458112. Throughput: 0: 42475.6. Samples: 708625920. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:39:53,840][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 17:39:55,048][00497] Updated weights for policy 0, policy_version 50447 (0.0017) [2024-03-29 17:39:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 826654720. Throughput: 0: 42150.7. Samples: 708874840. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:39:58,840][00126] Avg episode reward: [(0, '0.620')] [2024-03-29 17:39:59,823][00497] Updated weights for policy 0, policy_version 50457 (0.0027) [2024-03-29 17:40:00,250][00476] Signal inference workers to stop experience collection... (25200 times) [2024-03-29 17:40:00,292][00497] InferenceWorker_p0-w0: stopping experience collection (25200 times) [2024-03-29 17:40:00,447][00476] Signal inference workers to resume experience collection... (25200 times) [2024-03-29 17:40:00,447][00497] InferenceWorker_p0-w0: resuming experience collection (25200 times) [2024-03-29 17:40:03,543][00497] Updated weights for policy 0, policy_version 50467 (0.0022) [2024-03-29 17:40:03,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 826851328. Throughput: 0: 42078.2. Samples: 709006920. 
Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:40:03,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 17:40:07,120][00497] Updated weights for policy 0, policy_version 50477 (0.0018) [2024-03-29 17:40:08,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 827097088. Throughput: 0: 42502.8. Samples: 709267340. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:40:08,840][00126] Avg episode reward: [(0, '0.620')] [2024-03-29 17:40:10,377][00497] Updated weights for policy 0, policy_version 50487 (0.0029) [2024-03-29 17:40:13,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 827293696. Throughput: 0: 42150.4. Samples: 709516900. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 17:40:13,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 17:40:15,053][00497] Updated weights for policy 0, policy_version 50497 (0.0025) [2024-03-29 17:40:18,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 827490304. Throughput: 0: 42044.1. Samples: 709642460. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 17:40:18,841][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 17:40:19,087][00497] Updated weights for policy 0, policy_version 50507 (0.0025) [2024-03-29 17:40:22,668][00497] Updated weights for policy 0, policy_version 50517 (0.0020) [2024-03-29 17:40:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 827719680. Throughput: 0: 42429.3. Samples: 709902740. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 17:40:23,840][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 17:40:25,957][00497] Updated weights for policy 0, policy_version 50527 (0.0017) [2024-03-29 17:40:28,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 827932672. Throughput: 0: 42526.6. Samples: 710155440. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 17:40:28,840][00126] Avg episode reward: [(0, '0.534')] [2024-03-29 17:40:30,117][00476] Signal inference workers to stop experience collection... (25250 times) [2024-03-29 17:40:30,194][00476] Signal inference workers to resume experience collection... (25250 times) [2024-03-29 17:40:30,196][00497] InferenceWorker_p0-w0: stopping experience collection (25250 times) [2024-03-29 17:40:30,220][00497] InferenceWorker_p0-w0: resuming experience collection (25250 times) [2024-03-29 17:40:30,505][00497] Updated weights for policy 0, policy_version 50537 (0.0020) [2024-03-29 17:40:33,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 828129280. Throughput: 0: 42289.2. Samples: 710283140. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 17:40:33,840][00126] Avg episode reward: [(0, '0.513')] [2024-03-29 17:40:34,521][00497] Updated weights for policy 0, policy_version 50547 (0.0020) [2024-03-29 17:40:38,085][00497] Updated weights for policy 0, policy_version 50557 (0.0027) [2024-03-29 17:40:38,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42052.1, 300 sec: 42098.5). Total num frames: 828358656. Throughput: 0: 42523.3. Samples: 710539480. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 17:40:38,840][00126] Avg episode reward: [(0, '0.507')] [2024-03-29 17:40:41,523][00497] Updated weights for policy 0, policy_version 50567 (0.0025) [2024-03-29 17:40:43,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42871.4, 300 sec: 42154.1). 
Total num frames: 828571648. Throughput: 0: 42630.1. Samples: 710793200. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 17:40:43,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 17:40:45,902][00497] Updated weights for policy 0, policy_version 50577 (0.0023) [2024-03-29 17:40:48,839][00126] Fps is (10 sec: 39322.5, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 828751872. Throughput: 0: 42563.6. Samples: 710922280. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 17:40:48,841][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 17:40:49,813][00497] Updated weights for policy 0, policy_version 50587 (0.0020) [2024-03-29 17:40:53,696][00497] Updated weights for policy 0, policy_version 50597 (0.0024) [2024-03-29 17:40:53,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 828981248. Throughput: 0: 42304.5. Samples: 711171040. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 17:40:53,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 17:40:57,104][00497] Updated weights for policy 0, policy_version 50607 (0.0023) [2024-03-29 17:40:58,839][00126] Fps is (10 sec: 44236.1, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 829194240. Throughput: 0: 42295.0. Samples: 711420180. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 17:40:58,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 17:41:01,858][00497] Updated weights for policy 0, policy_version 50617 (0.0032) [2024-03-29 17:41:03,283][00476] Signal inference workers to stop experience collection... (25300 times) [2024-03-29 17:41:03,327][00497] InferenceWorker_p0-w0: stopping experience collection (25300 times) [2024-03-29 17:41:03,363][00476] Signal inference workers to resume experience collection... (25300 times) [2024-03-29 17:41:03,369][00497] InferenceWorker_p0-w0: resuming experience collection (25300 times) [2024-03-29 17:41:03,839][00126] Fps is (10 sec: 40959.2, 60 sec: 42325.2, 300 sec: 42154.1). Total num frames: 829390848. Throughput: 0: 42221.1. Samples: 711542420. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 17:41:03,840][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 17:41:03,859][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000050622_829390848.pth... [2024-03-29 17:41:04,220][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000050006_819298304.pth [2024-03-29 17:41:05,736][00497] Updated weights for policy 0, policy_version 50627 (0.0020) [2024-03-29 17:41:08,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 829603840. Throughput: 0: 42284.0. Samples: 711805520. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 17:41:08,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 17:41:09,204][00497] Updated weights for policy 0, policy_version 50637 (0.0032) [2024-03-29 17:41:12,692][00497] Updated weights for policy 0, policy_version 50647 (0.0024) [2024-03-29 17:41:13,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 829800448. Throughput: 0: 41714.2. Samples: 712032580. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 17:41:13,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 17:41:17,320][00497] Updated weights for policy 0, policy_version 50657 (0.0027) [2024-03-29 17:41:18,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 830013440. 
Throughput: 0: 41959.7. Samples: 712171320. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 17:41:18,840][00126] Avg episode reward: [(0, '0.625')] [2024-03-29 17:41:21,204][00497] Updated weights for policy 0, policy_version 50667 (0.0023) [2024-03-29 17:41:23,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 830242816. Throughput: 0: 42106.4. Samples: 712434260. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 17:41:23,840][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 17:41:24,568][00497] Updated weights for policy 0, policy_version 50677 (0.0023) [2024-03-29 17:41:28,264][00497] Updated weights for policy 0, policy_version 50687 (0.0020) [2024-03-29 17:41:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 830455808. Throughput: 0: 41826.3. Samples: 712675380. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 17:41:28,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 17:41:32,635][00497] Updated weights for policy 0, policy_version 50697 (0.0017) [2024-03-29 17:41:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 42098.5). Total num frames: 830668800. Throughput: 0: 41952.9. Samples: 712810160. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 17:41:33,840][00126] Avg episode reward: [(0, '0.600')] [2024-03-29 17:41:36,568][00497] Updated weights for policy 0, policy_version 50707 (0.0023) [2024-03-29 17:41:38,322][00476] Signal inference workers to stop experience collection... (25350 times) [2024-03-29 17:41:38,399][00497] InferenceWorker_p0-w0: stopping experience collection (25350 times) [2024-03-29 17:41:38,493][00476] Signal inference workers to resume experience collection... (25350 times) [2024-03-29 17:41:38,493][00497] InferenceWorker_p0-w0: resuming experience collection (25350 times) [2024-03-29 17:41:38,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 830881792. Throughput: 0: 42270.5. Samples: 713073220. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 17:41:38,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 17:41:39,930][00497] Updated weights for policy 0, policy_version 50717 (0.0023) [2024-03-29 17:41:43,647][00497] Updated weights for policy 0, policy_version 50727 (0.0023) [2024-03-29 17:41:43,839][00126] Fps is (10 sec: 44236.0, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 831111168. Throughput: 0: 42015.5. Samples: 713310880. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 17:41:43,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 17:41:48,084][00497] Updated weights for policy 0, policy_version 50737 (0.0028) [2024-03-29 17:41:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.2, 300 sec: 42154.1). Total num frames: 831291392. Throughput: 0: 42304.9. Samples: 713446140. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 17:41:48,840][00126] Avg episode reward: [(0, '0.634')] [2024-03-29 17:41:51,976][00497] Updated weights for policy 0, policy_version 50747 (0.0025) [2024-03-29 17:41:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.1, 300 sec: 42098.5). Total num frames: 831504384. Throughput: 0: 42286.0. Samples: 713708400. 
Policy #0 lag: (min: 0.0, avg: 18.6, max: 40.0) [2024-03-29 17:41:53,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 17:41:55,371][00497] Updated weights for policy 0, policy_version 50757 (0.0026) [2024-03-29 17:41:58,839][00126] Fps is (10 sec: 45876.0, 60 sec: 42598.5, 300 sec: 42209.7). Total num frames: 831750144. Throughput: 0: 42833.0. Samples: 713960060. Policy #0 lag: (min: 0.0, avg: 18.6, max: 40.0) [2024-03-29 17:41:58,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 17:41:58,871][00497] Updated weights for policy 0, policy_version 50767 (0.0021) [2024-03-29 17:42:03,507][00497] Updated weights for policy 0, policy_version 50777 (0.0021) [2024-03-29 17:42:03,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 42265.1). Total num frames: 831946752. Throughput: 0: 42730.9. Samples: 714094220. Policy #0 lag: (min: 0.0, avg: 18.6, max: 40.0) [2024-03-29 17:42:03,840][00126] Avg episode reward: [(0, '0.600')] [2024-03-29 17:42:07,316][00497] Updated weights for policy 0, policy_version 50787 (0.0025) [2024-03-29 17:42:08,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 832143360. Throughput: 0: 42842.7. Samples: 714362180. Policy #0 lag: (min: 0.0, avg: 18.6, max: 40.0) [2024-03-29 17:42:08,840][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 17:42:09,508][00476] Signal inference workers to stop experience collection... (25400 times) [2024-03-29 17:42:09,562][00497] InferenceWorker_p0-w0: stopping experience collection (25400 times) [2024-03-29 17:42:09,664][00476] Signal inference workers to resume experience collection... (25400 times) [2024-03-29 17:42:09,664][00497] InferenceWorker_p0-w0: resuming experience collection (25400 times) [2024-03-29 17:42:10,554][00497] Updated weights for policy 0, policy_version 50797 (0.0026) [2024-03-29 17:42:13,839][00126] Fps is (10 sec: 44237.7, 60 sec: 43144.6, 300 sec: 42209.6). Total num frames: 832389120. Throughput: 0: 42672.0. Samples: 714595620. Policy #0 lag: (min: 0.0, avg: 18.6, max: 40.0) [2024-03-29 17:42:13,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 17:42:14,628][00497] Updated weights for policy 0, policy_version 50807 (0.0039) [2024-03-29 17:42:18,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 832569344. Throughput: 0: 42532.0. Samples: 714724100. Policy #0 lag: (min: 0.0, avg: 18.6, max: 40.0) [2024-03-29 17:42:18,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 17:42:18,888][00497] Updated weights for policy 0, policy_version 50817 (0.0018) [2024-03-29 17:42:22,850][00497] Updated weights for policy 0, policy_version 50827 (0.0023) [2024-03-29 17:42:23,839][00126] Fps is (10 sec: 37682.8, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 832765952. Throughput: 0: 42594.7. Samples: 714989980. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:42:23,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 17:42:26,281][00497] Updated weights for policy 0, policy_version 50837 (0.0021) [2024-03-29 17:42:28,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 833011712. Throughput: 0: 42585.5. Samples: 715227220. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:42:28,840][00126] Avg episode reward: [(0, '0.616')] [2024-03-29 17:42:30,049][00497] Updated weights for policy 0, policy_version 50847 (0.0028) [2024-03-29 17:42:33,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.3, 300 sec: 42209.6). 
Total num frames: 833208320. Throughput: 0: 42536.6. Samples: 715360280. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:42:33,841][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 17:42:34,406][00497] Updated weights for policy 0, policy_version 50857 (0.0023) [2024-03-29 17:42:38,498][00497] Updated weights for policy 0, policy_version 50867 (0.0025) [2024-03-29 17:42:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 833421312. Throughput: 0: 42719.7. Samples: 715630780. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:42:38,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:42:42,006][00476] Signal inference workers to stop experience collection... (25450 times) [2024-03-29 17:42:42,007][00476] Signal inference workers to resume experience collection... (25450 times) [2024-03-29 17:42:42,008][00497] Updated weights for policy 0, policy_version 50877 (0.0026) [2024-03-29 17:42:42,052][00497] InferenceWorker_p0-w0: stopping experience collection (25450 times) [2024-03-29 17:42:42,052][00497] InferenceWorker_p0-w0: resuming experience collection (25450 times) [2024-03-29 17:42:43,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 833650688. Throughput: 0: 42195.9. Samples: 715858880. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:42:43,840][00126] Avg episode reward: [(0, '0.629')] [2024-03-29 17:42:45,779][00497] Updated weights for policy 0, policy_version 50887 (0.0026) [2024-03-29 17:42:48,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.5, 300 sec: 42265.2). Total num frames: 833847296. Throughput: 0: 41978.0. Samples: 715983220. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:42:48,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 17:42:50,056][00497] Updated weights for policy 0, policy_version 50897 (0.0026) [2024-03-29 17:42:53,839][00126] Fps is (10 sec: 37682.9, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 834027520. Throughput: 0: 41795.8. Samples: 716243000. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 17:42:53,840][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 17:42:54,296][00497] Updated weights for policy 0, policy_version 50907 (0.0023) [2024-03-29 17:42:57,594][00497] Updated weights for policy 0, policy_version 50917 (0.0024) [2024-03-29 17:42:58,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 834273280. Throughput: 0: 42036.3. Samples: 716487260. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 17:42:58,840][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 17:43:01,342][00497] Updated weights for policy 0, policy_version 50927 (0.0022) [2024-03-29 17:43:03,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 42265.1). Total num frames: 834469888. Throughput: 0: 42034.5. Samples: 716615660. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 17:43:03,842][00126] Avg episode reward: [(0, '0.443')] [2024-03-29 17:43:04,079][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000050933_834486272.pth... [2024-03-29 17:43:04,395][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000050313_824328192.pth [2024-03-29 17:43:05,763][00497] Updated weights for policy 0, policy_version 50937 (0.0025) [2024-03-29 17:43:08,839][00126] Fps is (10 sec: 37683.8, 60 sec: 41779.2, 300 sec: 42209.7). Total num frames: 834650112. 
Throughput: 0: 41893.9. Samples: 716875200. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 17:43:08,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 17:43:09,873][00497] Updated weights for policy 0, policy_version 50947 (0.0027) [2024-03-29 17:43:13,079][00497] Updated weights for policy 0, policy_version 50957 (0.0020) [2024-03-29 17:43:13,083][00476] Signal inference workers to stop experience collection... (25500 times) [2024-03-29 17:43:13,084][00476] Signal inference workers to resume experience collection... (25500 times) [2024-03-29 17:43:13,131][00497] InferenceWorker_p0-w0: stopping experience collection (25500 times) [2024-03-29 17:43:13,131][00497] InferenceWorker_p0-w0: resuming experience collection (25500 times) [2024-03-29 17:43:13,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 834912256. Throughput: 0: 42078.6. Samples: 717120760. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 17:43:13,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 17:43:16,987][00497] Updated weights for policy 0, policy_version 50967 (0.0025) [2024-03-29 17:43:18,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 835108864. Throughput: 0: 41964.0. Samples: 717248660. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 17:43:18,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 17:43:20,923][00497] Updated weights for policy 0, policy_version 50977 (0.0024) [2024-03-29 17:43:23,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 835305472. Throughput: 0: 42008.0. Samples: 717521140. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 17:43:23,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 17:43:24,907][00497] Updated weights for policy 0, policy_version 50987 (0.0030) [2024-03-29 17:43:28,279][00497] Updated weights for policy 0, policy_version 50997 (0.0023) [2024-03-29 17:43:28,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 835551232. Throughput: 0: 42344.1. Samples: 717764360. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 17:43:28,840][00126] Avg episode reward: [(0, '0.592')] [2024-03-29 17:43:32,252][00497] Updated weights for policy 0, policy_version 51007 (0.0026) [2024-03-29 17:43:33,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.2, 300 sec: 42320.7). Total num frames: 835747840. Throughput: 0: 42279.3. Samples: 717885800. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 17:43:33,840][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 17:43:36,285][00497] Updated weights for policy 0, policy_version 51017 (0.0029) [2024-03-29 17:43:38,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 835944448. Throughput: 0: 42678.4. Samples: 718163520. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 17:43:38,840][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 17:43:40,313][00497] Updated weights for policy 0, policy_version 51027 (0.0019) [2024-03-29 17:43:43,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 836173824. Throughput: 0: 42769.4. Samples: 718411880. 
Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 17:43:43,840][00126] Avg episode reward: [(0, '0.491')] [2024-03-29 17:43:43,854][00497] Updated weights for policy 0, policy_version 51037 (0.0027) [2024-03-29 17:43:47,239][00476] Signal inference workers to stop experience collection... (25550 times) [2024-03-29 17:43:47,240][00476] Signal inference workers to resume experience collection... (25550 times) [2024-03-29 17:43:47,289][00497] InferenceWorker_p0-w0: stopping experience collection (25550 times) [2024-03-29 17:43:47,289][00497] InferenceWorker_p0-w0: resuming experience collection (25550 times) [2024-03-29 17:43:47,547][00497] Updated weights for policy 0, policy_version 51047 (0.0027) [2024-03-29 17:43:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 836386816. Throughput: 0: 42289.5. Samples: 718518680. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 17:43:48,840][00126] Avg episode reward: [(0, '0.614')] [2024-03-29 17:43:51,971][00497] Updated weights for policy 0, policy_version 51057 (0.0030) [2024-03-29 17:43:53,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 836583424. Throughput: 0: 42575.5. Samples: 718791100. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 17:43:53,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 17:43:56,110][00497] Updated weights for policy 0, policy_version 51067 (0.0020) [2024-03-29 17:43:58,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 836812800. Throughput: 0: 42980.5. Samples: 719054880. Policy #0 lag: (min: 2.0, avg: 20.9, max: 43.0) [2024-03-29 17:43:58,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 17:43:59,225][00497] Updated weights for policy 0, policy_version 51077 (0.0024) [2024-03-29 17:44:02,974][00497] Updated weights for policy 0, policy_version 51087 (0.0021) [2024-03-29 17:44:03,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.5, 300 sec: 42265.2). Total num frames: 837025792. Throughput: 0: 42299.1. Samples: 719152120. Policy #0 lag: (min: 2.0, avg: 20.9, max: 43.0) [2024-03-29 17:44:03,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 17:44:07,388][00497] Updated weights for policy 0, policy_version 51097 (0.0018) [2024-03-29 17:44:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 43144.5, 300 sec: 42376.2). Total num frames: 837238784. Throughput: 0: 42474.3. Samples: 719432480. Policy #0 lag: (min: 2.0, avg: 20.9, max: 43.0) [2024-03-29 17:44:08,840][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 17:44:11,445][00497] Updated weights for policy 0, policy_version 51107 (0.0022) [2024-03-29 17:44:13,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 837451776. Throughput: 0: 42821.3. Samples: 719691320. Policy #0 lag: (min: 2.0, avg: 20.9, max: 43.0) [2024-03-29 17:44:13,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 17:44:14,635][00497] Updated weights for policy 0, policy_version 51117 (0.0018) [2024-03-29 17:44:18,436][00497] Updated weights for policy 0, policy_version 51127 (0.0018) [2024-03-29 17:44:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 837664768. Throughput: 0: 42314.0. Samples: 719789920. Policy #0 lag: (min: 2.0, avg: 20.9, max: 43.0) [2024-03-29 17:44:18,840][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 17:44:21,017][00476] Signal inference workers to stop experience collection... 
(25600 times) [2024-03-29 17:44:21,092][00476] Signal inference workers to resume experience collection... (25600 times) [2024-03-29 17:44:21,090][00497] InferenceWorker_p0-w0: stopping experience collection (25600 times) [2024-03-29 17:44:21,118][00497] InferenceWorker_p0-w0: resuming experience collection (25600 times) [2024-03-29 17:44:22,702][00497] Updated weights for policy 0, policy_version 51137 (0.0022) [2024-03-29 17:44:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42871.5, 300 sec: 42376.3). Total num frames: 837877760. Throughput: 0: 42433.3. Samples: 720073020. Policy #0 lag: (min: 2.0, avg: 20.9, max: 43.0) [2024-03-29 17:44:23,840][00126] Avg episode reward: [(0, '0.639')] [2024-03-29 17:44:26,583][00497] Updated weights for policy 0, policy_version 51147 (0.0025) [2024-03-29 17:44:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 838074368. Throughput: 0: 42582.7. Samples: 720328100. Policy #0 lag: (min: 2.0, avg: 20.9, max: 43.0) [2024-03-29 17:44:28,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 17:44:30,150][00497] Updated weights for policy 0, policy_version 51157 (0.0022) [2024-03-29 17:44:33,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.5, 300 sec: 42265.2). Total num frames: 838303744. Throughput: 0: 42514.7. Samples: 720431840. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 17:44:33,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 17:44:33,961][00497] Updated weights for policy 0, policy_version 51167 (0.0017) [2024-03-29 17:44:38,514][00497] Updated weights for policy 0, policy_version 51177 (0.0022) [2024-03-29 17:44:38,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 838483968. Throughput: 0: 42501.3. Samples: 720703660. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 17:44:38,840][00126] Avg episode reward: [(0, '0.622')] [2024-03-29 17:44:42,411][00497] Updated weights for policy 0, policy_version 51187 (0.0029) [2024-03-29 17:44:43,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 838680576. Throughput: 0: 42063.1. Samples: 720947720. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 17:44:43,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 17:44:45,966][00497] Updated weights for policy 0, policy_version 51197 (0.0024) [2024-03-29 17:44:48,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 838942720. Throughput: 0: 42347.1. Samples: 721057740. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 17:44:48,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 17:44:49,666][00497] Updated weights for policy 0, policy_version 51207 (0.0026) [2024-03-29 17:44:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 839106560. Throughput: 0: 42075.5. Samples: 721325880. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 17:44:53,840][00126] Avg episode reward: [(0, '0.599')] [2024-03-29 17:44:54,318][00497] Updated weights for policy 0, policy_version 51217 (0.0023) [2024-03-29 17:44:54,681][00476] Signal inference workers to stop experience collection... (25650 times) [2024-03-29 17:44:54,714][00497] InferenceWorker_p0-w0: stopping experience collection (25650 times) [2024-03-29 17:44:54,899][00476] Signal inference workers to resume experience collection... 
(25650 times) [2024-03-29 17:44:54,899][00497] InferenceWorker_p0-w0: resuming experience collection (25650 times) [2024-03-29 17:44:58,092][00497] Updated weights for policy 0, policy_version 51227 (0.0021) [2024-03-29 17:44:58,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.1, 300 sec: 42265.2). Total num frames: 839319552. Throughput: 0: 41948.0. Samples: 721578980. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 17:44:58,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 17:45:01,670][00497] Updated weights for policy 0, policy_version 51237 (0.0031) [2024-03-29 17:45:03,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 839565312. Throughput: 0: 42331.1. Samples: 721694820. Policy #0 lag: (min: 3.0, avg: 23.6, max: 42.0) [2024-03-29 17:45:03,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 17:45:04,090][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000051244_839581696.pth... [2024-03-29 17:45:04,426][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000050622_829390848.pth [2024-03-29 17:45:05,652][00497] Updated weights for policy 0, policy_version 51247 (0.0032) [2024-03-29 17:45:08,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 839729152. Throughput: 0: 41375.1. Samples: 721934900. Policy #0 lag: (min: 3.0, avg: 23.6, max: 42.0) [2024-03-29 17:45:08,841][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 17:45:10,017][00497] Updated weights for policy 0, policy_version 51257 (0.0022) [2024-03-29 17:45:13,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 839942144. Throughput: 0: 41492.0. Samples: 722195240. Policy #0 lag: (min: 3.0, avg: 23.6, max: 42.0) [2024-03-29 17:45:13,840][00126] Avg episode reward: [(0, '0.616')] [2024-03-29 17:45:14,068][00497] Updated weights for policy 0, policy_version 51267 (0.0019) [2024-03-29 17:45:17,439][00497] Updated weights for policy 0, policy_version 51277 (0.0023) [2024-03-29 17:45:18,839][00126] Fps is (10 sec: 47513.3, 60 sec: 42325.2, 300 sec: 42320.7). Total num frames: 840204288. Throughput: 0: 42029.7. Samples: 722323180. Policy #0 lag: (min: 3.0, avg: 23.6, max: 42.0) [2024-03-29 17:45:18,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 17:45:21,290][00497] Updated weights for policy 0, policy_version 51287 (0.0021) [2024-03-29 17:45:23,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41233.0, 300 sec: 42098.5). Total num frames: 840351744. Throughput: 0: 41222.1. Samples: 722558660. Policy #0 lag: (min: 3.0, avg: 23.6, max: 42.0) [2024-03-29 17:45:23,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 17:45:25,737][00497] Updated weights for policy 0, policy_version 51297 (0.0023) [2024-03-29 17:45:28,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.1, 300 sec: 42209.6). Total num frames: 840581120. Throughput: 0: 41777.7. Samples: 722827720. Policy #0 lag: (min: 3.0, avg: 23.6, max: 42.0) [2024-03-29 17:45:28,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 17:45:29,249][00476] Signal inference workers to stop experience collection... (25700 times) [2024-03-29 17:45:29,304][00497] InferenceWorker_p0-w0: stopping experience collection (25700 times) [2024-03-29 17:45:29,338][00476] Signal inference workers to resume experience collection... 
(25700 times) [2024-03-29 17:45:29,339][00497] InferenceWorker_p0-w0: resuming experience collection (25700 times) [2024-03-29 17:45:29,672][00497] Updated weights for policy 0, policy_version 51307 (0.0025) [2024-03-29 17:45:33,069][00497] Updated weights for policy 0, policy_version 51317 (0.0028) [2024-03-29 17:45:33,839][00126] Fps is (10 sec: 45876.0, 60 sec: 41779.2, 300 sec: 42209.7). Total num frames: 840810496. Throughput: 0: 42115.6. Samples: 722952940. Policy #0 lag: (min: 3.0, avg: 23.6, max: 42.0) [2024-03-29 17:45:33,840][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 17:45:36,862][00497] Updated weights for policy 0, policy_version 51327 (0.0022) [2024-03-29 17:45:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 840990720. Throughput: 0: 41296.8. Samples: 723184240. Policy #0 lag: (min: 1.0, avg: 23.3, max: 40.0) [2024-03-29 17:45:38,842][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 17:45:41,454][00497] Updated weights for policy 0, policy_version 51337 (0.0027) [2024-03-29 17:45:43,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 841203712. Throughput: 0: 41798.3. Samples: 723459900. Policy #0 lag: (min: 1.0, avg: 23.3, max: 40.0) [2024-03-29 17:45:43,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 17:45:45,031][00497] Updated weights for policy 0, policy_version 51347 (0.0018) [2024-03-29 17:45:48,697][00497] Updated weights for policy 0, policy_version 51357 (0.0017) [2024-03-29 17:45:48,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 841433088. Throughput: 0: 42018.2. Samples: 723585640. Policy #0 lag: (min: 1.0, avg: 23.3, max: 40.0) [2024-03-29 17:45:48,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 17:45:52,478][00497] Updated weights for policy 0, policy_version 51367 (0.0029) [2024-03-29 17:45:53,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 841646080. Throughput: 0: 41793.4. Samples: 723815600. Policy #0 lag: (min: 1.0, avg: 23.3, max: 40.0) [2024-03-29 17:45:53,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 17:45:56,920][00497] Updated weights for policy 0, policy_version 51377 (0.0021) [2024-03-29 17:45:58,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.3, 300 sec: 42154.1). Total num frames: 841826304. Throughput: 0: 42116.0. Samples: 724090460. Policy #0 lag: (min: 1.0, avg: 23.3, max: 40.0) [2024-03-29 17:45:58,840][00126] Avg episode reward: [(0, '0.616')] [2024-03-29 17:45:59,489][00476] Signal inference workers to stop experience collection... (25750 times) [2024-03-29 17:45:59,489][00476] Signal inference workers to resume experience collection... (25750 times) [2024-03-29 17:45:59,533][00497] InferenceWorker_p0-w0: stopping experience collection (25750 times) [2024-03-29 17:45:59,533][00497] InferenceWorker_p0-w0: resuming experience collection (25750 times) [2024-03-29 17:46:00,681][00497] Updated weights for policy 0, policy_version 51387 (0.0020) [2024-03-29 17:46:03,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.1, 300 sec: 42265.1). Total num frames: 842072064. Throughput: 0: 42240.9. Samples: 724224020. 
Policy #0 lag: (min: 1.0, avg: 23.3, max: 40.0) [2024-03-29 17:46:03,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 17:46:04,063][00497] Updated weights for policy 0, policy_version 51397 (0.0037) [2024-03-29 17:46:07,821][00497] Updated weights for policy 0, policy_version 51407 (0.0023) [2024-03-29 17:46:08,839][00126] Fps is (10 sec: 47513.0, 60 sec: 42871.4, 300 sec: 42376.2). Total num frames: 842301440. Throughput: 0: 42001.4. Samples: 724448720. Policy #0 lag: (min: 0.0, avg: 24.3, max: 42.0) [2024-03-29 17:46:08,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 17:46:12,372][00497] Updated weights for policy 0, policy_version 51417 (0.0025) [2024-03-29 17:46:13,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 842465280. Throughput: 0: 42023.2. Samples: 724718760. Policy #0 lag: (min: 0.0, avg: 24.3, max: 42.0) [2024-03-29 17:46:13,840][00126] Avg episode reward: [(0, '0.500')] [2024-03-29 17:46:16,524][00497] Updated weights for policy 0, policy_version 51427 (0.0024) [2024-03-29 17:46:18,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 842694656. Throughput: 0: 42241.8. Samples: 724853820. Policy #0 lag: (min: 0.0, avg: 24.3, max: 42.0) [2024-03-29 17:46:18,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 17:46:19,637][00497] Updated weights for policy 0, policy_version 51437 (0.0035) [2024-03-29 17:46:23,482][00497] Updated weights for policy 0, policy_version 51447 (0.0029) [2024-03-29 17:46:23,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 842907648. Throughput: 0: 41853.5. Samples: 725067640. Policy #0 lag: (min: 0.0, avg: 24.3, max: 42.0) [2024-03-29 17:46:23,840][00126] Avg episode reward: [(0, '0.507')] [2024-03-29 17:46:28,112][00497] Updated weights for policy 0, policy_version 51457 (0.0024) [2024-03-29 17:46:28,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 843104256. Throughput: 0: 41968.0. Samples: 725348460. Policy #0 lag: (min: 0.0, avg: 24.3, max: 42.0) [2024-03-29 17:46:28,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 17:46:32,092][00497] Updated weights for policy 0, policy_version 51467 (0.0020) [2024-03-29 17:46:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 843317248. Throughput: 0: 41981.8. Samples: 725474820. Policy #0 lag: (min: 0.0, avg: 24.3, max: 42.0) [2024-03-29 17:46:33,840][00126] Avg episode reward: [(0, '0.537')] [2024-03-29 17:46:34,633][00476] Signal inference workers to stop experience collection... (25800 times) [2024-03-29 17:46:34,705][00497] InferenceWorker_p0-w0: stopping experience collection (25800 times) [2024-03-29 17:46:34,707][00476] Signal inference workers to resume experience collection... (25800 times) [2024-03-29 17:46:34,730][00497] InferenceWorker_p0-w0: resuming experience collection (25800 times) [2024-03-29 17:46:35,261][00497] Updated weights for policy 0, policy_version 51477 (0.0026) [2024-03-29 17:46:38,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 843546624. Throughput: 0: 41912.5. Samples: 725701660. 
Policy #0 lag: (min: 0.0, avg: 24.3, max: 42.0) [2024-03-29 17:46:38,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 17:46:39,068][00497] Updated weights for policy 0, policy_version 51487 (0.0018) [2024-03-29 17:46:43,707][00497] Updated weights for policy 0, policy_version 51497 (0.0022) [2024-03-29 17:46:43,841][00126] Fps is (10 sec: 40952.5, 60 sec: 42051.0, 300 sec: 42153.9). Total num frames: 843726848. Throughput: 0: 42133.8. Samples: 725986560. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 17:46:43,841][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 17:46:47,646][00497] Updated weights for policy 0, policy_version 51507 (0.0027) [2024-03-29 17:46:48,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 843939840. Throughput: 0: 41780.9. Samples: 726104160. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 17:46:48,840][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 17:46:50,969][00497] Updated weights for policy 0, policy_version 51517 (0.0026) [2024-03-29 17:46:53,839][00126] Fps is (10 sec: 45882.7, 60 sec: 42325.2, 300 sec: 42154.1). Total num frames: 844185600. Throughput: 0: 41891.5. Samples: 726333840. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 17:46:53,840][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 17:46:54,768][00497] Updated weights for policy 0, policy_version 51527 (0.0033) [2024-03-29 17:46:58,839][00126] Fps is (10 sec: 39322.3, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 844333056. Throughput: 0: 41736.0. Samples: 726596880. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 17:46:58,840][00126] Avg episode reward: [(0, '0.614')] [2024-03-29 17:46:59,499][00497] Updated weights for policy 0, policy_version 51537 (0.0026) [2024-03-29 17:47:03,604][00497] Updated weights for policy 0, policy_version 51547 (0.0021) [2024-03-29 17:47:03,839][00126] Fps is (10 sec: 36045.2, 60 sec: 41233.1, 300 sec: 42043.0). Total num frames: 844546048. Throughput: 0: 41728.8. Samples: 726731620. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 17:47:03,840][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 17:47:03,886][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000051548_844562432.pth... [2024-03-29 17:47:04,197][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000050933_834486272.pth [2024-03-29 17:47:06,043][00476] Signal inference workers to stop experience collection... (25850 times) [2024-03-29 17:47:06,077][00497] InferenceWorker_p0-w0: stopping experience collection (25850 times) [2024-03-29 17:47:06,264][00476] Signal inference workers to resume experience collection... (25850 times) [2024-03-29 17:47:06,265][00497] InferenceWorker_p0-w0: resuming experience collection (25850 times) [2024-03-29 17:47:06,770][00497] Updated weights for policy 0, policy_version 51557 (0.0024) [2024-03-29 17:47:08,839][00126] Fps is (10 sec: 47513.3, 60 sec: 41779.3, 300 sec: 42098.5). Total num frames: 844808192. Throughput: 0: 41976.9. Samples: 726956600. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 17:47:08,840][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 17:47:10,384][00497] Updated weights for policy 0, policy_version 51567 (0.0028) [2024-03-29 17:47:13,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.0, 300 sec: 41987.4). Total num frames: 844955648. Throughput: 0: 41507.0. Samples: 727216280. 
Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 17:47:13,842][00126] Avg episode reward: [(0, '0.640')] [2024-03-29 17:47:15,140][00497] Updated weights for policy 0, policy_version 51577 (0.0023) [2024-03-29 17:47:18,839][00126] Fps is (10 sec: 36044.9, 60 sec: 41233.1, 300 sec: 42043.0). Total num frames: 845168640. Throughput: 0: 41495.5. Samples: 727342120. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 17:47:18,840][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 17:47:19,455][00497] Updated weights for policy 0, policy_version 51587 (0.0019) [2024-03-29 17:47:22,562][00497] Updated weights for policy 0, policy_version 51597 (0.0027) [2024-03-29 17:47:23,839][00126] Fps is (10 sec: 47514.3, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 845430784. Throughput: 0: 42044.4. Samples: 727593660. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 17:47:23,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 17:47:26,099][00497] Updated weights for policy 0, policy_version 51607 (0.0019) [2024-03-29 17:47:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 845594624. Throughput: 0: 41192.8. Samples: 727840160. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 17:47:28,840][00126] Avg episode reward: [(0, '0.610')] [2024-03-29 17:47:30,810][00497] Updated weights for policy 0, policy_version 51617 (0.0025) [2024-03-29 17:47:33,839][00126] Fps is (10 sec: 37682.6, 60 sec: 41506.0, 300 sec: 41987.5). Total num frames: 845807616. Throughput: 0: 41483.1. Samples: 727970900. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 17:47:33,840][00126] Avg episode reward: [(0, '0.506')] [2024-03-29 17:47:34,878][00497] Updated weights for policy 0, policy_version 51627 (0.0018) [2024-03-29 17:47:37,659][00476] Signal inference workers to stop experience collection... (25900 times) [2024-03-29 17:47:37,684][00497] InferenceWorker_p0-w0: stopping experience collection (25900 times) [2024-03-29 17:47:37,883][00476] Signal inference workers to resume experience collection... (25900 times) [2024-03-29 17:47:37,883][00497] InferenceWorker_p0-w0: resuming experience collection (25900 times) [2024-03-29 17:47:38,197][00497] Updated weights for policy 0, policy_version 51637 (0.0025) [2024-03-29 17:47:38,839][00126] Fps is (10 sec: 45874.5, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 846053376. Throughput: 0: 42021.4. Samples: 728224800. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 17:47:38,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 17:47:41,884][00497] Updated weights for policy 0, policy_version 51647 (0.0021) [2024-03-29 17:47:43,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41780.5, 300 sec: 41987.5). Total num frames: 846233600. Throughput: 0: 41370.2. Samples: 728458540. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 17:47:43,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 17:47:46,527][00497] Updated weights for policy 0, policy_version 51657 (0.0025) [2024-03-29 17:47:48,839][00126] Fps is (10 sec: 37683.0, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 846430208. Throughput: 0: 41515.0. Samples: 728599800. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 17:47:48,842][00126] Avg episode reward: [(0, '0.653')] [2024-03-29 17:47:50,699][00497] Updated weights for policy 0, policy_version 51667 (0.0027) [2024-03-29 17:47:53,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41506.2, 300 sec: 42043.0). 
Total num frames: 846675968. Throughput: 0: 42271.5. Samples: 728858820. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 17:47:53,840][00126] Avg episode reward: [(0, '0.619')] [2024-03-29 17:47:53,840][00497] Updated weights for policy 0, policy_version 51677 (0.0030) [2024-03-29 17:47:57,571][00497] Updated weights for policy 0, policy_version 51687 (0.0021) [2024-03-29 17:47:58,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 846872576. Throughput: 0: 41325.0. Samples: 729075900. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 17:47:58,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 17:48:02,261][00497] Updated weights for policy 0, policy_version 51697 (0.0019) [2024-03-29 17:48:03,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 847052800. Throughput: 0: 41824.4. Samples: 729224220. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 17:48:03,840][00126] Avg episode reward: [(0, '0.633')] [2024-03-29 17:48:06,591][00497] Updated weights for policy 0, policy_version 51707 (0.0022) [2024-03-29 17:48:08,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 847282176. Throughput: 0: 42061.4. Samples: 729486420. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 17:48:08,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 17:48:09,456][00497] Updated weights for policy 0, policy_version 51717 (0.0026) [2024-03-29 17:48:10,267][00476] Signal inference workers to stop experience collection... (25950 times) [2024-03-29 17:48:10,338][00497] InferenceWorker_p0-w0: stopping experience collection (25950 times) [2024-03-29 17:48:10,341][00476] Signal inference workers to resume experience collection... (25950 times) [2024-03-29 17:48:10,365][00497] InferenceWorker_p0-w0: resuming experience collection (25950 times) [2024-03-29 17:48:13,248][00497] Updated weights for policy 0, policy_version 51727 (0.0018) [2024-03-29 17:48:13,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 847511552. Throughput: 0: 41438.5. Samples: 729704900. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 17:48:13,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 17:48:17,874][00497] Updated weights for policy 0, policy_version 51737 (0.0019) [2024-03-29 17:48:18,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 847691776. Throughput: 0: 41797.3. Samples: 729851780. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 17:48:18,840][00126] Avg episode reward: [(0, '0.646')] [2024-03-29 17:48:22,176][00497] Updated weights for policy 0, policy_version 51747 (0.0019) [2024-03-29 17:48:23,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 847904768. Throughput: 0: 41958.3. Samples: 730112920. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 17:48:23,841][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 17:48:25,191][00497] Updated weights for policy 0, policy_version 51757 (0.0022) [2024-03-29 17:48:28,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 848134144. Throughput: 0: 41617.7. Samples: 730331340. 
Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 17:48:28,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 17:48:29,015][00497] Updated weights for policy 0, policy_version 51767 (0.0030) [2024-03-29 17:48:33,587][00497] Updated weights for policy 0, policy_version 51777 (0.0026) [2024-03-29 17:48:33,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 848314368. Throughput: 0: 41716.9. Samples: 730477060. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 17:48:33,840][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 17:48:37,657][00497] Updated weights for policy 0, policy_version 51787 (0.0017) [2024-03-29 17:48:38,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 848527360. Throughput: 0: 41850.7. Samples: 730742100. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 17:48:38,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 17:48:40,814][00497] Updated weights for policy 0, policy_version 51797 (0.0026) [2024-03-29 17:48:43,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 848773120. Throughput: 0: 41927.0. Samples: 730962620. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 17:48:43,840][00126] Avg episode reward: [(0, '0.630')] [2024-03-29 17:48:44,557][00497] Updated weights for policy 0, policy_version 51807 (0.0026) [2024-03-29 17:48:47,705][00476] Signal inference workers to stop experience collection... (26000 times) [2024-03-29 17:48:47,774][00497] InferenceWorker_p0-w0: stopping experience collection (26000 times) [2024-03-29 17:48:47,778][00476] Signal inference workers to resume experience collection... (26000 times) [2024-03-29 17:48:47,803][00497] InferenceWorker_p0-w0: resuming experience collection (26000 times) [2024-03-29 17:48:48,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 848936960. Throughput: 0: 41547.9. Samples: 731093880. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 17:48:48,840][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 17:48:49,301][00497] Updated weights for policy 0, policy_version 51817 (0.0032) [2024-03-29 17:48:53,376][00497] Updated weights for policy 0, policy_version 51827 (0.0026) [2024-03-29 17:48:53,839][00126] Fps is (10 sec: 37683.9, 60 sec: 41233.2, 300 sec: 41820.9). Total num frames: 849149952. Throughput: 0: 41720.5. Samples: 731363840. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 17:48:53,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 17:48:56,376][00497] Updated weights for policy 0, policy_version 51837 (0.0024) [2024-03-29 17:48:58,839][00126] Fps is (10 sec: 47514.2, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 849412096. Throughput: 0: 42044.9. Samples: 731596920. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 17:48:58,840][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 17:48:59,947][00497] Updated weights for policy 0, policy_version 51847 (0.0022) [2024-03-29 17:49:03,839][00126] Fps is (10 sec: 42597.4, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 849575936. Throughput: 0: 41705.8. Samples: 731728540. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 17:49:03,841][00126] Avg episode reward: [(0, '0.612')] [2024-03-29 17:49:03,917][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000051855_849592320.pth... 
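[Editor's note: illustrative sketch, not part of the training log or the training code.] The Saving/Removing pairs in this log encode the learner's progress directly in the filename: every checkpoint in this run follows the observed pattern checkpoint_{policy_version}_{env_steps}.pth, and env_steps is always policy_version x 16,384 (for example 51,855 x 16,384 = 849,592,320), i.e. 16,384 environment frames are consumed per policy version here. The snippet below is a minimal Python sketch assuming only that observed naming pattern; the parse_checkpoint helper is hypothetical and written for this note.

```python
# Illustrative only: parse the checkpoint filenames that appear in this log.
# Assumes the observed naming pattern checkpoint_{policy_version:09d}_{env_steps}.pth;
# the 16,384 frames-per-version figure is inferred from the numbers in this run.
import re

CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

def parse_checkpoint(path: str) -> tuple[int, int]:
    """Return (policy_version, env_steps) encoded in a checkpoint filename."""
    m = CKPT_RE.search(path)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {path}")
    return int(m.group(1)), int(m.group(2))

version, steps = parse_checkpoint("checkpoint_p0/checkpoint_000051855_849592320.pth")
frames_per_version = steps // version      # 849_592_320 // 51_855 == 16_384
assert steps == version * 16_384           # holds for every save in this section of the log
print(version, steps, frames_per_version)  # 51855 849592320 16384
```

Each periodic save in this section also removes the checkpoint written two saves earlier (for example, saving version 51855 removes version 51244), which appears to keep only the most recent few periodic checkpoints on disk.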
[2024-03-29 17:49:04,232][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000051244_839581696.pth [2024-03-29 17:49:04,811][00497] Updated weights for policy 0, policy_version 51857 (0.0023) [2024-03-29 17:49:08,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 849772544. Throughput: 0: 41877.4. Samples: 731997400. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 17:49:08,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 17:49:09,104][00497] Updated weights for policy 0, policy_version 51867 (0.0027) [2024-03-29 17:49:12,075][00497] Updated weights for policy 0, policy_version 51877 (0.0028) [2024-03-29 17:49:13,839][00126] Fps is (10 sec: 44237.7, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 850018304. Throughput: 0: 42084.5. Samples: 732225140. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 17:49:13,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 17:49:15,550][00497] Updated weights for policy 0, policy_version 51887 (0.0031) [2024-03-29 17:49:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 850198528. Throughput: 0: 41554.4. Samples: 732347000. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 17:49:18,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 17:49:20,727][00497] Updated weights for policy 0, policy_version 51897 (0.0024) [2024-03-29 17:49:21,571][00476] Signal inference workers to stop experience collection... (26050 times) [2024-03-29 17:49:21,603][00497] InferenceWorker_p0-w0: stopping experience collection (26050 times) [2024-03-29 17:49:21,756][00476] Signal inference workers to resume experience collection... (26050 times) [2024-03-29 17:49:21,757][00497] InferenceWorker_p0-w0: resuming experience collection (26050 times) [2024-03-29 17:49:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 850411520. Throughput: 0: 41521.4. Samples: 732610560. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 17:49:23,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 17:49:24,912][00497] Updated weights for policy 0, policy_version 51907 (0.0020) [2024-03-29 17:49:27,915][00497] Updated weights for policy 0, policy_version 51917 (0.0027) [2024-03-29 17:49:28,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 850640896. Throughput: 0: 41744.6. Samples: 732841120. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 17:49:28,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 17:49:31,460][00497] Updated weights for policy 0, policy_version 51927 (0.0018) [2024-03-29 17:49:33,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 850821120. Throughput: 0: 41557.0. Samples: 732963940. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 17:49:33,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 17:49:36,260][00497] Updated weights for policy 0, policy_version 51937 (0.0018) [2024-03-29 17:49:38,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 851034112. Throughput: 0: 41799.1. Samples: 733244800. 
Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 17:49:38,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 17:49:40,590][00497] Updated weights for policy 0, policy_version 51947 (0.0023) [2024-03-29 17:49:43,668][00497] Updated weights for policy 0, policy_version 51957 (0.0028) [2024-03-29 17:49:43,839][00126] Fps is (10 sec: 44237.3, 60 sec: 41506.3, 300 sec: 41765.3). Total num frames: 851263488. Throughput: 0: 41746.3. Samples: 733475500. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 17:49:43,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 17:49:47,235][00497] Updated weights for policy 0, policy_version 51967 (0.0027) [2024-03-29 17:49:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.4, 300 sec: 41876.4). Total num frames: 851460096. Throughput: 0: 41341.1. Samples: 733588880. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 17:49:48,840][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 17:49:51,997][00497] Updated weights for policy 0, policy_version 51977 (0.0019) [2024-03-29 17:49:53,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 851656704. Throughput: 0: 41468.0. Samples: 733863460. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 17:49:53,840][00126] Avg episode reward: [(0, '0.633')] [2024-03-29 17:49:56,320][00497] Updated weights for policy 0, policy_version 51987 (0.0018) [2024-03-29 17:49:57,783][00476] Signal inference workers to stop experience collection... (26100 times) [2024-03-29 17:49:57,823][00497] InferenceWorker_p0-w0: stopping experience collection (26100 times) [2024-03-29 17:49:58,013][00476] Signal inference workers to resume experience collection... (26100 times) [2024-03-29 17:49:58,014][00497] InferenceWorker_p0-w0: resuming experience collection (26100 times) [2024-03-29 17:49:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 851886080. Throughput: 0: 41841.8. Samples: 734108020. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 17:49:58,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 17:49:59,609][00497] Updated weights for policy 0, policy_version 51997 (0.0026) [2024-03-29 17:50:02,769][00497] Updated weights for policy 0, policy_version 52007 (0.0031) [2024-03-29 17:50:03,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 852115456. Throughput: 0: 41673.7. Samples: 734222320. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 17:50:03,840][00126] Avg episode reward: [(0, '0.631')] [2024-03-29 17:50:07,738][00497] Updated weights for policy 0, policy_version 52017 (0.0017) [2024-03-29 17:50:08,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 852279296. Throughput: 0: 41816.0. Samples: 734492280. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 17:50:08,840][00126] Avg episode reward: [(0, '0.607')] [2024-03-29 17:50:12,146][00497] Updated weights for policy 0, policy_version 52027 (0.0021) [2024-03-29 17:50:13,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.0, 300 sec: 41654.3). Total num frames: 852492288. Throughput: 0: 42493.8. Samples: 734753340. 
Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 17:50:13,840][00126] Avg episode reward: [(0, '0.569')] [2024-03-29 17:50:15,103][00497] Updated weights for policy 0, policy_version 52037 (0.0028) [2024-03-29 17:50:18,476][00497] Updated weights for policy 0, policy_version 52047 (0.0024) [2024-03-29 17:50:18,839][00126] Fps is (10 sec: 45874.2, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 852738048. Throughput: 0: 41876.8. Samples: 734848400. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 17:50:18,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 17:50:23,130][00497] Updated weights for policy 0, policy_version 52057 (0.0025) [2024-03-29 17:50:23,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 852918272. Throughput: 0: 41759.4. Samples: 735123980. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 17:50:23,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 17:50:27,549][00497] Updated weights for policy 0, policy_version 52067 (0.0027) [2024-03-29 17:50:28,839][00126] Fps is (10 sec: 39322.4, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 853131264. Throughput: 0: 42548.4. Samples: 735390180. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 17:50:28,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 17:50:30,326][00476] Signal inference workers to stop experience collection... (26150 times) [2024-03-29 17:50:30,369][00497] InferenceWorker_p0-w0: stopping experience collection (26150 times) [2024-03-29 17:50:30,485][00476] Signal inference workers to resume experience collection... (26150 times) [2024-03-29 17:50:30,485][00497] InferenceWorker_p0-w0: resuming experience collection (26150 times) [2024-03-29 17:50:30,488][00497] Updated weights for policy 0, policy_version 52077 (0.0023) [2024-03-29 17:50:33,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 853377024. Throughput: 0: 42040.8. Samples: 735480720. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 17:50:33,841][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 17:50:34,265][00497] Updated weights for policy 0, policy_version 52087 (0.0023) [2024-03-29 17:50:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 853540864. Throughput: 0: 41705.3. Samples: 735740200. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 17:50:38,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 17:50:38,992][00497] Updated weights for policy 0, policy_version 52097 (0.0020) [2024-03-29 17:50:43,555][00497] Updated weights for policy 0, policy_version 52107 (0.0030) [2024-03-29 17:50:43,839][00126] Fps is (10 sec: 34406.5, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 853721088. Throughput: 0: 42049.8. Samples: 736000260. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 17:50:43,840][00126] Avg episode reward: [(0, '0.478')] [2024-03-29 17:50:46,635][00497] Updated weights for policy 0, policy_version 52117 (0.0021) [2024-03-29 17:50:48,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 853983232. Throughput: 0: 41735.2. Samples: 736100400. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 17:50:48,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 17:50:49,887][00497] Updated weights for policy 0, policy_version 52127 (0.0032) [2024-03-29 17:50:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41765.3). 
Total num frames: 854147072. Throughput: 0: 41469.3. Samples: 736358400. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 17:50:53,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 17:50:54,809][00497] Updated weights for policy 0, policy_version 52137 (0.0019) [2024-03-29 17:50:58,839][00126] Fps is (10 sec: 37682.4, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 854360064. Throughput: 0: 41499.5. Samples: 736620820. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 17:50:58,841][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 17:50:59,166][00497] Updated weights for policy 0, policy_version 52147 (0.0022) [2024-03-29 17:51:02,428][00497] Updated weights for policy 0, policy_version 52157 (0.0029) [2024-03-29 17:51:03,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41233.1, 300 sec: 41654.3). Total num frames: 854589440. Throughput: 0: 42111.3. Samples: 736743400. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 17:51:03,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 17:51:04,178][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000052162_854622208.pth... [2024-03-29 17:51:04,515][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000051548_844562432.pth [2024-03-29 17:51:05,856][00497] Updated weights for policy 0, policy_version 52167 (0.0024) [2024-03-29 17:51:05,867][00476] Signal inference workers to stop experience collection... (26200 times) [2024-03-29 17:51:05,868][00476] Signal inference workers to resume experience collection... (26200 times) [2024-03-29 17:51:05,908][00497] InferenceWorker_p0-w0: stopping experience collection (26200 times) [2024-03-29 17:51:05,908][00497] InferenceWorker_p0-w0: resuming experience collection (26200 times) [2024-03-29 17:51:08,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 854786048. Throughput: 0: 41049.1. Samples: 736971180. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 17:51:08,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 17:51:10,888][00497] Updated weights for policy 0, policy_version 52177 (0.0019) [2024-03-29 17:51:13,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 854982656. Throughput: 0: 40719.4. Samples: 737222560. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 17:51:13,840][00126] Avg episode reward: [(0, '0.649')] [2024-03-29 17:51:15,342][00497] Updated weights for policy 0, policy_version 52187 (0.0019) [2024-03-29 17:51:18,363][00497] Updated weights for policy 0, policy_version 52197 (0.0025) [2024-03-29 17:51:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.2, 300 sec: 41709.8). Total num frames: 855212032. Throughput: 0: 41712.9. Samples: 737357800. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 17:51:18,840][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 17:51:21,604][00497] Updated weights for policy 0, policy_version 52207 (0.0027) [2024-03-29 17:51:23,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 855408640. Throughput: 0: 40941.8. Samples: 737582580. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 17:51:23,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 17:51:26,621][00497] Updated weights for policy 0, policy_version 52217 (0.0022) [2024-03-29 17:51:28,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 855605248. 
Throughput: 0: 41176.9. Samples: 737853220. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 17:51:28,840][00126] Avg episode reward: [(0, '0.575')] [2024-03-29 17:51:30,938][00497] Updated weights for policy 0, policy_version 52227 (0.0024) [2024-03-29 17:51:33,839][00126] Fps is (10 sec: 42597.7, 60 sec: 40959.9, 300 sec: 41654.2). Total num frames: 855834624. Throughput: 0: 41796.7. Samples: 737981260. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 17:51:33,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 17:51:34,086][00497] Updated weights for policy 0, policy_version 52237 (0.0024) [2024-03-29 17:51:37,334][00497] Updated weights for policy 0, policy_version 52247 (0.0023) [2024-03-29 17:51:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 41765.6). Total num frames: 856047616. Throughput: 0: 40844.9. Samples: 738196420. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 17:51:38,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 17:51:42,587][00497] Updated weights for policy 0, policy_version 52257 (0.0032) [2024-03-29 17:51:43,180][00476] Signal inference workers to stop experience collection... (26250 times) [2024-03-29 17:51:43,211][00497] InferenceWorker_p0-w0: stopping experience collection (26250 times) [2024-03-29 17:51:43,392][00476] Signal inference workers to resume experience collection... (26250 times) [2024-03-29 17:51:43,393][00497] InferenceWorker_p0-w0: resuming experience collection (26250 times) [2024-03-29 17:51:43,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 856227840. Throughput: 0: 41016.9. Samples: 738466580. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 17:51:43,840][00126] Avg episode reward: [(0, '0.555')] [2024-03-29 17:51:46,680][00497] Updated weights for policy 0, policy_version 52267 (0.0023) [2024-03-29 17:51:48,839][00126] Fps is (10 sec: 39321.4, 60 sec: 40959.9, 300 sec: 41543.2). Total num frames: 856440832. Throughput: 0: 41233.8. Samples: 738598920. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 17:51:48,840][00126] Avg episode reward: [(0, '0.623')] [2024-03-29 17:51:49,825][00497] Updated weights for policy 0, policy_version 52277 (0.0027) [2024-03-29 17:51:53,150][00497] Updated weights for policy 0, policy_version 52287 (0.0024) [2024-03-29 17:51:53,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 856686592. Throughput: 0: 41459.5. Samples: 738836860. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 17:51:53,840][00126] Avg episode reward: [(0, '0.609')] [2024-03-29 17:51:58,310][00497] Updated weights for policy 0, policy_version 52297 (0.0029) [2024-03-29 17:51:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 856850432. Throughput: 0: 41614.8. Samples: 739095220. Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 17:51:58,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 17:52:02,599][00497] Updated weights for policy 0, policy_version 52307 (0.0021) [2024-03-29 17:52:03,839][00126] Fps is (10 sec: 36044.9, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 857047040. Throughput: 0: 41332.4. Samples: 739217760. 
Policy #0 lag: (min: 1.0, avg: 20.6, max: 43.0) [2024-03-29 17:52:03,840][00126] Avg episode reward: [(0, '0.626')] [2024-03-29 17:52:05,756][00497] Updated weights for policy 0, policy_version 52317 (0.0028) [2024-03-29 17:52:08,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 857309184. Throughput: 0: 41590.7. Samples: 739454160. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 17:52:08,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:52:08,960][00497] Updated weights for policy 0, policy_version 52327 (0.0035) [2024-03-29 17:52:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 857473024. Throughput: 0: 41492.4. Samples: 739720380. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 17:52:13,841][00126] Avg episode reward: [(0, '0.496')] [2024-03-29 17:52:14,122][00497] Updated weights for policy 0, policy_version 52337 (0.0023) [2024-03-29 17:52:17,058][00476] Signal inference workers to stop experience collection... (26300 times) [2024-03-29 17:52:17,079][00497] InferenceWorker_p0-w0: stopping experience collection (26300 times) [2024-03-29 17:52:17,246][00476] Signal inference workers to resume experience collection... (26300 times) [2024-03-29 17:52:17,247][00497] InferenceWorker_p0-w0: resuming experience collection (26300 times) [2024-03-29 17:52:18,391][00497] Updated weights for policy 0, policy_version 52347 (0.0028) [2024-03-29 17:52:18,839][00126] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 857669632. Throughput: 0: 41147.3. Samples: 739832880. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 17:52:18,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 17:52:21,588][00497] Updated weights for policy 0, policy_version 52357 (0.0026) [2024-03-29 17:52:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 857915392. Throughput: 0: 41866.2. Samples: 740080400. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 17:52:23,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 17:52:24,850][00497] Updated weights for policy 0, policy_version 52367 (0.0030) [2024-03-29 17:52:28,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41654.3). Total num frames: 858095616. Throughput: 0: 41650.3. Samples: 740340840. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 17:52:28,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 17:52:29,857][00497] Updated weights for policy 0, policy_version 52377 (0.0027) [2024-03-29 17:52:33,836][00497] Updated weights for policy 0, policy_version 52387 (0.0018) [2024-03-29 17:52:33,839][00126] Fps is (10 sec: 39320.9, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 858308608. Throughput: 0: 41486.1. Samples: 740465800. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 17:52:33,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 17:52:37,034][00497] Updated weights for policy 0, policy_version 52397 (0.0019) [2024-03-29 17:52:38,839][00126] Fps is (10 sec: 45874.7, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 858554368. Throughput: 0: 41785.7. Samples: 740717220. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 17:52:38,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 17:52:40,365][00497] Updated weights for policy 0, policy_version 52407 (0.0020) [2024-03-29 17:52:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.2, 300 sec: 41654.2). 
Total num frames: 858718208. Throughput: 0: 41692.4. Samples: 740971380. Policy #0 lag: (min: 0.0, avg: 22.3, max: 40.0) [2024-03-29 17:52:43,841][00126] Avg episode reward: [(0, '0.606')] [2024-03-29 17:52:45,311][00497] Updated weights for policy 0, policy_version 52417 (0.0018) [2024-03-29 17:52:48,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 858931200. Throughput: 0: 41761.3. Samples: 741097020. Policy #0 lag: (min: 0.0, avg: 22.3, max: 40.0) [2024-03-29 17:52:48,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 17:52:49,417][00497] Updated weights for policy 0, policy_version 52427 (0.0019) [2024-03-29 17:52:50,679][00476] Signal inference workers to stop experience collection... (26350 times) [2024-03-29 17:52:50,716][00497] InferenceWorker_p0-w0: stopping experience collection (26350 times) [2024-03-29 17:52:50,904][00476] Signal inference workers to resume experience collection... (26350 times) [2024-03-29 17:52:50,904][00497] InferenceWorker_p0-w0: resuming experience collection (26350 times) [2024-03-29 17:52:52,700][00497] Updated weights for policy 0, policy_version 52437 (0.0024) [2024-03-29 17:52:53,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 859160576. Throughput: 0: 41999.5. Samples: 741344140. Policy #0 lag: (min: 0.0, avg: 22.3, max: 40.0) [2024-03-29 17:52:53,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 17:52:55,928][00497] Updated weights for policy 0, policy_version 52447 (0.0023) [2024-03-29 17:52:58,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 859357184. Throughput: 0: 41708.4. Samples: 741597260. Policy #0 lag: (min: 0.0, avg: 22.3, max: 40.0) [2024-03-29 17:52:58,840][00126] Avg episode reward: [(0, '0.534')] [2024-03-29 17:53:00,904][00497] Updated weights for policy 0, policy_version 52457 (0.0017) [2024-03-29 17:53:03,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.1, 300 sec: 41598.7). Total num frames: 859553792. Throughput: 0: 42111.0. Samples: 741727880. Policy #0 lag: (min: 0.0, avg: 22.3, max: 40.0) [2024-03-29 17:53:03,841][00126] Avg episode reward: [(0, '0.643')] [2024-03-29 17:53:04,205][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000052465_859586560.pth... [2024-03-29 17:53:04,665][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000051855_849592320.pth [2024-03-29 17:53:05,247][00497] Updated weights for policy 0, policy_version 52467 (0.0030) [2024-03-29 17:53:08,238][00497] Updated weights for policy 0, policy_version 52477 (0.0033) [2024-03-29 17:53:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 859799552. Throughput: 0: 42234.1. Samples: 741980940. Policy #0 lag: (min: 0.0, avg: 22.3, max: 40.0) [2024-03-29 17:53:08,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 17:53:11,728][00497] Updated weights for policy 0, policy_version 52487 (0.0029) [2024-03-29 17:53:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 859979776. Throughput: 0: 41540.3. Samples: 742210160. Policy #0 lag: (min: 0.0, avg: 24.4, max: 41.0) [2024-03-29 17:53:13,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 17:53:16,795][00497] Updated weights for policy 0, policy_version 52497 (0.0030) [2024-03-29 17:53:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 860192768. 
Throughput: 0: 42009.9. Samples: 742356240. Policy #0 lag: (min: 0.0, avg: 24.4, max: 41.0) [2024-03-29 17:53:18,841][00126] Avg episode reward: [(0, '0.464')] [2024-03-29 17:53:20,784][00497] Updated weights for policy 0, policy_version 52507 (0.0021) [2024-03-29 17:53:22,857][00476] Signal inference workers to stop experience collection... (26400 times) [2024-03-29 17:53:22,858][00476] Signal inference workers to resume experience collection... (26400 times) [2024-03-29 17:53:22,895][00497] InferenceWorker_p0-w0: stopping experience collection (26400 times) [2024-03-29 17:53:22,896][00497] InferenceWorker_p0-w0: resuming experience collection (26400 times) [2024-03-29 17:53:23,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 860405760. Throughput: 0: 41934.7. Samples: 742604280. Policy #0 lag: (min: 0.0, avg: 24.4, max: 41.0) [2024-03-29 17:53:23,840][00126] Avg episode reward: [(0, '0.637')] [2024-03-29 17:53:24,447][00497] Updated weights for policy 0, policy_version 52517 (0.0033) [2024-03-29 17:53:27,731][00497] Updated weights for policy 0, policy_version 52527 (0.0021) [2024-03-29 17:53:28,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 860635136. Throughput: 0: 41122.7. Samples: 742821900. Policy #0 lag: (min: 0.0, avg: 24.4, max: 41.0) [2024-03-29 17:53:28,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 17:53:32,721][00497] Updated weights for policy 0, policy_version 52537 (0.0018) [2024-03-29 17:53:33,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 860815360. Throughput: 0: 41657.7. Samples: 742971620. Policy #0 lag: (min: 0.0, avg: 24.4, max: 41.0) [2024-03-29 17:53:33,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 17:53:36,639][00497] Updated weights for policy 0, policy_version 52547 (0.0023) [2024-03-29 17:53:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 861028352. Throughput: 0: 41697.4. Samples: 743220520. Policy #0 lag: (min: 0.0, avg: 24.4, max: 41.0) [2024-03-29 17:53:38,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:53:40,112][00497] Updated weights for policy 0, policy_version 52557 (0.0019) [2024-03-29 17:53:43,333][00497] Updated weights for policy 0, policy_version 52567 (0.0029) [2024-03-29 17:53:43,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42598.3, 300 sec: 41820.9). Total num frames: 861274112. Throughput: 0: 41149.2. Samples: 743448980. Policy #0 lag: (min: 0.0, avg: 24.4, max: 41.0) [2024-03-29 17:53:43,840][00126] Avg episode reward: [(0, '0.587')] [2024-03-29 17:53:48,402][00497] Updated weights for policy 0, policy_version 52577 (0.0021) [2024-03-29 17:53:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 861437952. Throughput: 0: 41564.1. Samples: 743598260. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 17:53:48,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 17:53:52,182][00497] Updated weights for policy 0, policy_version 52587 (0.0026) [2024-03-29 17:53:53,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 861650944. Throughput: 0: 41713.3. Samples: 743858040. 
Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 17:53:53,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 17:53:55,516][00497] Updated weights for policy 0, policy_version 52597 (0.0025) [2024-03-29 17:53:56,322][00476] Signal inference workers to stop experience collection... (26450 times) [2024-03-29 17:53:56,376][00497] InferenceWorker_p0-w0: stopping experience collection (26450 times) [2024-03-29 17:53:56,413][00476] Signal inference workers to resume experience collection... (26450 times) [2024-03-29 17:53:56,415][00497] InferenceWorker_p0-w0: resuming experience collection (26450 times) [2024-03-29 17:53:58,712][00497] Updated weights for policy 0, policy_version 52607 (0.0023) [2024-03-29 17:53:58,839][00126] Fps is (10 sec: 47512.8, 60 sec: 42598.3, 300 sec: 41820.9). Total num frames: 861913088. Throughput: 0: 41688.0. Samples: 744086120. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 17:53:58,840][00126] Avg episode reward: [(0, '0.631')] [2024-03-29 17:54:03,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 862060544. Throughput: 0: 41629.3. Samples: 744229560. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 17:54:03,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 17:54:03,883][00497] Updated weights for policy 0, policy_version 52617 (0.0021) [2024-03-29 17:54:07,861][00497] Updated weights for policy 0, policy_version 52627 (0.0018) [2024-03-29 17:54:08,839][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.0, 300 sec: 41543.1). Total num frames: 862273536. Throughput: 0: 41992.8. Samples: 744493960. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 17:54:08,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 17:54:11,252][00497] Updated weights for policy 0, policy_version 52637 (0.0025) [2024-03-29 17:54:13,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 862519296. Throughput: 0: 42102.9. Samples: 744716540. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 17:54:13,840][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 17:54:14,632][00497] Updated weights for policy 0, policy_version 52647 (0.0028) [2024-03-29 17:54:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 862683136. Throughput: 0: 41503.9. Samples: 744839300. Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 17:54:18,841][00126] Avg episode reward: [(0, '0.609')] [2024-03-29 17:54:19,816][00497] Updated weights for policy 0, policy_version 52657 (0.0029) [2024-03-29 17:54:23,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41506.1, 300 sec: 41543.1). Total num frames: 862896128. Throughput: 0: 42020.8. Samples: 745111460. Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 17:54:23,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 17:54:24,012][00497] Updated weights for policy 0, policy_version 52668 (0.0029) [2024-03-29 17:54:27,213][00497] Updated weights for policy 0, policy_version 52678 (0.0023) [2024-03-29 17:54:28,677][00476] Signal inference workers to stop experience collection... (26500 times) [2024-03-29 17:54:28,716][00497] InferenceWorker_p0-w0: stopping experience collection (26500 times) [2024-03-29 17:54:28,839][00126] Fps is (10 sec: 45875.6, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 863141888. Throughput: 0: 42266.3. Samples: 745350960. 
Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 17:54:28,840][00126] Avg episode reward: [(0, '0.630')] [2024-03-29 17:54:28,891][00476] Signal inference workers to resume experience collection... (26500 times) [2024-03-29 17:54:28,891][00497] InferenceWorker_p0-w0: resuming experience collection (26500 times) [2024-03-29 17:54:30,629][00497] Updated weights for policy 0, policy_version 52688 (0.0034) [2024-03-29 17:54:33,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 863305728. Throughput: 0: 41514.6. Samples: 745466420. Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 17:54:33,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 17:54:35,696][00497] Updated weights for policy 0, policy_version 52698 (0.0024) [2024-03-29 17:54:38,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 863535104. Throughput: 0: 41766.0. Samples: 745737500. Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 17:54:38,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 17:54:39,635][00497] Updated weights for policy 0, policy_version 52708 (0.0018) [2024-03-29 17:54:42,979][00497] Updated weights for policy 0, policy_version 52718 (0.0019) [2024-03-29 17:54:43,839][00126] Fps is (10 sec: 45875.2, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 863764480. Throughput: 0: 42027.7. Samples: 745977360. Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 17:54:43,840][00126] Avg episode reward: [(0, '0.456')] [2024-03-29 17:54:46,290][00497] Updated weights for policy 0, policy_version 52728 (0.0025) [2024-03-29 17:54:48,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 863944704. Throughput: 0: 41318.7. Samples: 746088900. Policy #0 lag: (min: 1.0, avg: 19.0, max: 41.0) [2024-03-29 17:54:48,840][00126] Avg episode reward: [(0, '0.604')] [2024-03-29 17:54:51,503][00497] Updated weights for policy 0, policy_version 52738 (0.0023) [2024-03-29 17:54:53,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 864141312. Throughput: 0: 41577.0. Samples: 746364920. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 17:54:53,840][00126] Avg episode reward: [(0, '0.513')] [2024-03-29 17:54:55,416][00497] Updated weights for policy 0, policy_version 52748 (0.0025) [2024-03-29 17:54:58,727][00497] Updated weights for policy 0, policy_version 52758 (0.0020) [2024-03-29 17:54:58,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 864387072. Throughput: 0: 41874.7. Samples: 746600900. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 17:54:58,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 17:55:01,977][00497] Updated weights for policy 0, policy_version 52768 (0.0021) [2024-03-29 17:55:03,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 864583680. Throughput: 0: 41622.7. Samples: 746712320. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 17:55:03,840][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 17:55:03,866][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000052770_864583680.pth... [2024-03-29 17:55:04,223][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000052162_854622208.pth [2024-03-29 17:55:04,913][00476] Signal inference workers to stop experience collection... 
(26550 times) [2024-03-29 17:55:04,913][00476] Signal inference workers to resume experience collection... (26550 times) [2024-03-29 17:55:04,948][00497] InferenceWorker_p0-w0: stopping experience collection (26550 times) [2024-03-29 17:55:04,948][00497] InferenceWorker_p0-w0: resuming experience collection (26550 times) [2024-03-29 17:55:07,287][00497] Updated weights for policy 0, policy_version 52778 (0.0019) [2024-03-29 17:55:08,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 864780288. Throughput: 0: 41828.4. Samples: 746993740. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 17:55:08,841][00126] Avg episode reward: [(0, '0.608')] [2024-03-29 17:55:11,189][00497] Updated weights for policy 0, policy_version 52788 (0.0028) [2024-03-29 17:55:13,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41233.2, 300 sec: 41543.2). Total num frames: 864993280. Throughput: 0: 41519.1. Samples: 747219320. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 17:55:13,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 17:55:14,602][00497] Updated weights for policy 0, policy_version 52798 (0.0023) [2024-03-29 17:55:18,111][00497] Updated weights for policy 0, policy_version 52808 (0.0023) [2024-03-29 17:55:18,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 865206272. Throughput: 0: 41645.7. Samples: 747340480. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 17:55:18,840][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 17:55:23,202][00497] Updated weights for policy 0, policy_version 52818 (0.0018) [2024-03-29 17:55:23,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 865386496. Throughput: 0: 41497.7. Samples: 747604900. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 17:55:23,840][00126] Avg episode reward: [(0, '0.608')] [2024-03-29 17:55:27,018][00497] Updated weights for policy 0, policy_version 52828 (0.0018) [2024-03-29 17:55:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.0, 300 sec: 41487.6). Total num frames: 865615872. Throughput: 0: 41644.3. Samples: 747851360. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 17:55:28,840][00126] Avg episode reward: [(0, '0.555')] [2024-03-29 17:55:30,431][00497] Updated weights for policy 0, policy_version 52838 (0.0023) [2024-03-29 17:55:33,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 865845248. Throughput: 0: 41721.3. Samples: 747966360. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 17:55:33,840][00126] Avg episode reward: [(0, '0.587')] [2024-03-29 17:55:33,964][00497] Updated weights for policy 0, policy_version 52848 (0.0025) [2024-03-29 17:55:36,715][00476] Signal inference workers to stop experience collection... (26600 times) [2024-03-29 17:55:36,718][00476] Signal inference workers to resume experience collection... (26600 times) [2024-03-29 17:55:36,758][00497] InferenceWorker_p0-w0: stopping experience collection (26600 times) [2024-03-29 17:55:36,758][00497] InferenceWorker_p0-w0: resuming experience collection (26600 times) [2024-03-29 17:55:38,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 866009088. Throughput: 0: 41487.1. Samples: 748231840. 
Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 17:55:38,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 17:55:38,886][00497] Updated weights for policy 0, policy_version 52858 (0.0022) [2024-03-29 17:55:42,630][00497] Updated weights for policy 0, policy_version 52868 (0.0032) [2024-03-29 17:55:43,839][00126] Fps is (10 sec: 37683.1, 60 sec: 40959.9, 300 sec: 41487.6). Total num frames: 866222080. Throughput: 0: 41497.0. Samples: 748468260. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 17:55:43,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 17:55:46,402][00497] Updated weights for policy 0, policy_version 52878 (0.0023) [2024-03-29 17:55:48,839][00126] Fps is (10 sec: 47514.0, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 866484224. Throughput: 0: 41485.0. Samples: 748579140. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 17:55:48,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 17:55:49,969][00497] Updated weights for policy 0, policy_version 52888 (0.0025) [2024-03-29 17:55:53,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 866615296. Throughput: 0: 41146.8. Samples: 748845340. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 17:55:53,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 17:55:54,874][00497] Updated weights for policy 0, policy_version 52898 (0.0023) [2024-03-29 17:55:58,615][00497] Updated weights for policy 0, policy_version 52908 (0.0026) [2024-03-29 17:55:58,839][00126] Fps is (10 sec: 36044.8, 60 sec: 40960.1, 300 sec: 41543.2). Total num frames: 866844672. Throughput: 0: 41675.6. Samples: 749094720. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 17:55:58,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 17:56:02,184][00497] Updated weights for policy 0, policy_version 52918 (0.0025) [2024-03-29 17:56:03,839][00126] Fps is (10 sec: 47513.6, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 867090432. Throughput: 0: 41564.6. Samples: 749210880. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 17:56:03,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 17:56:04,360][00476] Signal inference workers to stop experience collection... (26650 times) [2024-03-29 17:56:04,439][00497] InferenceWorker_p0-w0: stopping experience collection (26650 times) [2024-03-29 17:56:04,532][00476] Signal inference workers to resume experience collection... (26650 times) [2024-03-29 17:56:04,533][00497] InferenceWorker_p0-w0: resuming experience collection (26650 times) [2024-03-29 17:56:05,410][00497] Updated weights for policy 0, policy_version 52928 (0.0024) [2024-03-29 17:56:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 867270656. Throughput: 0: 41454.7. Samples: 749470360. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 17:56:08,841][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 17:56:10,280][00497] Updated weights for policy 0, policy_version 52938 (0.0024) [2024-03-29 17:56:13,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 867483648. Throughput: 0: 41924.5. Samples: 749737960. 
Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 17:56:13,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 17:56:13,986][00497] Updated weights for policy 0, policy_version 52948 (0.0025) [2024-03-29 17:56:17,284][00497] Updated weights for policy 0, policy_version 52958 (0.0023) [2024-03-29 17:56:18,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42052.4, 300 sec: 41765.3). Total num frames: 867729408. Throughput: 0: 42109.9. Samples: 749861300. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 17:56:18,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 17:56:20,803][00497] Updated weights for policy 0, policy_version 52968 (0.0026) [2024-03-29 17:56:23,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 867893248. Throughput: 0: 41647.7. Samples: 750105980. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 17:56:23,840][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 17:56:25,754][00497] Updated weights for policy 0, policy_version 52978 (0.0022) [2024-03-29 17:56:28,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.3, 300 sec: 41654.3). Total num frames: 868122624. Throughput: 0: 42414.3. Samples: 750376900. Policy #0 lag: (min: 0.0, avg: 20.3, max: 42.0) [2024-03-29 17:56:28,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 17:56:29,391][00497] Updated weights for policy 0, policy_version 52988 (0.0019) [2024-03-29 17:56:32,851][00497] Updated weights for policy 0, policy_version 52998 (0.0023) [2024-03-29 17:56:33,839][00126] Fps is (10 sec: 47513.5, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 868368384. Throughput: 0: 42628.9. Samples: 750497440. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 17:56:33,840][00126] Avg episode reward: [(0, '0.517')] [2024-03-29 17:56:36,413][00497] Updated weights for policy 0, policy_version 53008 (0.0022) [2024-03-29 17:56:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 868532224. Throughput: 0: 42036.0. Samples: 750736960. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 17:56:38,841][00126] Avg episode reward: [(0, '0.615')] [2024-03-29 17:56:40,054][00476] Signal inference workers to stop experience collection... (26700 times) [2024-03-29 17:56:40,054][00476] Signal inference workers to resume experience collection... (26700 times) [2024-03-29 17:56:40,079][00497] InferenceWorker_p0-w0: stopping experience collection (26700 times) [2024-03-29 17:56:40,102][00497] InferenceWorker_p0-w0: resuming experience collection (26700 times) [2024-03-29 17:56:41,269][00497] Updated weights for policy 0, policy_version 53018 (0.0026) [2024-03-29 17:56:43,839][00126] Fps is (10 sec: 37683.0, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 868745216. Throughput: 0: 42620.4. Samples: 751012640. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 17:56:43,840][00126] Avg episode reward: [(0, '0.624')] [2024-03-29 17:56:44,961][00497] Updated weights for policy 0, policy_version 53028 (0.0020) [2024-03-29 17:56:48,509][00497] Updated weights for policy 0, policy_version 53038 (0.0020) [2024-03-29 17:56:48,839][00126] Fps is (10 sec: 45874.9, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 868990976. Throughput: 0: 42662.6. Samples: 751130700. 
Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 17:56:48,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 17:56:51,933][00497] Updated weights for policy 0, policy_version 53048 (0.0023) [2024-03-29 17:56:53,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 41765.3). Total num frames: 869171200. Throughput: 0: 42081.8. Samples: 751364040. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 17:56:53,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 17:56:57,105][00497] Updated weights for policy 0, policy_version 53058 (0.0026) [2024-03-29 17:56:58,839][00126] Fps is (10 sec: 37683.1, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 869367808. Throughput: 0: 42037.8. Samples: 751629660. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 17:56:58,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 17:57:00,727][00497] Updated weights for policy 0, policy_version 53068 (0.0024) [2024-03-29 17:57:03,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 869613568. Throughput: 0: 42162.9. Samples: 751758640. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 17:57:03,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 17:57:03,859][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000053077_869613568.pth... [2024-03-29 17:57:04,166][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000052465_859586560.pth [2024-03-29 17:57:04,432][00497] Updated weights for policy 0, policy_version 53078 (0.0024) [2024-03-29 17:57:07,804][00497] Updated weights for policy 0, policy_version 53088 (0.0027) [2024-03-29 17:57:08,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 869810176. Throughput: 0: 41479.9. Samples: 751972580. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 17:57:08,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 17:57:12,834][00476] Signal inference workers to stop experience collection... (26750 times) [2024-03-29 17:57:12,877][00497] InferenceWorker_p0-w0: stopping experience collection (26750 times) [2024-03-29 17:57:12,994][00476] Signal inference workers to resume experience collection... (26750 times) [2024-03-29 17:57:12,995][00497] InferenceWorker_p0-w0: resuming experience collection (26750 times) [2024-03-29 17:57:13,000][00497] Updated weights for policy 0, policy_version 53098 (0.0024) [2024-03-29 17:57:13,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42052.3, 300 sec: 41820.8). Total num frames: 870006784. Throughput: 0: 41824.0. Samples: 752258980. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 17:57:13,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 17:57:16,734][00497] Updated weights for policy 0, policy_version 53108 (0.0022) [2024-03-29 17:57:18,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.0, 300 sec: 41709.8). Total num frames: 870219776. Throughput: 0: 41701.6. Samples: 752374020. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 17:57:18,840][00126] Avg episode reward: [(0, '0.613')] [2024-03-29 17:57:20,204][00497] Updated weights for policy 0, policy_version 53118 (0.0019) [2024-03-29 17:57:23,494][00497] Updated weights for policy 0, policy_version 53128 (0.0027) [2024-03-29 17:57:23,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.3, 300 sec: 41876.4). Total num frames: 870449152. Throughput: 0: 41285.3. Samples: 752594800. 
Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 17:57:23,840][00126] Avg episode reward: [(0, '0.607')] [2024-03-29 17:57:28,839][00126] Fps is (10 sec: 37683.8, 60 sec: 41233.1, 300 sec: 41654.3). Total num frames: 870596608. Throughput: 0: 41332.1. Samples: 752872580. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 17:57:28,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 17:57:28,918][00497] Updated weights for policy 0, policy_version 53138 (0.0033) [2024-03-29 17:57:32,482][00497] Updated weights for policy 0, policy_version 53148 (0.0024) [2024-03-29 17:57:33,839][00126] Fps is (10 sec: 37683.6, 60 sec: 40960.0, 300 sec: 41598.7). Total num frames: 870825984. Throughput: 0: 41377.4. Samples: 752992680. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 17:57:33,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 17:57:36,130][00497] Updated weights for policy 0, policy_version 53158 (0.0019) [2024-03-29 17:57:38,839][00126] Fps is (10 sec: 47512.8, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 871071744. Throughput: 0: 41093.7. Samples: 753213260. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 17:57:38,840][00126] Avg episode reward: [(0, '0.633')] [2024-03-29 17:57:39,598][00497] Updated weights for policy 0, policy_version 53168 (0.0023) [2024-03-29 17:57:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 871235584. Throughput: 0: 41250.4. Samples: 753485920. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 17:57:43,840][00126] Avg episode reward: [(0, '0.606')] [2024-03-29 17:57:45,130][00497] Updated weights for policy 0, policy_version 53178 (0.0020) [2024-03-29 17:57:45,387][00476] Signal inference workers to stop experience collection... (26800 times) [2024-03-29 17:57:45,408][00497] InferenceWorker_p0-w0: stopping experience collection (26800 times) [2024-03-29 17:57:45,604][00476] Signal inference workers to resume experience collection... (26800 times) [2024-03-29 17:57:45,604][00497] InferenceWorker_p0-w0: resuming experience collection (26800 times) [2024-03-29 17:57:48,446][00497] Updated weights for policy 0, policy_version 53188 (0.0026) [2024-03-29 17:57:48,839][00126] Fps is (10 sec: 37683.6, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 871448576. Throughput: 0: 40932.5. Samples: 753600600. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 17:57:48,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 17:57:52,084][00497] Updated weights for policy 0, policy_version 53198 (0.0021) [2024-03-29 17:57:53,839][00126] Fps is (10 sec: 45874.2, 60 sec: 42052.1, 300 sec: 41820.8). Total num frames: 871694336. Throughput: 0: 41821.6. Samples: 753854560. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 17:57:53,840][00126] Avg episode reward: [(0, '0.635')] [2024-03-29 17:57:55,451][00497] Updated weights for policy 0, policy_version 53208 (0.0024) [2024-03-29 17:57:58,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41233.2, 300 sec: 41654.3). Total num frames: 871841792. Throughput: 0: 41112.1. Samples: 754109020. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 17:57:58,840][00126] Avg episode reward: [(0, '0.580')] [2024-03-29 17:58:00,884][00497] Updated weights for policy 0, policy_version 53218 (0.0028) [2024-03-29 17:58:03,839][00126] Fps is (10 sec: 36045.6, 60 sec: 40687.0, 300 sec: 41543.2). Total num frames: 872054784. Throughput: 0: 41467.7. Samples: 754240060. 
Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 17:58:03,840][00126] Avg episode reward: [(0, '0.634')] [2024-03-29 17:58:04,343][00497] Updated weights for policy 0, policy_version 53228 (0.0025) [2024-03-29 17:58:08,212][00497] Updated weights for policy 0, policy_version 53238 (0.0029) [2024-03-29 17:58:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 872284160. Throughput: 0: 41441.4. Samples: 754459660. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 17:58:08,840][00126] Avg episode reward: [(0, '0.654')] [2024-03-29 17:58:11,812][00497] Updated weights for policy 0, policy_version 53248 (0.0032) [2024-03-29 17:58:13,839][00126] Fps is (10 sec: 40959.4, 60 sec: 40959.9, 300 sec: 41598.7). Total num frames: 872464384. Throughput: 0: 40596.7. Samples: 754699440. Policy #0 lag: (min: 0.0, avg: 24.5, max: 44.0) [2024-03-29 17:58:13,840][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 17:58:16,801][00497] Updated weights for policy 0, policy_version 53258 (0.0025) [2024-03-29 17:58:17,951][00476] Signal inference workers to stop experience collection... (26850 times) [2024-03-29 17:58:17,954][00476] Signal inference workers to resume experience collection... (26850 times) [2024-03-29 17:58:17,996][00497] InferenceWorker_p0-w0: stopping experience collection (26850 times) [2024-03-29 17:58:17,996][00497] InferenceWorker_p0-w0: resuming experience collection (26850 times) [2024-03-29 17:58:18,839][00126] Fps is (10 sec: 37683.2, 60 sec: 40687.0, 300 sec: 41543.2). Total num frames: 872660992. Throughput: 0: 41030.7. Samples: 754839060. Policy #0 lag: (min: 0.0, avg: 24.5, max: 44.0) [2024-03-29 17:58:18,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 17:58:20,397][00497] Updated weights for policy 0, policy_version 53268 (0.0026) [2024-03-29 17:58:23,839][00126] Fps is (10 sec: 42598.6, 60 sec: 40686.9, 300 sec: 41543.1). Total num frames: 872890368. Throughput: 0: 41328.0. Samples: 755073020. Policy #0 lag: (min: 0.0, avg: 24.5, max: 44.0) [2024-03-29 17:58:23,840][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 17:58:24,097][00497] Updated weights for policy 0, policy_version 53278 (0.0022) [2024-03-29 17:58:27,517][00497] Updated weights for policy 0, policy_version 53288 (0.0027) [2024-03-29 17:58:28,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 873103360. Throughput: 0: 40778.2. Samples: 755320940. Policy #0 lag: (min: 0.0, avg: 24.5, max: 44.0) [2024-03-29 17:58:28,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 17:58:32,409][00497] Updated weights for policy 0, policy_version 53298 (0.0023) [2024-03-29 17:58:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 873299968. Throughput: 0: 41375.5. Samples: 755462500. Policy #0 lag: (min: 0.0, avg: 24.5, max: 44.0) [2024-03-29 17:58:33,840][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 17:58:36,126][00497] Updated weights for policy 0, policy_version 53308 (0.0029) [2024-03-29 17:58:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 40687.0, 300 sec: 41487.6). Total num frames: 873512960. Throughput: 0: 41102.4. Samples: 755704160. 
Policy #0 lag: (min: 0.0, avg: 24.5, max: 44.0) [2024-03-29 17:58:38,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 17:58:39,745][00497] Updated weights for policy 0, policy_version 53318 (0.0021) [2024-03-29 17:58:43,412][00497] Updated weights for policy 0, policy_version 53328 (0.0024) [2024-03-29 17:58:43,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 873725952. Throughput: 0: 40597.2. Samples: 755935900. Policy #0 lag: (min: 0.0, avg: 24.5, max: 44.0) [2024-03-29 17:58:43,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 17:58:48,144][00497] Updated weights for policy 0, policy_version 53338 (0.0027) [2024-03-29 17:58:48,839][00126] Fps is (10 sec: 39321.4, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 873906176. Throughput: 0: 41037.7. Samples: 756086760. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 17:58:48,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 17:58:51,903][00497] Updated weights for policy 0, policy_version 53348 (0.0025) [2024-03-29 17:58:52,555][00476] Signal inference workers to stop experience collection... (26900 times) [2024-03-29 17:58:52,585][00497] InferenceWorker_p0-w0: stopping experience collection (26900 times) [2024-03-29 17:58:52,741][00476] Signal inference workers to resume experience collection... (26900 times) [2024-03-29 17:58:52,741][00497] InferenceWorker_p0-w0: resuming experience collection (26900 times) [2024-03-29 17:58:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 40960.1, 300 sec: 41487.6). Total num frames: 874151936. Throughput: 0: 41494.6. Samples: 756326920. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 17:58:53,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 17:58:55,546][00497] Updated weights for policy 0, policy_version 53358 (0.0032) [2024-03-29 17:58:58,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42052.1, 300 sec: 41709.8). Total num frames: 874364928. Throughput: 0: 41425.4. Samples: 756563580. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 17:58:58,840][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 17:58:58,942][00497] Updated weights for policy 0, policy_version 53368 (0.0019) [2024-03-29 17:59:03,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.0, 300 sec: 41543.2). Total num frames: 874528768. Throughput: 0: 41540.0. Samples: 756708360. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 17:59:03,841][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 17:59:03,878][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000053378_874545152.pth... [2024-03-29 17:59:03,891][00497] Updated weights for policy 0, policy_version 53378 (0.0018) [2024-03-29 17:59:04,204][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000052770_864583680.pth [2024-03-29 17:59:07,677][00497] Updated weights for policy 0, policy_version 53388 (0.0026) [2024-03-29 17:59:08,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 874774528. Throughput: 0: 42126.8. Samples: 756968720. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 17:59:08,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 17:59:11,467][00497] Updated weights for policy 0, policy_version 53398 (0.0028) [2024-03-29 17:59:13,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 874987520. Throughput: 0: 41279.0. Samples: 757178500. 
Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 17:59:13,840][00126] Avg episode reward: [(0, '0.629')] [2024-03-29 17:59:14,801][00497] Updated weights for policy 0, policy_version 53408 (0.0031) [2024-03-29 17:59:18,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 875151360. Throughput: 0: 41196.4. Samples: 757316340. Policy #0 lag: (min: 0.0, avg: 18.5, max: 40.0) [2024-03-29 17:59:18,840][00126] Avg episode reward: [(0, '0.598')] [2024-03-29 17:59:19,999][00497] Updated weights for policy 0, policy_version 53418 (0.0017) [2024-03-29 17:59:23,640][00497] Updated weights for policy 0, policy_version 53428 (0.0021) [2024-03-29 17:59:23,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 875364352. Throughput: 0: 41782.1. Samples: 757584360. Policy #0 lag: (min: 0.0, avg: 18.5, max: 40.0) [2024-03-29 17:59:23,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 17:59:26,007][00476] Signal inference workers to stop experience collection... (26950 times) [2024-03-29 17:59:26,050][00497] InferenceWorker_p0-w0: stopping experience collection (26950 times) [2024-03-29 17:59:26,221][00476] Signal inference workers to resume experience collection... (26950 times) [2024-03-29 17:59:26,222][00497] InferenceWorker_p0-w0: resuming experience collection (26950 times) [2024-03-29 17:59:27,160][00497] Updated weights for policy 0, policy_version 53438 (0.0027) [2024-03-29 17:59:28,839][00126] Fps is (10 sec: 45875.2, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 875610112. Throughput: 0: 41594.7. Samples: 757807660. Policy #0 lag: (min: 0.0, avg: 18.5, max: 40.0) [2024-03-29 17:59:28,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 17:59:30,612][00497] Updated weights for policy 0, policy_version 53448 (0.0024) [2024-03-29 17:59:33,840][00126] Fps is (10 sec: 40958.8, 60 sec: 41232.8, 300 sec: 41487.6). Total num frames: 875773952. Throughput: 0: 40997.0. Samples: 757931640. Policy #0 lag: (min: 0.0, avg: 18.5, max: 40.0) [2024-03-29 17:59:33,842][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 17:59:35,669][00497] Updated weights for policy 0, policy_version 53458 (0.0019) [2024-03-29 17:59:38,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 875986944. Throughput: 0: 41745.8. Samples: 758205480. Policy #0 lag: (min: 0.0, avg: 18.5, max: 40.0) [2024-03-29 17:59:38,840][00126] Avg episode reward: [(0, '0.599')] [2024-03-29 17:59:39,279][00497] Updated weights for policy 0, policy_version 53468 (0.0027) [2024-03-29 17:59:42,875][00497] Updated weights for policy 0, policy_version 53478 (0.0032) [2024-03-29 17:59:43,839][00126] Fps is (10 sec: 45876.4, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 876232704. Throughput: 0: 41646.7. Samples: 758437680. Policy #0 lag: (min: 0.0, avg: 18.5, max: 40.0) [2024-03-29 17:59:43,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 17:59:46,377][00497] Updated weights for policy 0, policy_version 53488 (0.0020) [2024-03-29 17:59:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 876396544. Throughput: 0: 41105.9. Samples: 758558120. Policy #0 lag: (min: 0.0, avg: 18.5, max: 40.0) [2024-03-29 17:59:48,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 17:59:51,275][00497] Updated weights for policy 0, policy_version 53498 (0.0020) [2024-03-29 17:59:53,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41233.1, 300 sec: 41487.6). 
Total num frames: 876625920. Throughput: 0: 41356.9. Samples: 758829780. Policy #0 lag: (min: 0.0, avg: 19.1, max: 40.0) [2024-03-29 17:59:53,840][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 17:59:54,981][00497] Updated weights for policy 0, policy_version 53508 (0.0019) [2024-03-29 17:59:58,568][00497] Updated weights for policy 0, policy_version 53518 (0.0021) [2024-03-29 17:59:58,839][00126] Fps is (10 sec: 45875.0, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 876855296. Throughput: 0: 41973.5. Samples: 759067300. Policy #0 lag: (min: 0.0, avg: 19.1, max: 40.0) [2024-03-29 17:59:58,840][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 18:00:01,922][00497] Updated weights for policy 0, policy_version 53528 (0.0024) [2024-03-29 18:00:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 877051904. Throughput: 0: 41711.6. Samples: 759193360. Policy #0 lag: (min: 0.0, avg: 19.1, max: 40.0) [2024-03-29 18:00:03,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 18:00:04,101][00476] Signal inference workers to stop experience collection... (27000 times) [2024-03-29 18:00:04,175][00476] Signal inference workers to resume experience collection... (27000 times) [2024-03-29 18:00:04,177][00497] InferenceWorker_p0-w0: stopping experience collection (27000 times) [2024-03-29 18:00:04,200][00497] InferenceWorker_p0-w0: resuming experience collection (27000 times) [2024-03-29 18:00:06,790][00497] Updated weights for policy 0, policy_version 53538 (0.0019) [2024-03-29 18:00:08,839][00126] Fps is (10 sec: 39320.8, 60 sec: 41232.9, 300 sec: 41543.1). Total num frames: 877248512. Throughput: 0: 41781.7. Samples: 759464540. Policy #0 lag: (min: 0.0, avg: 19.1, max: 40.0) [2024-03-29 18:00:08,842][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 18:00:10,666][00497] Updated weights for policy 0, policy_version 53548 (0.0023) [2024-03-29 18:00:13,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 877477888. Throughput: 0: 42036.4. Samples: 759699300. Policy #0 lag: (min: 0.0, avg: 19.1, max: 40.0) [2024-03-29 18:00:13,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 18:00:14,379][00497] Updated weights for policy 0, policy_version 53558 (0.0032) [2024-03-29 18:00:17,628][00497] Updated weights for policy 0, policy_version 53568 (0.0028) [2024-03-29 18:00:18,839][00126] Fps is (10 sec: 44237.6, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 877690880. Throughput: 0: 41882.1. Samples: 759816320. Policy #0 lag: (min: 0.0, avg: 19.1, max: 40.0) [2024-03-29 18:00:18,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 18:00:22,492][00497] Updated weights for policy 0, policy_version 53578 (0.0020) [2024-03-29 18:00:23,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 41598.7). Total num frames: 877887488. Throughput: 0: 41895.8. Samples: 760090800. Policy #0 lag: (min: 0.0, avg: 19.1, max: 40.0) [2024-03-29 18:00:23,840][00126] Avg episode reward: [(0, '0.600')] [2024-03-29 18:00:26,063][00497] Updated weights for policy 0, policy_version 53588 (0.0032) [2024-03-29 18:00:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 878116864. Throughput: 0: 42219.7. Samples: 760337560. 
Policy #0 lag: (min: 0.0, avg: 19.0, max: 43.0) [2024-03-29 18:00:28,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 18:00:29,610][00497] Updated weights for policy 0, policy_version 53598 (0.0034) [2024-03-29 18:00:32,974][00497] Updated weights for policy 0, policy_version 53608 (0.0031) [2024-03-29 18:00:33,839][00126] Fps is (10 sec: 45875.9, 60 sec: 42871.7, 300 sec: 41820.9). Total num frames: 878346240. Throughput: 0: 42180.8. Samples: 760456260. Policy #0 lag: (min: 0.0, avg: 19.0, max: 43.0) [2024-03-29 18:00:33,840][00126] Avg episode reward: [(0, '0.623')] [2024-03-29 18:00:37,967][00497] Updated weights for policy 0, policy_version 53618 (0.0019) [2024-03-29 18:00:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 41654.3). Total num frames: 878510080. Throughput: 0: 42193.4. Samples: 760728480. Policy #0 lag: (min: 0.0, avg: 19.0, max: 43.0) [2024-03-29 18:00:38,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 18:00:40,430][00476] Signal inference workers to stop experience collection... (27050 times) [2024-03-29 18:00:40,471][00497] InferenceWorker_p0-w0: stopping experience collection (27050 times) [2024-03-29 18:00:40,631][00476] Signal inference workers to resume experience collection... (27050 times) [2024-03-29 18:00:40,631][00497] InferenceWorker_p0-w0: resuming experience collection (27050 times) [2024-03-29 18:00:41,716][00497] Updated weights for policy 0, policy_version 53628 (0.0021) [2024-03-29 18:00:43,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.3, 300 sec: 41543.2). Total num frames: 878739456. Throughput: 0: 41955.5. Samples: 760955300. Policy #0 lag: (min: 0.0, avg: 19.0, max: 43.0) [2024-03-29 18:00:43,840][00126] Avg episode reward: [(0, '0.485')] [2024-03-29 18:00:45,257][00497] Updated weights for policy 0, policy_version 53638 (0.0023) [2024-03-29 18:00:48,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.3, 300 sec: 41820.8). Total num frames: 878952448. Throughput: 0: 41929.7. Samples: 761080200. Policy #0 lag: (min: 0.0, avg: 19.0, max: 43.0) [2024-03-29 18:00:48,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 18:00:48,888][00497] Updated weights for policy 0, policy_version 53648 (0.0025) [2024-03-29 18:00:53,631][00497] Updated weights for policy 0, policy_version 53658 (0.0017) [2024-03-29 18:00:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 879132672. Throughput: 0: 41785.9. Samples: 761344900. Policy #0 lag: (min: 0.0, avg: 19.0, max: 43.0) [2024-03-29 18:00:53,840][00126] Avg episode reward: [(0, '0.654')] [2024-03-29 18:00:57,278][00497] Updated weights for policy 0, policy_version 53668 (0.0022) [2024-03-29 18:00:58,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.1, 300 sec: 41654.2). Total num frames: 879378432. Throughput: 0: 42091.4. Samples: 761593420. Policy #0 lag: (min: 1.0, avg: 21.5, max: 43.0) [2024-03-29 18:00:58,840][00126] Avg episode reward: [(0, '0.615')] [2024-03-29 18:01:00,865][00497] Updated weights for policy 0, policy_version 53678 (0.0023) [2024-03-29 18:01:03,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 879591424. Throughput: 0: 42415.5. Samples: 761725020. Policy #0 lag: (min: 1.0, avg: 21.5, max: 43.0) [2024-03-29 18:01:03,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 18:01:04,059][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000053687_879607808.pth... 
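The recurring "Saving ... / Removing ..." checkpoint entries in this log show the learner writing a new checkpoint roughly every two minutes and pruning an older one at the same time; each "Removing" targets the checkpoint from two saves earlier, consistent with a keep-the-last-two rotation. The sketch below illustrates that kind of rotation in Python; the function name, the keep_last parameter, and the directory handling are illustrative assumptions, not the trainer's actual code.

import os
import re
import torch

# Filenames follow the checkpoint_<policy_version>_<env_steps>.pth pattern seen in the log.
CHECKPOINT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

def save_checkpoint(state, checkpoint_dir, policy_version, env_steps, keep_last=2):
    """Write a new checkpoint, then delete all but the newest `keep_last` (>= 1) ones. Hypothetical helper."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
    torch.save(state, os.path.join(checkpoint_dir, name))

    # Zero-padded policy versions make lexicographic order equal numeric order, oldest first.
    existing = sorted(f for f in os.listdir(checkpoint_dir) if CHECKPOINT_RE.search(f))
    for old in existing[:-keep_last]:
        os.remove(os.path.join(checkpoint_dir, old))

With keep_last=2 this reproduces the pattern above: after each save, only the two most recent checkpoint_*.pth files remain in checkpoint_p0.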
[2024-03-29 18:01:04,371][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000053077_869613568.pth [2024-03-29 18:01:04,641][00497] Updated weights for policy 0, policy_version 53688 (0.0030) [2024-03-29 18:01:08,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 879771648. Throughput: 0: 41800.0. Samples: 761971800. Policy #0 lag: (min: 1.0, avg: 21.5, max: 43.0) [2024-03-29 18:01:08,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 18:01:09,264][00497] Updated weights for policy 0, policy_version 53698 (0.0031) [2024-03-29 18:01:12,759][00497] Updated weights for policy 0, policy_version 53708 (0.0018) [2024-03-29 18:01:13,452][00476] Signal inference workers to stop experience collection... (27100 times) [2024-03-29 18:01:13,514][00497] InferenceWorker_p0-w0: stopping experience collection (27100 times) [2024-03-29 18:01:13,642][00476] Signal inference workers to resume experience collection... (27100 times) [2024-03-29 18:01:13,643][00497] InferenceWorker_p0-w0: resuming experience collection (27100 times) [2024-03-29 18:01:13,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 880001024. Throughput: 0: 42096.5. Samples: 762231900. Policy #0 lag: (min: 1.0, avg: 21.5, max: 43.0) [2024-03-29 18:01:13,840][00126] Avg episode reward: [(0, '0.565')] [2024-03-29 18:01:16,503][00497] Updated weights for policy 0, policy_version 53718 (0.0020) [2024-03-29 18:01:18,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42325.2, 300 sec: 41820.8). Total num frames: 880230400. Throughput: 0: 42207.4. Samples: 762355600. Policy #0 lag: (min: 1.0, avg: 21.5, max: 43.0) [2024-03-29 18:01:18,841][00126] Avg episode reward: [(0, '0.592')] [2024-03-29 18:01:19,981][00497] Updated weights for policy 0, policy_version 53728 (0.0018) [2024-03-29 18:01:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.4, 300 sec: 41654.2). Total num frames: 880410624. Throughput: 0: 41933.3. Samples: 762615480. Policy #0 lag: (min: 1.0, avg: 21.5, max: 43.0) [2024-03-29 18:01:23,840][00126] Avg episode reward: [(0, '0.623')] [2024-03-29 18:01:24,600][00497] Updated weights for policy 0, policy_version 53738 (0.0016) [2024-03-29 18:01:28,150][00497] Updated weights for policy 0, policy_version 53748 (0.0018) [2024-03-29 18:01:28,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 880640000. Throughput: 0: 42776.4. Samples: 762880240. Policy #0 lag: (min: 1.0, avg: 21.5, max: 43.0) [2024-03-29 18:01:28,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 18:01:31,860][00497] Updated weights for policy 0, policy_version 53758 (0.0023) [2024-03-29 18:01:33,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 880852992. Throughput: 0: 42514.2. Samples: 762993340. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 18:01:33,840][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 18:01:35,574][00497] Updated weights for policy 0, policy_version 53768 (0.0026) [2024-03-29 18:01:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 881049600. Throughput: 0: 42176.5. Samples: 763242840. 
Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 18:01:38,840][00126] Avg episode reward: [(0, '0.606')] [2024-03-29 18:01:40,110][00497] Updated weights for policy 0, policy_version 53778 (0.0018) [2024-03-29 18:01:43,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 881246208. Throughput: 0: 42339.8. Samples: 763498700. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 18:01:43,841][00126] Avg episode reward: [(0, '0.604')] [2024-03-29 18:01:43,887][00497] Updated weights for policy 0, policy_version 53788 (0.0019) [2024-03-29 18:01:47,545][00497] Updated weights for policy 0, policy_version 53798 (0.0028) [2024-03-29 18:01:48,368][00476] Signal inference workers to stop experience collection... (27150 times) [2024-03-29 18:01:48,413][00497] InferenceWorker_p0-w0: stopping experience collection (27150 times) [2024-03-29 18:01:48,448][00476] Signal inference workers to resume experience collection... (27150 times) [2024-03-29 18:01:48,450][00497] InferenceWorker_p0-w0: resuming experience collection (27150 times) [2024-03-29 18:01:48,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 881491968. Throughput: 0: 42084.5. Samples: 763618820. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 18:01:48,840][00126] Avg episode reward: [(0, '0.580')] [2024-03-29 18:01:51,131][00497] Updated weights for policy 0, policy_version 53808 (0.0024) [2024-03-29 18:01:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 41765.3). Total num frames: 881688576. Throughput: 0: 42157.1. Samples: 763868860. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 18:01:53,840][00126] Avg episode reward: [(0, '0.471')] [2024-03-29 18:01:55,548][00497] Updated weights for policy 0, policy_version 53818 (0.0020) [2024-03-29 18:01:58,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 881885184. Throughput: 0: 42323.5. Samples: 764136460. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 18:01:58,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 18:01:59,403][00497] Updated weights for policy 0, policy_version 53828 (0.0023) [2024-03-29 18:02:02,935][00497] Updated weights for policy 0, policy_version 53838 (0.0019) [2024-03-29 18:02:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 882114560. Throughput: 0: 42062.8. Samples: 764248420. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 18:02:03,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 18:02:06,673][00497] Updated weights for policy 0, policy_version 53848 (0.0020) [2024-03-29 18:02:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.5, 300 sec: 41765.3). Total num frames: 882327552. Throughput: 0: 41903.5. Samples: 764501140. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 18:02:08,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 18:02:11,219][00497] Updated weights for policy 0, policy_version 53858 (0.0030) [2024-03-29 18:02:13,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 882507776. Throughput: 0: 42153.7. Samples: 764777160. 
Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 18:02:13,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 18:02:14,772][00497] Updated weights for policy 0, policy_version 53868 (0.0024) [2024-03-29 18:02:18,485][00497] Updated weights for policy 0, policy_version 53878 (0.0020) [2024-03-29 18:02:18,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 882737152. Throughput: 0: 42064.0. Samples: 764886220. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 18:02:18,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 18:02:19,289][00476] Signal inference workers to stop experience collection... (27200 times) [2024-03-29 18:02:19,326][00497] InferenceWorker_p0-w0: stopping experience collection (27200 times) [2024-03-29 18:02:19,513][00476] Signal inference workers to resume experience collection... (27200 times) [2024-03-29 18:02:19,514][00497] InferenceWorker_p0-w0: resuming experience collection (27200 times) [2024-03-29 18:02:22,387][00497] Updated weights for policy 0, policy_version 53888 (0.0023) [2024-03-29 18:02:23,840][00126] Fps is (10 sec: 44235.9, 60 sec: 42325.1, 300 sec: 41876.3). Total num frames: 882950144. Throughput: 0: 41922.8. Samples: 765129380. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 18:02:23,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 18:02:26,952][00497] Updated weights for policy 0, policy_version 53898 (0.0022) [2024-03-29 18:02:28,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 883130368. Throughput: 0: 42283.5. Samples: 765401460. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 18:02:28,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 18:02:30,327][00497] Updated weights for policy 0, policy_version 53908 (0.0029) [2024-03-29 18:02:33,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 883376128. Throughput: 0: 42332.7. Samples: 765523800. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 18:02:33,840][00126] Avg episode reward: [(0, '0.458')] [2024-03-29 18:02:34,023][00497] Updated weights for policy 0, policy_version 53918 (0.0030) [2024-03-29 18:02:37,788][00497] Updated weights for policy 0, policy_version 53928 (0.0024) [2024-03-29 18:02:38,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42325.2, 300 sec: 41876.4). Total num frames: 883589120. Throughput: 0: 42199.8. Samples: 765767860. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 18:02:38,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 18:02:42,179][00497] Updated weights for policy 0, policy_version 53938 (0.0019) [2024-03-29 18:02:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 883785728. Throughput: 0: 42367.1. Samples: 766042980. Policy #0 lag: (min: 2.0, avg: 21.4, max: 42.0) [2024-03-29 18:02:43,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 18:02:45,741][00497] Updated weights for policy 0, policy_version 53948 (0.0025) [2024-03-29 18:02:48,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.2, 300 sec: 41820.9). Total num frames: 884031488. Throughput: 0: 42519.9. Samples: 766161820. Policy #0 lag: (min: 2.0, avg: 21.4, max: 42.0) [2024-03-29 18:02:48,842][00126] Avg episode reward: [(0, '0.537')] [2024-03-29 18:02:49,503][00497] Updated weights for policy 0, policy_version 53958 (0.0024) [2024-03-29 18:02:52,095][00476] Signal inference workers to stop experience collection... 
(27250 times) [2024-03-29 18:02:52,141][00497] InferenceWorker_p0-w0: stopping experience collection (27250 times) [2024-03-29 18:02:52,174][00476] Signal inference workers to resume experience collection... (27250 times) [2024-03-29 18:02:52,177][00497] InferenceWorker_p0-w0: resuming experience collection (27250 times) [2024-03-29 18:02:53,067][00497] Updated weights for policy 0, policy_version 53968 (0.0023) [2024-03-29 18:02:53,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 884244480. Throughput: 0: 42387.0. Samples: 766408560. Policy #0 lag: (min: 2.0, avg: 21.4, max: 42.0) [2024-03-29 18:02:53,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 18:02:57,281][00497] Updated weights for policy 0, policy_version 53978 (0.0017) [2024-03-29 18:02:58,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 884441088. Throughput: 0: 42464.1. Samples: 766688040. Policy #0 lag: (min: 2.0, avg: 21.4, max: 42.0) [2024-03-29 18:02:58,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 18:03:00,869][00497] Updated weights for policy 0, policy_version 53988 (0.0021) [2024-03-29 18:03:03,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42871.4, 300 sec: 42043.0). Total num frames: 884686848. Throughput: 0: 42576.9. Samples: 766802180. Policy #0 lag: (min: 2.0, avg: 21.4, max: 42.0) [2024-03-29 18:03:03,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 18:03:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000053997_884686848.pth... [2024-03-29 18:03:04,161][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000053378_874545152.pth [2024-03-29 18:03:04,778][00497] Updated weights for policy 0, policy_version 53998 (0.0021) [2024-03-29 18:03:08,688][00497] Updated weights for policy 0, policy_version 54008 (0.0020) [2024-03-29 18:03:08,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 884867072. Throughput: 0: 42536.3. Samples: 767043500. Policy #0 lag: (min: 2.0, avg: 21.4, max: 42.0) [2024-03-29 18:03:08,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 18:03:12,943][00497] Updated weights for policy 0, policy_version 54018 (0.0028) [2024-03-29 18:03:13,839][00126] Fps is (10 sec: 36044.9, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 885047296. Throughput: 0: 42400.9. Samples: 767309500. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 18:03:13,840][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 18:03:16,695][00497] Updated weights for policy 0, policy_version 54028 (0.0028) [2024-03-29 18:03:18,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 885276672. Throughput: 0: 42481.4. Samples: 767435460. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 18:03:18,840][00126] Avg episode reward: [(0, '0.503')] [2024-03-29 18:03:20,381][00497] Updated weights for policy 0, policy_version 54038 (0.0024) [2024-03-29 18:03:23,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.6, 300 sec: 41987.5). Total num frames: 885489664. Throughput: 0: 42114.0. Samples: 767662980. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 18:03:23,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:03:24,368][00497] Updated weights for policy 0, policy_version 54048 (0.0024) [2024-03-29 18:03:27,752][00476] Signal inference workers to stop experience collection... 
(27300 times) [2024-03-29 18:03:27,800][00497] InferenceWorker_p0-w0: stopping experience collection (27300 times) [2024-03-29 18:03:27,931][00476] Signal inference workers to resume experience collection... (27300 times) [2024-03-29 18:03:27,932][00497] InferenceWorker_p0-w0: resuming experience collection (27300 times) [2024-03-29 18:03:28,810][00497] Updated weights for policy 0, policy_version 54058 (0.0019) [2024-03-29 18:03:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 885686272. Throughput: 0: 41844.0. Samples: 767925960. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 18:03:28,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 18:03:32,529][00497] Updated weights for policy 0, policy_version 54068 (0.0022) [2024-03-29 18:03:33,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 885899264. Throughput: 0: 42304.5. Samples: 768065520. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 18:03:33,840][00126] Avg episode reward: [(0, '0.645')] [2024-03-29 18:03:36,489][00497] Updated weights for policy 0, policy_version 54078 (0.0020) [2024-03-29 18:03:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 886095872. Throughput: 0: 41821.9. Samples: 768290540. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 18:03:38,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 18:03:40,236][00497] Updated weights for policy 0, policy_version 54088 (0.0024) [2024-03-29 18:03:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 886308864. Throughput: 0: 41239.2. Samples: 768543800. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 18:03:43,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 18:03:44,576][00497] Updated weights for policy 0, policy_version 54098 (0.0019) [2024-03-29 18:03:48,488][00497] Updated weights for policy 0, policy_version 54108 (0.0032) [2024-03-29 18:03:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 886521856. Throughput: 0: 41574.7. Samples: 768673040. Policy #0 lag: (min: 1.0, avg: 18.0, max: 41.0) [2024-03-29 18:03:48,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 18:03:52,518][00497] Updated weights for policy 0, policy_version 54118 (0.0042) [2024-03-29 18:03:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.3, 300 sec: 41932.0). Total num frames: 886734848. Throughput: 0: 41363.1. Samples: 768904840. Policy #0 lag: (min: 1.0, avg: 18.0, max: 41.0) [2024-03-29 18:03:53,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 18:03:56,085][00497] Updated weights for policy 0, policy_version 54128 (0.0024) [2024-03-29 18:03:58,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 886931456. Throughput: 0: 41184.0. Samples: 769162780. Policy #0 lag: (min: 1.0, avg: 18.0, max: 41.0) [2024-03-29 18:03:58,840][00126] Avg episode reward: [(0, '0.640')] [2024-03-29 18:04:00,242][00497] Updated weights for policy 0, policy_version 54138 (0.0022) [2024-03-29 18:04:03,088][00476] Signal inference workers to stop experience collection... (27350 times) [2024-03-29 18:04:03,118][00497] InferenceWorker_p0-w0: stopping experience collection (27350 times) [2024-03-29 18:04:03,304][00476] Signal inference workers to resume experience collection... 
(27350 times) [2024-03-29 18:04:03,305][00497] InferenceWorker_p0-w0: resuming experience collection (27350 times) [2024-03-29 18:04:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40960.0, 300 sec: 41931.9). Total num frames: 887144448. Throughput: 0: 41469.8. Samples: 769301600. Policy #0 lag: (min: 1.0, avg: 18.0, max: 41.0) [2024-03-29 18:04:03,840][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 18:04:04,120][00497] Updated weights for policy 0, policy_version 54148 (0.0021) [2024-03-29 18:04:07,743][00497] Updated weights for policy 0, policy_version 54158 (0.0029) [2024-03-29 18:04:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 887373824. Throughput: 0: 41524.4. Samples: 769531580. Policy #0 lag: (min: 1.0, avg: 18.0, max: 41.0) [2024-03-29 18:04:08,840][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 18:04:11,812][00497] Updated weights for policy 0, policy_version 54168 (0.0019) [2024-03-29 18:04:13,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 887570432. Throughput: 0: 41385.7. Samples: 769788320. Policy #0 lag: (min: 1.0, avg: 18.0, max: 41.0) [2024-03-29 18:04:13,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 18:04:15,991][00497] Updated weights for policy 0, policy_version 54178 (0.0021) [2024-03-29 18:04:18,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41233.0, 300 sec: 41987.5). Total num frames: 887750656. Throughput: 0: 41458.2. Samples: 769931140. Policy #0 lag: (min: 1.0, avg: 18.0, max: 41.0) [2024-03-29 18:04:18,840][00126] Avg episode reward: [(0, '0.600')] [2024-03-29 18:04:19,832][00497] Updated weights for policy 0, policy_version 54188 (0.0028) [2024-03-29 18:04:23,428][00497] Updated weights for policy 0, policy_version 54198 (0.0023) [2024-03-29 18:04:23,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.0, 300 sec: 41931.9). Total num frames: 887980032. Throughput: 0: 41735.9. Samples: 770168660. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 18:04:23,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 18:04:27,573][00497] Updated weights for policy 0, policy_version 54208 (0.0031) [2024-03-29 18:04:28,839][00126] Fps is (10 sec: 44237.5, 60 sec: 41779.3, 300 sec: 42098.6). Total num frames: 888193024. Throughput: 0: 41408.9. Samples: 770407200. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 18:04:28,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:04:31,811][00497] Updated weights for policy 0, policy_version 54218 (0.0019) [2024-03-29 18:04:33,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40959.9, 300 sec: 41931.9). Total num frames: 888356864. Throughput: 0: 41558.6. Samples: 770543180. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 18:04:33,841][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 18:04:35,577][00497] Updated weights for policy 0, policy_version 54228 (0.0020) [2024-03-29 18:04:35,689][00476] Signal inference workers to stop experience collection... (27400 times) [2024-03-29 18:04:35,756][00497] InferenceWorker_p0-w0: stopping experience collection (27400 times) [2024-03-29 18:04:35,850][00476] Signal inference workers to resume experience collection... (27400 times) [2024-03-29 18:04:35,851][00497] InferenceWorker_p0-w0: resuming experience collection (27400 times) [2024-03-29 18:04:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 888619008. Throughput: 0: 41896.0. Samples: 770790160. 
Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 18:04:38,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 18:04:39,078][00497] Updated weights for policy 0, policy_version 54238 (0.0020) [2024-03-29 18:04:43,248][00497] Updated weights for policy 0, policy_version 54248 (0.0022) [2024-03-29 18:04:43,839][00126] Fps is (10 sec: 45875.7, 60 sec: 41779.1, 300 sec: 42098.5). Total num frames: 888815616. Throughput: 0: 41648.0. Samples: 771036940. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 18:04:43,840][00126] Avg episode reward: [(0, '0.672')] [2024-03-29 18:04:47,533][00497] Updated weights for policy 0, policy_version 54258 (0.0019) [2024-03-29 18:04:48,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 889012224. Throughput: 0: 41572.0. Samples: 771172340. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 18:04:48,840][00126] Avg episode reward: [(0, '0.647')] [2024-03-29 18:04:51,307][00497] Updated weights for policy 0, policy_version 54268 (0.0022) [2024-03-29 18:04:53,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.1, 300 sec: 41987.4). Total num frames: 889241600. Throughput: 0: 41772.3. Samples: 771411340. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 18:04:53,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 18:04:54,840][00497] Updated weights for policy 0, policy_version 54278 (0.0019) [2024-03-29 18:04:58,784][00497] Updated weights for policy 0, policy_version 54288 (0.0019) [2024-03-29 18:04:58,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 889454592. Throughput: 0: 41870.3. Samples: 771672480. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 18:04:58,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 18:05:03,143][00497] Updated weights for policy 0, policy_version 54298 (0.0020) [2024-03-29 18:05:03,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 889651200. Throughput: 0: 41611.6. Samples: 771803660. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 18:05:03,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 18:05:04,117][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000054301_889667584.pth... [2024-03-29 18:05:04,452][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000053687_879607808.pth [2024-03-29 18:05:06,891][00497] Updated weights for policy 0, policy_version 54308 (0.0030) [2024-03-29 18:05:08,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 889864192. Throughput: 0: 41706.8. Samples: 772045460. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 18:05:08,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 18:05:10,807][00497] Updated weights for policy 0, policy_version 54318 (0.0019) [2024-03-29 18:05:13,408][00476] Signal inference workers to stop experience collection... (27450 times) [2024-03-29 18:05:13,408][00476] Signal inference workers to resume experience collection... (27450 times) [2024-03-29 18:05:13,444][00497] InferenceWorker_p0-w0: stopping experience collection (27450 times) [2024-03-29 18:05:13,444][00497] InferenceWorker_p0-w0: resuming experience collection (27450 times) [2024-03-29 18:05:13,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 890060800. Throughput: 0: 42088.7. Samples: 772301200. 
Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 18:05:13,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 18:05:14,604][00497] Updated weights for policy 0, policy_version 54328 (0.0030) [2024-03-29 18:05:18,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41779.3, 300 sec: 41932.0). Total num frames: 890257408. Throughput: 0: 41811.7. Samples: 772424700. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 18:05:18,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 18:05:18,892][00497] Updated weights for policy 0, policy_version 54338 (0.0023) [2024-03-29 18:05:22,563][00497] Updated weights for policy 0, policy_version 54348 (0.0025) [2024-03-29 18:05:23,839][00126] Fps is (10 sec: 44237.7, 60 sec: 42052.4, 300 sec: 41987.5). Total num frames: 890503168. Throughput: 0: 42081.3. Samples: 772683820. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 18:05:23,840][00126] Avg episode reward: [(0, '0.616')] [2024-03-29 18:05:26,180][00497] Updated weights for policy 0, policy_version 54358 (0.0023) [2024-03-29 18:05:28,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 890683392. Throughput: 0: 41852.9. Samples: 772920320. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 18:05:28,842][00126] Avg episode reward: [(0, '0.647')] [2024-03-29 18:05:30,260][00497] Updated weights for policy 0, policy_version 54368 (0.0021) [2024-03-29 18:05:33,839][00126] Fps is (10 sec: 39321.1, 60 sec: 42325.4, 300 sec: 41987.4). Total num frames: 890896384. Throughput: 0: 41687.4. Samples: 773048280. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 18:05:33,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:05:34,474][00497] Updated weights for policy 0, policy_version 54378 (0.0021) [2024-03-29 18:05:38,177][00497] Updated weights for policy 0, policy_version 54388 (0.0020) [2024-03-29 18:05:38,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41779.1, 300 sec: 41987.4). Total num frames: 891125760. Throughput: 0: 42471.5. Samples: 773322560. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 18:05:38,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:05:41,954][00497] Updated weights for policy 0, policy_version 54398 (0.0031) [2024-03-29 18:05:42,954][00476] Signal inference workers to stop experience collection... (27500 times) [2024-03-29 18:05:42,954][00476] Signal inference workers to resume experience collection... (27500 times) [2024-03-29 18:05:42,998][00497] InferenceWorker_p0-w0: stopping experience collection (27500 times) [2024-03-29 18:05:42,999][00497] InferenceWorker_p0-w0: resuming experience collection (27500 times) [2024-03-29 18:05:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 891322368. Throughput: 0: 41796.3. Samples: 773553320. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 18:05:43,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 18:05:45,782][00497] Updated weights for policy 0, policy_version 54408 (0.0024) [2024-03-29 18:05:48,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 891535360. Throughput: 0: 41817.0. Samples: 773685420. 
Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 18:05:48,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 18:05:49,969][00497] Updated weights for policy 0, policy_version 54418 (0.0026) [2024-03-29 18:05:53,715][00497] Updated weights for policy 0, policy_version 54428 (0.0034) [2024-03-29 18:05:53,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41779.3, 300 sec: 41932.0). Total num frames: 891748352. Throughput: 0: 42506.7. Samples: 773958260. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 18:05:53,840][00126] Avg episode reward: [(0, '0.606')] [2024-03-29 18:05:57,415][00497] Updated weights for policy 0, policy_version 54438 (0.0026) [2024-03-29 18:05:58,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 891961344. Throughput: 0: 41853.1. Samples: 774184580. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 18:05:58,840][00126] Avg episode reward: [(0, '0.565')] [2024-03-29 18:06:01,295][00497] Updated weights for policy 0, policy_version 54448 (0.0022) [2024-03-29 18:06:03,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 892157952. Throughput: 0: 42194.1. Samples: 774323440. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:06:03,841][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 18:06:05,447][00497] Updated weights for policy 0, policy_version 54458 (0.0026) [2024-03-29 18:06:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 892370944. Throughput: 0: 42166.2. Samples: 774581300. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:06:08,840][00126] Avg episode reward: [(0, '0.503')] [2024-03-29 18:06:09,296][00497] Updated weights for policy 0, policy_version 54468 (0.0020) [2024-03-29 18:06:12,779][00497] Updated weights for policy 0, policy_version 54478 (0.0024) [2024-03-29 18:06:13,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42325.5, 300 sec: 41932.0). Total num frames: 892600320. Throughput: 0: 42292.1. Samples: 774823460. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:06:13,840][00126] Avg episode reward: [(0, '0.517')] [2024-03-29 18:06:15,243][00476] Signal inference workers to stop experience collection... (27550 times) [2024-03-29 18:06:15,244][00476] Signal inference workers to resume experience collection... (27550 times) [2024-03-29 18:06:15,291][00497] InferenceWorker_p0-w0: stopping experience collection (27550 times) [2024-03-29 18:06:15,292][00497] InferenceWorker_p0-w0: resuming experience collection (27550 times) [2024-03-29 18:06:16,845][00497] Updated weights for policy 0, policy_version 54488 (0.0028) [2024-03-29 18:06:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 892796928. Throughput: 0: 42412.2. Samples: 774956820. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:06:18,840][00126] Avg episode reward: [(0, '0.676')] [2024-03-29 18:06:21,001][00497] Updated weights for policy 0, policy_version 54498 (0.0017) [2024-03-29 18:06:23,839][00126] Fps is (10 sec: 39320.9, 60 sec: 41506.0, 300 sec: 41876.4). Total num frames: 892993536. Throughput: 0: 41813.4. Samples: 775204160. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:06:23,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 18:06:25,065][00497] Updated weights for policy 0, policy_version 54508 (0.0022) [2024-03-29 18:06:28,471][00497] Updated weights for policy 0, policy_version 54518 (0.0018) [2024-03-29 18:06:28,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 893222912. Throughput: 0: 42006.8. Samples: 775443620. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:06:28,840][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 18:06:32,582][00497] Updated weights for policy 0, policy_version 54528 (0.0025) [2024-03-29 18:06:33,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 41987.4). Total num frames: 893435904. Throughput: 0: 42088.8. Samples: 775579420. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:06:33,840][00126] Avg episode reward: [(0, '0.606')] [2024-03-29 18:06:36,704][00497] Updated weights for policy 0, policy_version 54538 (0.0018) [2024-03-29 18:06:38,839][00126] Fps is (10 sec: 39320.9, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 893616128. Throughput: 0: 41747.8. Samples: 775836920. Policy #0 lag: (min: 2.0, avg: 21.2, max: 42.0) [2024-03-29 18:06:38,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 18:06:40,632][00497] Updated weights for policy 0, policy_version 54548 (0.0026) [2024-03-29 18:06:43,839][00126] Fps is (10 sec: 42599.3, 60 sec: 42325.5, 300 sec: 41931.9). Total num frames: 893861888. Throughput: 0: 41789.8. Samples: 776065120. Policy #0 lag: (min: 2.0, avg: 21.2, max: 42.0) [2024-03-29 18:06:43,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 18:06:44,235][00497] Updated weights for policy 0, policy_version 54558 (0.0028) [2024-03-29 18:06:48,377][00497] Updated weights for policy 0, policy_version 54568 (0.0022) [2024-03-29 18:06:48,596][00476] Signal inference workers to stop experience collection... (27600 times) [2024-03-29 18:06:48,638][00497] InferenceWorker_p0-w0: stopping experience collection (27600 times) [2024-03-29 18:06:48,676][00476] Signal inference workers to resume experience collection... (27600 times) [2024-03-29 18:06:48,678][00497] InferenceWorker_p0-w0: resuming experience collection (27600 times) [2024-03-29 18:06:48,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 894058496. Throughput: 0: 41548.9. Samples: 776193140. Policy #0 lag: (min: 2.0, avg: 21.2, max: 42.0) [2024-03-29 18:06:48,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 18:06:52,618][00497] Updated weights for policy 0, policy_version 54578 (0.0019) [2024-03-29 18:06:53,839][00126] Fps is (10 sec: 39320.9, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 894255104. Throughput: 0: 41772.8. Samples: 776461080. Policy #0 lag: (min: 2.0, avg: 21.2, max: 42.0) [2024-03-29 18:06:53,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 18:06:56,462][00497] Updated weights for policy 0, policy_version 54588 (0.0026) [2024-03-29 18:06:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 894484480. Throughput: 0: 41588.8. Samples: 776694960. Policy #0 lag: (min: 2.0, avg: 21.2, max: 42.0) [2024-03-29 18:06:58,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 18:06:59,910][00497] Updated weights for policy 0, policy_version 54598 (0.0022) [2024-03-29 18:07:03,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41876.4). 
Total num frames: 894681088. Throughput: 0: 41443.0. Samples: 776821760. Policy #0 lag: (min: 2.0, avg: 21.2, max: 42.0) [2024-03-29 18:07:03,840][00126] Avg episode reward: [(0, '0.516')] [2024-03-29 18:07:04,051][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000054608_894697472.pth... [2024-03-29 18:07:04,064][00497] Updated weights for policy 0, policy_version 54608 (0.0027) [2024-03-29 18:07:04,365][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000053997_884686848.pth [2024-03-29 18:07:08,363][00497] Updated weights for policy 0, policy_version 54618 (0.0022) [2024-03-29 18:07:08,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 894877696. Throughput: 0: 41837.1. Samples: 777086820. Policy #0 lag: (min: 2.0, avg: 21.2, max: 42.0) [2024-03-29 18:07:08,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 18:07:12,199][00497] Updated weights for policy 0, policy_version 54628 (0.0028) [2024-03-29 18:07:13,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 895107072. Throughput: 0: 41688.8. Samples: 777319620. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 18:07:13,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 18:07:15,660][00497] Updated weights for policy 0, policy_version 54638 (0.0022) [2024-03-29 18:07:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 895287296. Throughput: 0: 41285.5. Samples: 777437260. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 18:07:18,840][00126] Avg episode reward: [(0, '0.613')] [2024-03-29 18:07:19,990][00497] Updated weights for policy 0, policy_version 54648 (0.0024) [2024-03-29 18:07:23,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41506.3, 300 sec: 41876.4). Total num frames: 895483904. Throughput: 0: 41418.8. Samples: 777700760. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 18:07:23,840][00126] Avg episode reward: [(0, '0.632')] [2024-03-29 18:07:24,263][00497] Updated weights for policy 0, policy_version 54658 (0.0027) [2024-03-29 18:07:26,714][00476] Signal inference workers to stop experience collection... (27650 times) [2024-03-29 18:07:26,715][00476] Signal inference workers to resume experience collection... (27650 times) [2024-03-29 18:07:26,737][00497] InferenceWorker_p0-w0: stopping experience collection (27650 times) [2024-03-29 18:07:26,738][00497] InferenceWorker_p0-w0: resuming experience collection (27650 times) [2024-03-29 18:07:28,227][00497] Updated weights for policy 0, policy_version 54668 (0.0022) [2024-03-29 18:07:28,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 895713280. Throughput: 0: 41805.7. Samples: 777946380. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 18:07:28,840][00126] Avg episode reward: [(0, '0.501')] [2024-03-29 18:07:31,406][00497] Updated weights for policy 0, policy_version 54678 (0.0028) [2024-03-29 18:07:33,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41506.3, 300 sec: 41820.9). Total num frames: 895926272. Throughput: 0: 41462.8. Samples: 778058960. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 18:07:33,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 18:07:35,819][00497] Updated weights for policy 0, policy_version 54688 (0.0027) [2024-03-29 18:07:38,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 896106496. 
Throughput: 0: 41075.6. Samples: 778309480. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 18:07:38,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 18:07:40,116][00497] Updated weights for policy 0, policy_version 54698 (0.0019) [2024-03-29 18:07:43,839][00126] Fps is (10 sec: 39320.9, 60 sec: 40959.9, 300 sec: 41654.2). Total num frames: 896319488. Throughput: 0: 41759.9. Samples: 778574160. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 18:07:43,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:07:44,034][00497] Updated weights for policy 0, policy_version 54708 (0.0021) [2024-03-29 18:07:47,169][00497] Updated weights for policy 0, policy_version 54718 (0.0022) [2024-03-29 18:07:48,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 896548864. Throughput: 0: 41270.4. Samples: 778678920. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 18:07:48,840][00126] Avg episode reward: [(0, '0.628')] [2024-03-29 18:07:51,470][00497] Updated weights for policy 0, policy_version 54728 (0.0025) [2024-03-29 18:07:53,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 896745472. Throughput: 0: 41027.1. Samples: 778933040. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 18:07:53,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 18:07:56,015][00497] Updated weights for policy 0, policy_version 54738 (0.0018) [2024-03-29 18:07:58,839][00126] Fps is (10 sec: 37683.2, 60 sec: 40687.0, 300 sec: 41487.6). Total num frames: 896925696. Throughput: 0: 41763.6. Samples: 779198980. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 18:07:58,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 18:07:59,177][00476] Signal inference workers to stop experience collection... (27700 times) [2024-03-29 18:07:59,215][00497] InferenceWorker_p0-w0: stopping experience collection (27700 times) [2024-03-29 18:07:59,405][00476] Signal inference workers to resume experience collection... (27700 times) [2024-03-29 18:07:59,406][00497] InferenceWorker_p0-w0: resuming experience collection (27700 times) [2024-03-29 18:07:59,746][00497] Updated weights for policy 0, policy_version 54748 (0.0028) [2024-03-29 18:08:02,915][00497] Updated weights for policy 0, policy_version 54758 (0.0029) [2024-03-29 18:08:03,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 897187840. Throughput: 0: 41488.8. Samples: 779304260. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 18:08:03,840][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 18:08:07,307][00497] Updated weights for policy 0, policy_version 54768 (0.0031) [2024-03-29 18:08:08,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 897351680. Throughput: 0: 41204.4. Samples: 779554960. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 18:08:08,840][00126] Avg episode reward: [(0, '0.628')] [2024-03-29 18:08:11,859][00497] Updated weights for policy 0, policy_version 54778 (0.0021) [2024-03-29 18:08:13,839][00126] Fps is (10 sec: 36044.4, 60 sec: 40686.8, 300 sec: 41598.7). Total num frames: 897548288. Throughput: 0: 41493.2. Samples: 779813580. 
Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 18:08:13,840][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 18:08:15,547][00497] Updated weights for policy 0, policy_version 54788 (0.0019) [2024-03-29 18:08:18,775][00497] Updated weights for policy 0, policy_version 54798 (0.0020) [2024-03-29 18:08:18,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 897810432. Throughput: 0: 41597.7. Samples: 779930860. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 18:08:18,840][00126] Avg episode reward: [(0, '0.615')] [2024-03-29 18:08:23,194][00497] Updated weights for policy 0, policy_version 54808 (0.0019) [2024-03-29 18:08:23,839][00126] Fps is (10 sec: 45876.0, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 898007040. Throughput: 0: 41622.3. Samples: 780182480. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 18:08:23,840][00126] Avg episode reward: [(0, '0.674')] [2024-03-29 18:08:27,300][00497] Updated weights for policy 0, policy_version 54818 (0.0026) [2024-03-29 18:08:28,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 898187264. Throughput: 0: 41559.2. Samples: 780444320. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 18:08:28,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 18:08:31,118][00497] Updated weights for policy 0, policy_version 54828 (0.0024) [2024-03-29 18:08:31,705][00476] Signal inference workers to stop experience collection... (27750 times) [2024-03-29 18:08:31,742][00497] InferenceWorker_p0-w0: stopping experience collection (27750 times) [2024-03-29 18:08:31,933][00476] Signal inference workers to resume experience collection... (27750 times) [2024-03-29 18:08:31,933][00497] InferenceWorker_p0-w0: resuming experience collection (27750 times) [2024-03-29 18:08:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 898433024. Throughput: 0: 42072.8. Samples: 780572200. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 18:08:33,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 18:08:34,384][00497] Updated weights for policy 0, policy_version 54838 (0.0025) [2024-03-29 18:08:38,581][00497] Updated weights for policy 0, policy_version 54848 (0.0027) [2024-03-29 18:08:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 898629632. Throughput: 0: 41805.2. Samples: 780814280. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 18:08:38,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 18:08:42,737][00497] Updated weights for policy 0, policy_version 54858 (0.0027) [2024-03-29 18:08:43,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 898826240. Throughput: 0: 41780.9. Samples: 781079120. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 18:08:43,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 18:08:46,696][00497] Updated weights for policy 0, policy_version 54868 (0.0028) [2024-03-29 18:08:48,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.2, 300 sec: 41820.9). Total num frames: 899072000. Throughput: 0: 42309.0. Samples: 781208160. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 18:08:48,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 18:08:49,852][00497] Updated weights for policy 0, policy_version 54878 (0.0020) [2024-03-29 18:08:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41820.9). 
Total num frames: 899268608. Throughput: 0: 41882.3. Samples: 781439660. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 18:08:53,841][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 18:08:54,174][00497] Updated weights for policy 0, policy_version 54888 (0.0022) [2024-03-29 18:08:58,504][00497] Updated weights for policy 0, policy_version 54898 (0.0023) [2024-03-29 18:08:58,839][00126] Fps is (10 sec: 37682.9, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 899448832. Throughput: 0: 42053.9. Samples: 781706000. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 18:08:58,840][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 18:09:02,312][00497] Updated weights for policy 0, policy_version 54908 (0.0020) [2024-03-29 18:09:03,839][00126] Fps is (10 sec: 40959.2, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 899678208. Throughput: 0: 42437.3. Samples: 781840540. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 18:09:03,840][00126] Avg episode reward: [(0, '0.479')] [2024-03-29 18:09:04,269][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000054914_899710976.pth... [2024-03-29 18:09:04,612][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000054301_889667584.pth [2024-03-29 18:09:05,772][00497] Updated weights for policy 0, policy_version 54918 (0.0026) [2024-03-29 18:09:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 899874816. Throughput: 0: 41655.9. Samples: 782057000. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 18:09:08,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 18:09:09,541][00476] Signal inference workers to stop experience collection... (27800 times) [2024-03-29 18:09:09,612][00497] InferenceWorker_p0-w0: stopping experience collection (27800 times) [2024-03-29 18:09:09,619][00476] Signal inference workers to resume experience collection... (27800 times) [2024-03-29 18:09:09,643][00497] InferenceWorker_p0-w0: resuming experience collection (27800 times) [2024-03-29 18:09:09,929][00497] Updated weights for policy 0, policy_version 54928 (0.0024) [2024-03-29 18:09:13,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 900071424. Throughput: 0: 41789.6. Samples: 782324860. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 18:09:13,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 18:09:14,359][00497] Updated weights for policy 0, policy_version 54938 (0.0029) [2024-03-29 18:09:18,206][00497] Updated weights for policy 0, policy_version 54948 (0.0028) [2024-03-29 18:09:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 900300800. Throughput: 0: 41800.0. Samples: 782453200. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 18:09:18,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 18:09:21,672][00497] Updated weights for policy 0, policy_version 54958 (0.0024) [2024-03-29 18:09:23,839][00126] Fps is (10 sec: 44237.8, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 900513792. Throughput: 0: 41347.7. Samples: 782674920. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 18:09:23,841][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 18:09:25,909][00497] Updated weights for policy 0, policy_version 54968 (0.0028) [2024-03-29 18:09:28,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 41820.9). Total num frames: 900694016. 
Throughput: 0: 41357.2. Samples: 782940200. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 18:09:28,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 18:09:30,108][00497] Updated weights for policy 0, policy_version 54978 (0.0023) [2024-03-29 18:09:33,839][00126] Fps is (10 sec: 37683.2, 60 sec: 40960.0, 300 sec: 41598.7). Total num frames: 900890624. Throughput: 0: 41228.9. Samples: 783063460. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 18:09:33,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 18:09:34,176][00497] Updated weights for policy 0, policy_version 54988 (0.0020) [2024-03-29 18:09:37,380][00497] Updated weights for policy 0, policy_version 54998 (0.0019) [2024-03-29 18:09:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 901136384. Throughput: 0: 41353.6. Samples: 783300580. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 18:09:38,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 18:09:41,611][00497] Updated weights for policy 0, policy_version 55008 (0.0023) [2024-03-29 18:09:43,841][00126] Fps is (10 sec: 42590.1, 60 sec: 41504.8, 300 sec: 41709.5). Total num frames: 901316608. Throughput: 0: 41267.6. Samples: 783563120. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 18:09:43,842][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 18:09:45,807][00497] Updated weights for policy 0, policy_version 55018 (0.0030) [2024-03-29 18:09:46,237][00476] Signal inference workers to stop experience collection... (27850 times) [2024-03-29 18:09:46,237][00476] Signal inference workers to resume experience collection... (27850 times) [2024-03-29 18:09:46,271][00497] InferenceWorker_p0-w0: stopping experience collection (27850 times) [2024-03-29 18:09:46,271][00497] InferenceWorker_p0-w0: resuming experience collection (27850 times) [2024-03-29 18:09:48,839][00126] Fps is (10 sec: 39322.2, 60 sec: 40960.0, 300 sec: 41654.3). Total num frames: 901529600. Throughput: 0: 41255.7. Samples: 783697040. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 18:09:48,840][00126] Avg episode reward: [(0, '0.628')] [2024-03-29 18:09:49,748][00497] Updated weights for policy 0, policy_version 55028 (0.0026) [2024-03-29 18:09:52,919][00497] Updated weights for policy 0, policy_version 55038 (0.0032) [2024-03-29 18:09:53,839][00126] Fps is (10 sec: 47522.8, 60 sec: 42052.2, 300 sec: 41820.9). Total num frames: 901791744. Throughput: 0: 41822.3. Samples: 783939000. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 18:09:53,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 18:09:56,997][00497] Updated weights for policy 0, policy_version 55048 (0.0019) [2024-03-29 18:09:58,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 901955584. Throughput: 0: 41527.2. Samples: 784193580. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 18:09:58,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 18:10:01,582][00497] Updated weights for policy 0, policy_version 55058 (0.0029) [2024-03-29 18:10:03,839][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.2, 300 sec: 41654.2). Total num frames: 902152192. Throughput: 0: 41550.7. Samples: 784322980. 
Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 18:10:03,840][00126] Avg episode reward: [(0, '0.614')] [2024-03-29 18:10:05,453][00497] Updated weights for policy 0, policy_version 55068 (0.0022) [2024-03-29 18:10:08,599][00497] Updated weights for policy 0, policy_version 55078 (0.0021) [2024-03-29 18:10:08,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 902397952. Throughput: 0: 42160.0. Samples: 784572120. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 18:10:08,840][00126] Avg episode reward: [(0, '0.638')] [2024-03-29 18:10:12,897][00497] Updated weights for policy 0, policy_version 55088 (0.0028) [2024-03-29 18:10:13,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 902594560. Throughput: 0: 41656.2. Samples: 784814720. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 18:10:13,840][00126] Avg episode reward: [(0, '0.635')] [2024-03-29 18:10:17,108][00497] Updated weights for policy 0, policy_version 55098 (0.0034) [2024-03-29 18:10:18,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 902774784. Throughput: 0: 42002.6. Samples: 784953580. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 18:10:18,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 18:10:21,024][00497] Updated weights for policy 0, policy_version 55108 (0.0018) [2024-03-29 18:10:21,389][00476] Signal inference workers to stop experience collection... (27900 times) [2024-03-29 18:10:21,447][00497] InferenceWorker_p0-w0: stopping experience collection (27900 times) [2024-03-29 18:10:21,479][00476] Signal inference workers to resume experience collection... (27900 times) [2024-03-29 18:10:21,481][00497] InferenceWorker_p0-w0: resuming experience collection (27900 times) [2024-03-29 18:10:23,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 903036928. Throughput: 0: 42182.3. Samples: 785198780. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 18:10:23,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 18:10:24,372][00497] Updated weights for policy 0, policy_version 55118 (0.0033) [2024-03-29 18:10:28,619][00497] Updated weights for policy 0, policy_version 55128 (0.0024) [2024-03-29 18:10:28,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.4, 300 sec: 41765.3). Total num frames: 903217152. Throughput: 0: 41848.0. Samples: 785446200. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 18:10:28,840][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 18:10:32,618][00497] Updated weights for policy 0, policy_version 55138 (0.0022) [2024-03-29 18:10:33,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 903430144. Throughput: 0: 41850.2. Samples: 785580300. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 18:10:33,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 18:10:36,537][00497] Updated weights for policy 0, policy_version 55148 (0.0024) [2024-03-29 18:10:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 903659520. Throughput: 0: 42292.8. Samples: 785842180. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:10:38,840][00126] Avg episode reward: [(0, '0.624')] [2024-03-29 18:10:39,693][00497] Updated weights for policy 0, policy_version 55158 (0.0030) [2024-03-29 18:10:43,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42326.6, 300 sec: 41765.3). 
Total num frames: 903856128. Throughput: 0: 42165.2. Samples: 786091020. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:10:43,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 18:10:43,942][00497] Updated weights for policy 0, policy_version 55168 (0.0027) [2024-03-29 18:10:47,843][00497] Updated weights for policy 0, policy_version 55178 (0.0023) [2024-03-29 18:10:48,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42325.2, 300 sec: 41765.3). Total num frames: 904069120. Throughput: 0: 42229.7. Samples: 786223320. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:10:48,840][00126] Avg episode reward: [(0, '0.575')] [2024-03-29 18:10:51,895][00497] Updated weights for policy 0, policy_version 55188 (0.0018) [2024-03-29 18:10:53,146][00476] Signal inference workers to stop experience collection... (27950 times) [2024-03-29 18:10:53,176][00497] InferenceWorker_p0-w0: stopping experience collection (27950 times) [2024-03-29 18:10:53,342][00476] Signal inference workers to resume experience collection... (27950 times) [2024-03-29 18:10:53,343][00497] InferenceWorker_p0-w0: resuming experience collection (27950 times) [2024-03-29 18:10:53,839][00126] Fps is (10 sec: 44237.5, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 904298496. Throughput: 0: 42441.7. Samples: 786482000. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:10:53,840][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 18:10:55,009][00497] Updated weights for policy 0, policy_version 55198 (0.0025) [2024-03-29 18:10:58,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42598.5, 300 sec: 41876.4). Total num frames: 904511488. Throughput: 0: 42459.1. Samples: 786725380. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:10:58,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 18:10:59,293][00497] Updated weights for policy 0, policy_version 55208 (0.0018) [2024-03-29 18:11:03,252][00497] Updated weights for policy 0, policy_version 55218 (0.0027) [2024-03-29 18:11:03,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42598.3, 300 sec: 41820.8). Total num frames: 904708096. Throughput: 0: 42346.2. Samples: 786859160. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:11:03,840][00126] Avg episode reward: [(0, '0.606')] [2024-03-29 18:11:04,003][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000055220_904724480.pth... [2024-03-29 18:11:04,380][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000054608_894697472.pth [2024-03-29 18:11:07,246][00497] Updated weights for policy 0, policy_version 55228 (0.0028) [2024-03-29 18:11:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 904921088. Throughput: 0: 42817.8. Samples: 787125580. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:11:08,840][00126] Avg episode reward: [(0, '0.601')] [2024-03-29 18:11:10,598][00497] Updated weights for policy 0, policy_version 55238 (0.0018) [2024-03-29 18:11:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.2, 300 sec: 41820.8). Total num frames: 905134080. Throughput: 0: 42450.1. Samples: 787356460. 
Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:11:13,841][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 18:11:14,949][00497] Updated weights for policy 0, policy_version 55248 (0.0031) [2024-03-29 18:11:18,802][00497] Updated weights for policy 0, policy_version 55258 (0.0019) [2024-03-29 18:11:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42871.5, 300 sec: 41876.4). Total num frames: 905347072. Throughput: 0: 42355.5. Samples: 787486300. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:11:18,840][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 18:11:22,820][00497] Updated weights for policy 0, policy_version 55268 (0.0018) [2024-03-29 18:11:23,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 905560064. Throughput: 0: 42443.0. Samples: 787752120. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:11:23,840][00126] Avg episode reward: [(0, '0.609')] [2024-03-29 18:11:26,174][00497] Updated weights for policy 0, policy_version 55278 (0.0033) [2024-03-29 18:11:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 905756672. Throughput: 0: 42136.2. Samples: 787987140. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:11:28,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 18:11:28,909][00476] Signal inference workers to stop experience collection... (28000 times) [2024-03-29 18:11:28,910][00476] Signal inference workers to resume experience collection... (28000 times) [2024-03-29 18:11:28,955][00497] InferenceWorker_p0-w0: stopping experience collection (28000 times) [2024-03-29 18:11:28,956][00497] InferenceWorker_p0-w0: resuming experience collection (28000 times) [2024-03-29 18:11:30,575][00497] Updated weights for policy 0, policy_version 55288 (0.0022) [2024-03-29 18:11:33,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.2, 300 sec: 41876.4). Total num frames: 905969664. Throughput: 0: 42166.6. Samples: 788120820. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:11:33,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 18:11:34,303][00497] Updated weights for policy 0, policy_version 55298 (0.0024) [2024-03-29 18:11:38,352][00497] Updated weights for policy 0, policy_version 55308 (0.0026) [2024-03-29 18:11:38,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 906182656. Throughput: 0: 42388.0. Samples: 788389460. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:11:38,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 18:11:41,456][00497] Updated weights for policy 0, policy_version 55318 (0.0023) [2024-03-29 18:11:43,839][00126] Fps is (10 sec: 44237.6, 60 sec: 42598.5, 300 sec: 41876.4). Total num frames: 906412032. Throughput: 0: 42270.7. Samples: 788627560. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:11:43,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 18:11:46,044][00497] Updated weights for policy 0, policy_version 55328 (0.0019) [2024-03-29 18:11:48,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 906608640. Throughput: 0: 42420.5. Samples: 788768080. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 18:11:48,840][00126] Avg episode reward: [(0, '0.541')] [2024-03-29 18:11:49,815][00497] Updated weights for policy 0, policy_version 55338 (0.0023) [2024-03-29 18:11:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41765.3). 
Total num frames: 906805248. Throughput: 0: 42174.2. Samples: 789023420. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 18:11:53,840][00126] Avg episode reward: [(0, '0.480')] [2024-03-29 18:11:54,005][00497] Updated weights for policy 0, policy_version 55348 (0.0022) [2024-03-29 18:11:57,070][00497] Updated weights for policy 0, policy_version 55358 (0.0019) [2024-03-29 18:11:57,344][00476] Signal inference workers to stop experience collection... (28050 times) [2024-03-29 18:11:57,379][00497] InferenceWorker_p0-w0: stopping experience collection (28050 times) [2024-03-29 18:11:57,569][00476] Signal inference workers to resume experience collection... (28050 times) [2024-03-29 18:11:57,570][00497] InferenceWorker_p0-w0: resuming experience collection (28050 times) [2024-03-29 18:11:58,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.2, 300 sec: 41931.9). Total num frames: 907051008. Throughput: 0: 41952.0. Samples: 789244300. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 18:11:58,841][00126] Avg episode reward: [(0, '0.645')] [2024-03-29 18:12:01,547][00497] Updated weights for policy 0, policy_version 55368 (0.0024) [2024-03-29 18:12:03,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 907247616. Throughput: 0: 42375.5. Samples: 789393200. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 18:12:03,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 18:12:05,359][00497] Updated weights for policy 0, policy_version 55378 (0.0025) [2024-03-29 18:12:08,839][00126] Fps is (10 sec: 39322.4, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 907444224. Throughput: 0: 42379.3. Samples: 789659180. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 18:12:08,841][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 18:12:09,275][00497] Updated weights for policy 0, policy_version 55388 (0.0020) [2024-03-29 18:12:12,555][00497] Updated weights for policy 0, policy_version 55398 (0.0028) [2024-03-29 18:12:13,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 907689984. Throughput: 0: 42078.1. Samples: 789880660. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 18:12:13,840][00126] Avg episode reward: [(0, '0.500')] [2024-03-29 18:12:17,227][00497] Updated weights for policy 0, policy_version 55408 (0.0023) [2024-03-29 18:12:18,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 907870208. Throughput: 0: 42375.6. Samples: 790027720. Policy #0 lag: (min: 0.0, avg: 21.4, max: 43.0) [2024-03-29 18:12:18,840][00126] Avg episode reward: [(0, '0.625')] [2024-03-29 18:12:20,947][00497] Updated weights for policy 0, policy_version 55418 (0.0030) [2024-03-29 18:12:23,839][00126] Fps is (10 sec: 39322.2, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 908083200. Throughput: 0: 42122.2. Samples: 790284960. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 18:12:23,840][00126] Avg episode reward: [(0, '0.624')] [2024-03-29 18:12:24,954][00497] Updated weights for policy 0, policy_version 55428 (0.0020) [2024-03-29 18:12:28,160][00497] Updated weights for policy 0, policy_version 55438 (0.0020) [2024-03-29 18:12:28,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42871.4, 300 sec: 42043.0). Total num frames: 908328960. Throughput: 0: 42096.4. Samples: 790521900. 
Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 18:12:28,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 18:12:32,853][00497] Updated weights for policy 0, policy_version 55448 (0.0021) [2024-03-29 18:12:33,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 908509184. Throughput: 0: 41928.8. Samples: 790654880. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 18:12:33,840][00126] Avg episode reward: [(0, '0.659')] [2024-03-29 18:12:36,523][00497] Updated weights for policy 0, policy_version 55458 (0.0019) [2024-03-29 18:12:38,815][00476] Signal inference workers to stop experience collection... (28100 times) [2024-03-29 18:12:38,816][00476] Signal inference workers to resume experience collection... (28100 times) [2024-03-29 18:12:38,839][00126] Fps is (10 sec: 37682.9, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 908705792. Throughput: 0: 41716.3. Samples: 790900660. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 18:12:38,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:12:38,854][00497] InferenceWorker_p0-w0: stopping experience collection (28100 times) [2024-03-29 18:12:38,854][00497] InferenceWorker_p0-w0: resuming experience collection (28100 times) [2024-03-29 18:12:40,766][00497] Updated weights for policy 0, policy_version 55468 (0.0023) [2024-03-29 18:12:43,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 908935168. Throughput: 0: 42184.1. Samples: 791142580. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 18:12:43,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:12:43,902][00497] Updated weights for policy 0, policy_version 55478 (0.0027) [2024-03-29 18:12:48,405][00497] Updated weights for policy 0, policy_version 55488 (0.0025) [2024-03-29 18:12:48,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 909131776. Throughput: 0: 41761.8. Samples: 791272480. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 18:12:48,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 18:12:52,061][00497] Updated weights for policy 0, policy_version 55498 (0.0023) [2024-03-29 18:12:53,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 909328384. Throughput: 0: 41544.8. Samples: 791528700. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 18:12:53,840][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 18:12:56,311][00497] Updated weights for policy 0, policy_version 55508 (0.0028) [2024-03-29 18:12:58,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 909574144. Throughput: 0: 42316.1. Samples: 791784880. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 18:12:58,840][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 18:12:59,407][00497] Updated weights for policy 0, policy_version 55518 (0.0019) [2024-03-29 18:13:03,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 909754368. Throughput: 0: 41551.2. Samples: 791897520. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 18:13:03,840][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 18:13:03,892][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000055528_909770752.pth... 
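The Saving record just above, together with the Removing records that follow each save elsewhere in this log, shows the trainer's checkpoint rotation: a file named checkpoint_{policy_version:09d}_{total_env_frames}.pth is written every few minutes and an older checkpoint is deleted shortly afterwards, so only the most recent few stay on disk. The Python below is a minimal sketch of that keep-latest-N pattern, not the actual metta / Sample Factory implementation; the helper names (checkpoint_name, save_checkpoint), the keep_last value, and the example directory are assumptions introduced purely for illustration.

# Minimal sketch of a keep-latest-N checkpoint rotation (illustrative only; not the
# actual metta / Sample Factory code). File names follow the pattern seen in the log:
# checkpoint_{policy_version:09d}_{total_env_frames}.pth
import re
from pathlib import Path

CHECKPOINT_RE = re.compile(r"checkpoint_(\d{9})_(\d+)\.pth$")

def checkpoint_name(policy_version: int, env_frames: int) -> str:
    return f"checkpoint_{policy_version:09d}_{env_frames}.pth"

def save_checkpoint(ckpt_dir: Path, policy_version: int, env_frames: int,
                    payload: bytes = b"", keep_last: int = 2) -> Path:
    """Write a new checkpoint file, then delete older ones beyond `keep_last`."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / checkpoint_name(policy_version, env_frames)
    path.write_bytes(payload)  # a real trainer would torch.save(model/optimizer state) here

    # Prune: sort by the policy version embedded in each file name, keep the newest few.
    existing = sorted(
        (p for p in ckpt_dir.glob("checkpoint_*.pth") if CHECKPOINT_RE.search(p.name)),
        key=lambda p: int(CHECKPOINT_RE.search(p.name).group(1)),
    )
    for old in existing[:-keep_last]:
        print(f"Removing {old}")  # mirrors the "Removing .../checkpoint_*.pth" records
        old.unlink()
    return path

if __name__ == "__main__":
    run_dir = Path("train_dir/example_run/checkpoint_p0")  # hypothetical path
    for version, frames in [(55528, 909770752), (55838, 914849792), (56147, 919912448)]:
        save_checkpoint(run_dir, version, frames)
    print(sorted(p.name for p in run_dir.glob("*.pth")))  # only the newest two remain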
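Throughout this section the monitor process also logs "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" alongside the cumulative frame count and throughput. A rolling-window estimate of that kind can be reconstructed from (timestamp, total frames) pairs; the sketch below is one plausible way to do it and is not taken from the Sample Factory source — the FpsTracker class, its method names, and the reporting cadence in the example are assumptions for illustration.

# One plausible way to produce rolling FPS figures like
# "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" from (timestamp, total frames) pairs.
# Illustrative sketch only; window sizes and class name are assumptions.
import time
from collections import deque

class FpsTracker:
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (wall-clock time, cumulative env frames)

    def record(self, total_frames, now=None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, total_frames))
        # Drop samples that have fallen out of the largest window.
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self):
        now, frames = self.samples[-1]
        result = {}
        for w in self.windows:
            # Oldest retained sample that is still inside this window.
            t0, f0 = next((s for s in self.samples if now - s[0] <= w), (now, frames))
            dt = now - t0
            result[w] = (frames - f0) / dt if dt > 0 else 0.0
        return result

if __name__ == "__main__":
    tracker = FpsTracker()
    # Feed synthetic samples: ~42k frames/sec reported every 5 seconds.
    for step in range(1, 13):
        tracker.record(total_frames=step * 5 * 42000, now=step * 5.0)
    print(tracker.fps())  # roughly {10: 42000.0, 60: 42000.0, 300: 42000.0}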
[2024-03-29 18:13:03,917][00497] Updated weights for policy 0, policy_version 55528 (0.0022) [2024-03-29 18:13:04,197][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000054914_899710976.pth [2024-03-29 18:13:07,930][00497] Updated weights for policy 0, policy_version 55538 (0.0019) [2024-03-29 18:13:08,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.2, 300 sec: 42098.6). Total num frames: 909967360. Throughput: 0: 41630.2. Samples: 792158320. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 18:13:08,840][00126] Avg episode reward: [(0, '0.592')] [2024-03-29 18:13:12,087][00497] Updated weights for policy 0, policy_version 55548 (0.0024) [2024-03-29 18:13:13,660][00476] Signal inference workers to stop experience collection... (28150 times) [2024-03-29 18:13:13,695][00497] InferenceWorker_p0-w0: stopping experience collection (28150 times) [2024-03-29 18:13:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.3, 300 sec: 41932.0). Total num frames: 910180352. Throughput: 0: 41996.9. Samples: 792411760. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 18:13:13,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:13:13,886][00476] Signal inference workers to resume experience collection... (28150 times) [2024-03-29 18:13:13,887][00497] InferenceWorker_p0-w0: resuming experience collection (28150 times) [2024-03-29 18:13:15,186][00497] Updated weights for policy 0, policy_version 55558 (0.0026) [2024-03-29 18:13:18,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 910376960. Throughput: 0: 41337.5. Samples: 792515060. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 18:13:18,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 18:13:19,824][00497] Updated weights for policy 0, policy_version 55568 (0.0024) [2024-03-29 18:13:23,571][00497] Updated weights for policy 0, policy_version 55578 (0.0019) [2024-03-29 18:13:23,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 910589952. Throughput: 0: 41801.8. Samples: 792781740. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 18:13:23,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 18:13:27,859][00497] Updated weights for policy 0, policy_version 55588 (0.0022) [2024-03-29 18:13:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 910802944. Throughput: 0: 42396.1. Samples: 793050400. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 18:13:28,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 18:13:30,791][00497] Updated weights for policy 0, policy_version 55598 (0.0037) [2024-03-29 18:13:33,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 911015936. Throughput: 0: 41866.6. Samples: 793156480. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 18:13:33,841][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 18:13:35,259][00497] Updated weights for policy 0, policy_version 55608 (0.0027) [2024-03-29 18:13:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 911228928. Throughput: 0: 42045.0. Samples: 793420720. 
Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 18:13:38,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 18:13:38,941][00497] Updated weights for policy 0, policy_version 55618 (0.0023) [2024-03-29 18:13:43,167][00497] Updated weights for policy 0, policy_version 55628 (0.0027) [2024-03-29 18:13:43,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 911441920. Throughput: 0: 42313.4. Samples: 793688980. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 18:13:43,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 18:13:46,172][00497] Updated weights for policy 0, policy_version 55638 (0.0034) [2024-03-29 18:13:48,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 911654912. Throughput: 0: 42038.2. Samples: 793789240. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 18:13:48,840][00126] Avg episode reward: [(0, '0.565')] [2024-03-29 18:13:49,089][00476] Signal inference workers to stop experience collection... (28200 times) [2024-03-29 18:13:49,128][00497] InferenceWorker_p0-w0: stopping experience collection (28200 times) [2024-03-29 18:13:49,176][00476] Signal inference workers to resume experience collection... (28200 times) [2024-03-29 18:13:49,177][00497] InferenceWorker_p0-w0: resuming experience collection (28200 times) [2024-03-29 18:13:50,553][00497] Updated weights for policy 0, policy_version 55648 (0.0021) [2024-03-29 18:13:53,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 911867904. Throughput: 0: 42276.5. Samples: 794060760. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 18:13:53,840][00126] Avg episode reward: [(0, '0.600')] [2024-03-29 18:13:54,311][00497] Updated weights for policy 0, policy_version 55658 (0.0019) [2024-03-29 18:13:58,428][00497] Updated weights for policy 0, policy_version 55668 (0.0028) [2024-03-29 18:13:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 912064512. Throughput: 0: 42477.7. Samples: 794323260. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 18:13:58,840][00126] Avg episode reward: [(0, '0.620')] [2024-03-29 18:14:01,469][00497] Updated weights for policy 0, policy_version 55678 (0.0036) [2024-03-29 18:14:03,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.3, 300 sec: 42154.1). Total num frames: 912310272. Throughput: 0: 42631.9. Samples: 794433500. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 18:14:03,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 18:14:05,991][00497] Updated weights for policy 0, policy_version 55688 (0.0022) [2024-03-29 18:14:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 912506880. Throughput: 0: 42628.5. Samples: 794700020. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:14:08,840][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 18:14:09,937][00497] Updated weights for policy 0, policy_version 55698 (0.0020) [2024-03-29 18:14:13,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 912703488. Throughput: 0: 42320.8. Samples: 794954840. 
Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:14:13,840][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 18:14:14,222][00497] Updated weights for policy 0, policy_version 55708 (0.0028) [2024-03-29 18:14:17,199][00497] Updated weights for policy 0, policy_version 55718 (0.0023) [2024-03-29 18:14:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 912932864. Throughput: 0: 42410.2. Samples: 795064940. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:14:18,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 18:14:21,666][00497] Updated weights for policy 0, policy_version 55728 (0.0027) [2024-03-29 18:14:23,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42598.5, 300 sec: 42209.7). Total num frames: 913145856. Throughput: 0: 42442.7. Samples: 795330640. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:14:23,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 18:14:25,572][00497] Updated weights for policy 0, policy_version 55738 (0.0033) [2024-03-29 18:14:25,582][00476] Signal inference workers to stop experience collection... (28250 times) [2024-03-29 18:14:25,583][00476] Signal inference workers to resume experience collection... (28250 times) [2024-03-29 18:14:25,622][00497] InferenceWorker_p0-w0: stopping experience collection (28250 times) [2024-03-29 18:14:25,622][00497] InferenceWorker_p0-w0: resuming experience collection (28250 times) [2024-03-29 18:14:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 913342464. Throughput: 0: 42160.0. Samples: 795586180. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:14:28,840][00126] Avg episode reward: [(0, '0.598')] [2024-03-29 18:14:29,774][00497] Updated weights for policy 0, policy_version 55748 (0.0024) [2024-03-29 18:14:32,606][00497] Updated weights for policy 0, policy_version 55758 (0.0025) [2024-03-29 18:14:33,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 913571840. Throughput: 0: 42707.1. Samples: 795711060. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:14:33,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 18:14:36,924][00497] Updated weights for policy 0, policy_version 55768 (0.0019) [2024-03-29 18:14:38,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42598.3, 300 sec: 42265.4). Total num frames: 913784832. Throughput: 0: 42325.3. Samples: 795965400. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:14:38,840][00126] Avg episode reward: [(0, '0.575')] [2024-03-29 18:14:40,898][00497] Updated weights for policy 0, policy_version 55778 (0.0018) [2024-03-29 18:14:43,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 913981440. Throughput: 0: 42275.2. Samples: 796225640. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 18:14:43,840][00126] Avg episode reward: [(0, '0.615')] [2024-03-29 18:14:45,319][00497] Updated weights for policy 0, policy_version 55788 (0.0023) [2024-03-29 18:14:48,247][00497] Updated weights for policy 0, policy_version 55798 (0.0029) [2024-03-29 18:14:48,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 914210816. Throughput: 0: 42641.4. Samples: 796352360. 
Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 18:14:48,840][00126] Avg episode reward: [(0, '0.655')] [2024-03-29 18:14:52,647][00497] Updated weights for policy 0, policy_version 55808 (0.0018) [2024-03-29 18:14:53,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 914391040. Throughput: 0: 41986.2. Samples: 796589400. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 18:14:53,840][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 18:14:56,527][00497] Updated weights for policy 0, policy_version 55818 (0.0018) [2024-03-29 18:14:58,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 914604032. Throughput: 0: 42131.1. Samples: 796850740. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 18:14:58,840][00126] Avg episode reward: [(0, '0.680')] [2024-03-29 18:15:00,953][00497] Updated weights for policy 0, policy_version 55828 (0.0023) [2024-03-29 18:15:02,492][00476] Signal inference workers to stop experience collection... (28300 times) [2024-03-29 18:15:02,524][00497] InferenceWorker_p0-w0: stopping experience collection (28300 times) [2024-03-29 18:15:02,712][00476] Signal inference workers to resume experience collection... (28300 times) [2024-03-29 18:15:02,712][00497] InferenceWorker_p0-w0: resuming experience collection (28300 times) [2024-03-29 18:15:03,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 914833408. Throughput: 0: 42566.7. Samples: 796980440. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 18:15:03,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 18:15:03,863][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000055838_914849792.pth... [2024-03-29 18:15:03,876][00497] Updated weights for policy 0, policy_version 55838 (0.0023) [2024-03-29 18:15:04,200][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000055220_904724480.pth [2024-03-29 18:15:08,450][00497] Updated weights for policy 0, policy_version 55848 (0.0023) [2024-03-29 18:15:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 915030016. Throughput: 0: 41911.4. Samples: 797216660. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 18:15:08,841][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 18:15:12,274][00497] Updated weights for policy 0, policy_version 55858 (0.0022) [2024-03-29 18:15:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 915243008. Throughput: 0: 41973.7. Samples: 797475000. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 18:15:13,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 18:15:16,640][00497] Updated weights for policy 0, policy_version 55868 (0.0019) [2024-03-29 18:15:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 915456000. Throughput: 0: 41863.1. Samples: 797594900. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 18:15:18,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 18:15:19,737][00497] Updated weights for policy 0, policy_version 55878 (0.0023) [2024-03-29 18:15:23,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41506.0, 300 sec: 42098.5). Total num frames: 915636224. Throughput: 0: 41721.7. Samples: 797842880. 
Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 18:15:23,840][00126] Avg episode reward: [(0, '0.599')] [2024-03-29 18:15:24,301][00497] Updated weights for policy 0, policy_version 55888 (0.0022) [2024-03-29 18:15:28,028][00497] Updated weights for policy 0, policy_version 55898 (0.0020) [2024-03-29 18:15:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 915865600. Throughput: 0: 41561.7. Samples: 798095920. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 18:15:28,840][00126] Avg episode reward: [(0, '0.601')] [2024-03-29 18:15:31,971][00497] Updated weights for policy 0, policy_version 55908 (0.0033) [2024-03-29 18:15:33,839][00126] Fps is (10 sec: 44237.3, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 916078592. Throughput: 0: 41661.8. Samples: 798227140. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 18:15:33,840][00126] Avg episode reward: [(0, '0.592')] [2024-03-29 18:15:35,303][00497] Updated weights for policy 0, policy_version 55918 (0.0022) [2024-03-29 18:15:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 42098.6). Total num frames: 916275200. Throughput: 0: 41725.4. Samples: 798467040. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 18:15:38,840][00126] Avg episode reward: [(0, '0.602')] [2024-03-29 18:15:39,842][00497] Updated weights for policy 0, policy_version 55928 (0.0031) [2024-03-29 18:15:40,042][00476] Signal inference workers to stop experience collection... (28350 times) [2024-03-29 18:15:40,067][00497] InferenceWorker_p0-w0: stopping experience collection (28350 times) [2024-03-29 18:15:40,220][00476] Signal inference workers to resume experience collection... (28350 times) [2024-03-29 18:15:40,220][00497] InferenceWorker_p0-w0: resuming experience collection (28350 times) [2024-03-29 18:15:43,630][00497] Updated weights for policy 0, policy_version 55938 (0.0025) [2024-03-29 18:15:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.1, 300 sec: 42098.6). Total num frames: 916488192. Throughput: 0: 41670.2. Samples: 798725900. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 18:15:43,840][00126] Avg episode reward: [(0, '0.541')] [2024-03-29 18:15:47,604][00497] Updated weights for policy 0, policy_version 55948 (0.0018) [2024-03-29 18:15:48,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 916701184. Throughput: 0: 41728.6. Samples: 798858220. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 18:15:48,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 18:15:50,882][00497] Updated weights for policy 0, policy_version 55958 (0.0020) [2024-03-29 18:15:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 916914176. Throughput: 0: 41831.2. Samples: 799099060. Policy #0 lag: (min: 0.0, avg: 23.3, max: 43.0) [2024-03-29 18:15:53,840][00126] Avg episode reward: [(0, '0.471')] [2024-03-29 18:15:55,265][00497] Updated weights for policy 0, policy_version 55968 (0.0023) [2024-03-29 18:15:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 917127168. Throughput: 0: 41958.3. Samples: 799363120. 
Policy #0 lag: (min: 0.0, avg: 23.3, max: 43.0) [2024-03-29 18:15:58,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 18:15:58,868][00497] Updated weights for policy 0, policy_version 55978 (0.0022) [2024-03-29 18:16:02,883][00497] Updated weights for policy 0, policy_version 55988 (0.0027) [2024-03-29 18:16:03,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 917340160. Throughput: 0: 42347.5. Samples: 799500540. Policy #0 lag: (min: 0.0, avg: 23.3, max: 43.0) [2024-03-29 18:16:03,840][00126] Avg episode reward: [(0, '0.613')] [2024-03-29 18:16:06,371][00497] Updated weights for policy 0, policy_version 55998 (0.0033) [2024-03-29 18:16:08,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 917553152. Throughput: 0: 42116.5. Samples: 799738120. Policy #0 lag: (min: 0.0, avg: 23.3, max: 43.0) [2024-03-29 18:16:08,840][00126] Avg episode reward: [(0, '0.631')] [2024-03-29 18:16:10,640][00497] Updated weights for policy 0, policy_version 56008 (0.0022) [2024-03-29 18:16:13,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 917766144. Throughput: 0: 42418.6. Samples: 800004760. Policy #0 lag: (min: 0.0, avg: 23.3, max: 43.0) [2024-03-29 18:16:13,840][00126] Avg episode reward: [(0, '0.657')] [2024-03-29 18:16:14,226][00497] Updated weights for policy 0, policy_version 56018 (0.0031) [2024-03-29 18:16:14,549][00476] Signal inference workers to stop experience collection... (28400 times) [2024-03-29 18:16:14,580][00497] InferenceWorker_p0-w0: stopping experience collection (28400 times) [2024-03-29 18:16:14,750][00476] Signal inference workers to resume experience collection... (28400 times) [2024-03-29 18:16:14,750][00497] InferenceWorker_p0-w0: resuming experience collection (28400 times) [2024-03-29 18:16:18,261][00497] Updated weights for policy 0, policy_version 56028 (0.0024) [2024-03-29 18:16:18,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 917962752. Throughput: 0: 42421.8. Samples: 800136120. Policy #0 lag: (min: 0.0, avg: 23.3, max: 43.0) [2024-03-29 18:16:18,840][00126] Avg episode reward: [(0, '0.555')] [2024-03-29 18:16:21,642][00497] Updated weights for policy 0, policy_version 56038 (0.0034) [2024-03-29 18:16:23,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 918192128. Throughput: 0: 42366.2. Samples: 800373520. Policy #0 lag: (min: 0.0, avg: 23.3, max: 43.0) [2024-03-29 18:16:23,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 18:16:26,124][00497] Updated weights for policy 0, policy_version 56048 (0.0020) [2024-03-29 18:16:28,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 918405120. Throughput: 0: 42637.3. Samples: 800644580. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:16:28,840][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 18:16:29,676][00497] Updated weights for policy 0, policy_version 56058 (0.0019) [2024-03-29 18:16:33,507][00497] Updated weights for policy 0, policy_version 56068 (0.0026) [2024-03-29 18:16:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 918618112. Throughput: 0: 42620.3. Samples: 800776140. 
Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:16:33,840][00126] Avg episode reward: [(0, '0.495')] [2024-03-29 18:16:36,807][00497] Updated weights for policy 0, policy_version 56078 (0.0025) [2024-03-29 18:16:38,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42871.4, 300 sec: 42154.1). Total num frames: 918847488. Throughput: 0: 42588.9. Samples: 801015560. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:16:38,840][00126] Avg episode reward: [(0, '0.503')] [2024-03-29 18:16:41,477][00497] Updated weights for policy 0, policy_version 56088 (0.0018) [2024-03-29 18:16:43,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 919027712. Throughput: 0: 42540.4. Samples: 801277440. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:16:43,840][00126] Avg episode reward: [(0, '0.634')] [2024-03-29 18:16:45,202][00497] Updated weights for policy 0, policy_version 56098 (0.0024) [2024-03-29 18:16:48,095][00476] Signal inference workers to stop experience collection... (28450 times) [2024-03-29 18:16:48,139][00497] InferenceWorker_p0-w0: stopping experience collection (28450 times) [2024-03-29 18:16:48,177][00476] Signal inference workers to resume experience collection... (28450 times) [2024-03-29 18:16:48,179][00497] InferenceWorker_p0-w0: resuming experience collection (28450 times) [2024-03-29 18:16:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 919257088. Throughput: 0: 42174.4. Samples: 801398380. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:16:48,840][00126] Avg episode reward: [(0, '0.633')] [2024-03-29 18:16:49,050][00497] Updated weights for policy 0, policy_version 56108 (0.0027) [2024-03-29 18:16:52,550][00497] Updated weights for policy 0, policy_version 56118 (0.0019) [2024-03-29 18:16:53,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 919470080. Throughput: 0: 42557.8. Samples: 801653220. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:16:53,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 18:16:57,183][00497] Updated weights for policy 0, policy_version 56128 (0.0022) [2024-03-29 18:16:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 919666688. Throughput: 0: 42445.0. Samples: 801914780. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:16:58,840][00126] Avg episode reward: [(0, '0.606')] [2024-03-29 18:17:00,795][00497] Updated weights for policy 0, policy_version 56138 (0.0023) [2024-03-29 18:17:03,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 919896064. Throughput: 0: 42320.3. Samples: 802040540. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:17:03,842][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 18:17:04,002][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000056147_919912448.pth... [2024-03-29 18:17:04,313][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000055528_909770752.pth [2024-03-29 18:17:04,652][00497] Updated weights for policy 0, policy_version 56148 (0.0026) [2024-03-29 18:17:07,976][00497] Updated weights for policy 0, policy_version 56158 (0.0030) [2024-03-29 18:17:08,840][00126] Fps is (10 sec: 45872.7, 60 sec: 42871.2, 300 sec: 42154.0). Total num frames: 920125440. Throughput: 0: 42704.5. Samples: 802295240. 
Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:17:08,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 18:17:12,611][00497] Updated weights for policy 0, policy_version 56168 (0.0021) [2024-03-29 18:17:13,839][00126] Fps is (10 sec: 40960.8, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 920305664. Throughput: 0: 42406.3. Samples: 802552860. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:17:13,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 18:17:16,186][00497] Updated weights for policy 0, policy_version 56178 (0.0022) [2024-03-29 18:17:18,839][00126] Fps is (10 sec: 40962.0, 60 sec: 42871.5, 300 sec: 42209.6). Total num frames: 920535040. Throughput: 0: 42366.3. Samples: 802682620. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:17:18,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 18:17:19,715][00497] Updated weights for policy 0, policy_version 56188 (0.0018) [2024-03-29 18:17:23,280][00497] Updated weights for policy 0, policy_version 56198 (0.0021) [2024-03-29 18:17:23,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42871.5, 300 sec: 42154.1). Total num frames: 920764416. Throughput: 0: 42786.3. Samples: 802940940. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:17:23,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 18:17:27,812][00497] Updated weights for policy 0, policy_version 56208 (0.0030) [2024-03-29 18:17:27,814][00476] Signal inference workers to stop experience collection... (28500 times) [2024-03-29 18:17:27,815][00476] Signal inference workers to resume experience collection... (28500 times) [2024-03-29 18:17:27,858][00497] InferenceWorker_p0-w0: stopping experience collection (28500 times) [2024-03-29 18:17:27,859][00497] InferenceWorker_p0-w0: resuming experience collection (28500 times) [2024-03-29 18:17:28,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 920961024. Throughput: 0: 42687.1. Samples: 803198360. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:17:28,840][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 18:17:31,517][00497] Updated weights for policy 0, policy_version 56218 (0.0019) [2024-03-29 18:17:33,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 921174016. Throughput: 0: 42675.0. Samples: 803318760. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:17:33,841][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 18:17:35,385][00497] Updated weights for policy 0, policy_version 56228 (0.0026) [2024-03-29 18:17:38,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 921387008. Throughput: 0: 42605.0. Samples: 803570440. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 18:17:38,840][00126] Avg episode reward: [(0, '0.596')] [2024-03-29 18:17:39,023][00497] Updated weights for policy 0, policy_version 56238 (0.0023) [2024-03-29 18:17:43,632][00497] Updated weights for policy 0, policy_version 56248 (0.0020) [2024-03-29 18:17:43,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 921567232. Throughput: 0: 42266.1. Samples: 803816760. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 18:17:43,840][00126] Avg episode reward: [(0, '0.610')] [2024-03-29 18:17:47,297][00497] Updated weights for policy 0, policy_version 56258 (0.0022) [2024-03-29 18:17:48,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 42265.2). 
Total num frames: 921796608. Throughput: 0: 42372.2. Samples: 803947280. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 18:17:48,840][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 18:17:50,994][00497] Updated weights for policy 0, policy_version 56268 (0.0022) [2024-03-29 18:17:53,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 921993216. Throughput: 0: 42037.3. Samples: 804186900. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 18:17:53,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 18:17:54,863][00497] Updated weights for policy 0, policy_version 56278 (0.0025) [2024-03-29 18:17:58,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 922189824. Throughput: 0: 41855.1. Samples: 804436340. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 18:17:58,840][00126] Avg episode reward: [(0, '0.492')] [2024-03-29 18:17:59,218][00497] Updated weights for policy 0, policy_version 56288 (0.0023) [2024-03-29 18:18:02,918][00497] Updated weights for policy 0, policy_version 56298 (0.0022) [2024-03-29 18:18:03,441][00476] Signal inference workers to stop experience collection... (28550 times) [2024-03-29 18:18:03,473][00497] InferenceWorker_p0-w0: stopping experience collection (28550 times) [2024-03-29 18:18:03,636][00476] Signal inference workers to resume experience collection... (28550 times) [2024-03-29 18:18:03,637][00497] InferenceWorker_p0-w0: resuming experience collection (28550 times) [2024-03-29 18:18:03,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 922419200. Throughput: 0: 41836.7. Samples: 804565280. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 18:18:03,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 18:18:06,568][00497] Updated weights for policy 0, policy_version 56308 (0.0037) [2024-03-29 18:18:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41506.4, 300 sec: 42154.1). Total num frames: 922615808. Throughput: 0: 41605.7. Samples: 804813200. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 18:18:08,842][00126] Avg episode reward: [(0, '0.618')] [2024-03-29 18:18:10,595][00497] Updated weights for policy 0, policy_version 56318 (0.0022) [2024-03-29 18:18:13,839][00126] Fps is (10 sec: 40960.8, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 922828800. Throughput: 0: 41505.0. Samples: 805066080. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 18:18:13,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 18:18:14,906][00497] Updated weights for policy 0, policy_version 56328 (0.0026) [2024-03-29 18:18:18,420][00497] Updated weights for policy 0, policy_version 56338 (0.0027) [2024-03-29 18:18:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 923041792. Throughput: 0: 41721.0. Samples: 805196200. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 18:18:18,840][00126] Avg episode reward: [(0, '0.636')] [2024-03-29 18:18:22,049][00497] Updated weights for policy 0, policy_version 56348 (0.0031) [2024-03-29 18:18:23,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 923254784. Throughput: 0: 41832.8. Samples: 805452920. 
Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 18:18:23,840][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 18:18:25,905][00497] Updated weights for policy 0, policy_version 56358 (0.0020) [2024-03-29 18:18:28,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 923484160. Throughput: 0: 42023.2. Samples: 805707800. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 18:18:28,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 18:18:30,312][00497] Updated weights for policy 0, policy_version 56368 (0.0030) [2024-03-29 18:18:33,838][00497] Updated weights for policy 0, policy_version 56378 (0.0024) [2024-03-29 18:18:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 42265.1). Total num frames: 923697152. Throughput: 0: 41867.0. Samples: 805831300. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 18:18:33,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 18:18:37,174][00476] Signal inference workers to stop experience collection... (28600 times) [2024-03-29 18:18:37,209][00497] InferenceWorker_p0-w0: stopping experience collection (28600 times) [2024-03-29 18:18:37,398][00476] Signal inference workers to resume experience collection... (28600 times) [2024-03-29 18:18:37,399][00497] InferenceWorker_p0-w0: resuming experience collection (28600 times) [2024-03-29 18:18:37,657][00497] Updated weights for policy 0, policy_version 56388 (0.0020) [2024-03-29 18:18:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 923893760. Throughput: 0: 42438.7. Samples: 806096640. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 18:18:38,840][00126] Avg episode reward: [(0, '0.609')] [2024-03-29 18:18:41,697][00497] Updated weights for policy 0, policy_version 56398 (0.0033) [2024-03-29 18:18:43,839][00126] Fps is (10 sec: 40960.8, 60 sec: 42325.5, 300 sec: 42209.6). Total num frames: 924106752. Throughput: 0: 42132.5. Samples: 806332300. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 18:18:43,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 18:18:45,873][00497] Updated weights for policy 0, policy_version 56408 (0.0025) [2024-03-29 18:18:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 924303360. Throughput: 0: 42163.7. Samples: 806462640. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 18:18:48,840][00126] Avg episode reward: [(0, '0.603')] [2024-03-29 18:18:49,437][00497] Updated weights for policy 0, policy_version 56418 (0.0027) [2024-03-29 18:18:53,248][00497] Updated weights for policy 0, policy_version 56428 (0.0024) [2024-03-29 18:18:53,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 924549120. Throughput: 0: 42424.1. Samples: 806722280. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 18:18:53,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:18:57,111][00497] Updated weights for policy 0, policy_version 56438 (0.0030) [2024-03-29 18:18:58,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 924729344. Throughput: 0: 42049.7. Samples: 806958320. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 18:18:58,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 18:19:01,451][00497] Updated weights for policy 0, policy_version 56448 (0.0023) [2024-03-29 18:19:03,839][00126] Fps is (10 sec: 39320.7, 60 sec: 42052.2, 300 sec: 42154.1). 
Total num frames: 924942336. Throughput: 0: 42220.7. Samples: 807096140. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 18:19:03,840][00126] Avg episode reward: [(0, '0.666')] [2024-03-29 18:19:04,049][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000056455_924958720.pth... [2024-03-29 18:19:04,389][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000055838_914849792.pth [2024-03-29 18:19:05,223][00497] Updated weights for policy 0, policy_version 56458 (0.0027) [2024-03-29 18:19:07,818][00476] Signal inference workers to stop experience collection... (28650 times) [2024-03-29 18:19:07,892][00497] InferenceWorker_p0-w0: stopping experience collection (28650 times) [2024-03-29 18:19:07,894][00476] Signal inference workers to resume experience collection... (28650 times) [2024-03-29 18:19:07,916][00497] InferenceWorker_p0-w0: resuming experience collection (28650 times) [2024-03-29 18:19:08,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 925155328. Throughput: 0: 42067.7. Samples: 807345960. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 18:19:08,840][00126] Avg episode reward: [(0, '0.515')] [2024-03-29 18:19:08,926][00497] Updated weights for policy 0, policy_version 56468 (0.0025) [2024-03-29 18:19:12,884][00497] Updated weights for policy 0, policy_version 56478 (0.0021) [2024-03-29 18:19:13,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.2, 300 sec: 42154.1). Total num frames: 925368320. Throughput: 0: 41390.6. Samples: 807570380. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 18:19:13,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 18:19:17,639][00497] Updated weights for policy 0, policy_version 56488 (0.0032) [2024-03-29 18:19:18,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 925548544. Throughput: 0: 41609.8. Samples: 807703740. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 18:19:18,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 18:19:21,389][00497] Updated weights for policy 0, policy_version 56498 (0.0032) [2024-03-29 18:19:23,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 925761536. Throughput: 0: 41306.6. Samples: 807955440. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 18:19:23,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 18:19:24,998][00497] Updated weights for policy 0, policy_version 56508 (0.0020) [2024-03-29 18:19:28,834][00497] Updated weights for policy 0, policy_version 56518 (0.0024) [2024-03-29 18:19:28,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 925990912. Throughput: 0: 41617.7. Samples: 808205100. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 18:19:28,840][00126] Avg episode reward: [(0, '0.619')] [2024-03-29 18:19:33,359][00497] Updated weights for policy 0, policy_version 56528 (0.0025) [2024-03-29 18:19:33,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 926171136. Throughput: 0: 41430.2. Samples: 808327000. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 18:19:33,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 18:19:37,080][00497] Updated weights for policy 0, policy_version 56538 (0.0025) [2024-03-29 18:19:38,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 926384128. 
Throughput: 0: 41282.5. Samples: 808580000. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 18:19:38,840][00126] Avg episode reward: [(0, '0.592')] [2024-03-29 18:19:40,813][00497] Updated weights for policy 0, policy_version 56548 (0.0021) [2024-03-29 18:19:40,836][00476] Signal inference workers to stop experience collection... (28700 times) [2024-03-29 18:19:40,870][00497] InferenceWorker_p0-w0: stopping experience collection (28700 times) [2024-03-29 18:19:41,059][00476] Signal inference workers to resume experience collection... (28700 times) [2024-03-29 18:19:41,060][00497] InferenceWorker_p0-w0: resuming experience collection (28700 times) [2024-03-29 18:19:43,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.0, 300 sec: 41987.5). Total num frames: 926597120. Throughput: 0: 41629.2. Samples: 808831640. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 18:19:43,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 18:19:44,578][00497] Updated weights for policy 0, policy_version 56558 (0.0027) [2024-03-29 18:19:48,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 926793728. Throughput: 0: 41165.6. Samples: 808948580. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 18:19:48,841][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 18:19:49,090][00497] Updated weights for policy 0, policy_version 56568 (0.0020) [2024-03-29 18:19:52,840][00497] Updated weights for policy 0, policy_version 56578 (0.0021) [2024-03-29 18:19:53,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41233.0, 300 sec: 42098.6). Total num frames: 927023104. Throughput: 0: 41397.7. Samples: 809208860. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 18:19:53,840][00126] Avg episode reward: [(0, '0.648')] [2024-03-29 18:19:56,307][00497] Updated weights for policy 0, policy_version 56588 (0.0031) [2024-03-29 18:19:58,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 927219712. Throughput: 0: 41844.1. Samples: 809453360. Policy #0 lag: (min: 2.0, avg: 21.7, max: 41.0) [2024-03-29 18:19:58,840][00126] Avg episode reward: [(0, '0.641')] [2024-03-29 18:20:00,223][00497] Updated weights for policy 0, policy_version 56598 (0.0019) [2024-03-29 18:20:03,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 927432704. Throughput: 0: 41491.0. Samples: 809570840. Policy #0 lag: (min: 2.0, avg: 21.7, max: 41.0) [2024-03-29 18:20:03,840][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 18:20:04,763][00497] Updated weights for policy 0, policy_version 56608 (0.0026) [2024-03-29 18:20:08,409][00497] Updated weights for policy 0, policy_version 56618 (0.0028) [2024-03-29 18:20:08,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 927645696. Throughput: 0: 41944.9. Samples: 809842960. Policy #0 lag: (min: 2.0, avg: 21.7, max: 41.0) [2024-03-29 18:20:08,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 18:20:11,481][00476] Signal inference workers to stop experience collection... (28750 times) [2024-03-29 18:20:11,504][00497] InferenceWorker_p0-w0: stopping experience collection (28750 times) [2024-03-29 18:20:11,680][00476] Signal inference workers to resume experience collection... 
(28750 times) [2024-03-29 18:20:11,681][00497] InferenceWorker_p0-w0: resuming experience collection (28750 times) [2024-03-29 18:20:11,685][00497] Updated weights for policy 0, policy_version 56628 (0.0027) [2024-03-29 18:20:13,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 927858688. Throughput: 0: 41877.3. Samples: 810089580. Policy #0 lag: (min: 2.0, avg: 21.7, max: 41.0) [2024-03-29 18:20:13,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 18:20:15,709][00497] Updated weights for policy 0, policy_version 56638 (0.0034) [2024-03-29 18:20:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 928071680. Throughput: 0: 41791.0. Samples: 810207600. Policy #0 lag: (min: 2.0, avg: 21.7, max: 41.0) [2024-03-29 18:20:18,842][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 18:20:20,195][00497] Updated weights for policy 0, policy_version 56648 (0.0019) [2024-03-29 18:20:23,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 928268288. Throughput: 0: 42205.3. Samples: 810479240. Policy #0 lag: (min: 2.0, avg: 21.7, max: 41.0) [2024-03-29 18:20:23,840][00126] Avg episode reward: [(0, '0.613')] [2024-03-29 18:20:23,979][00497] Updated weights for policy 0, policy_version 56658 (0.0025) [2024-03-29 18:20:27,335][00497] Updated weights for policy 0, policy_version 56668 (0.0020) [2024-03-29 18:20:28,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 928481280. Throughput: 0: 41822.4. Samples: 810713640. Policy #0 lag: (min: 2.0, avg: 21.7, max: 41.0) [2024-03-29 18:20:28,840][00126] Avg episode reward: [(0, '0.479')] [2024-03-29 18:20:31,311][00497] Updated weights for policy 0, policy_version 56678 (0.0018) [2024-03-29 18:20:33,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 928694272. Throughput: 0: 42168.9. Samples: 810846180. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 18:20:33,840][00126] Avg episode reward: [(0, '0.600')] [2024-03-29 18:20:35,503][00497] Updated weights for policy 0, policy_version 56688 (0.0020) [2024-03-29 18:20:38,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 928890880. Throughput: 0: 42157.8. Samples: 811105960. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 18:20:38,840][00126] Avg episode reward: [(0, '0.484')] [2024-03-29 18:20:39,453][00497] Updated weights for policy 0, policy_version 56698 (0.0033) [2024-03-29 18:20:42,861][00497] Updated weights for policy 0, policy_version 56708 (0.0024) [2024-03-29 18:20:43,839][00126] Fps is (10 sec: 44235.8, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 929136640. Throughput: 0: 42180.3. Samples: 811351480. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 18:20:43,840][00126] Avg episode reward: [(0, '0.580')] [2024-03-29 18:20:45,850][00476] Signal inference workers to stop experience collection... (28800 times) [2024-03-29 18:20:45,914][00497] InferenceWorker_p0-w0: stopping experience collection (28800 times) [2024-03-29 18:20:45,922][00476] Signal inference workers to resume experience collection... (28800 times) [2024-03-29 18:20:45,941][00497] InferenceWorker_p0-w0: resuming experience collection (28800 times) [2024-03-29 18:20:46,753][00497] Updated weights for policy 0, policy_version 56718 (0.0031) [2024-03-29 18:20:48,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 42098.5). 
Total num frames: 929333248. Throughput: 0: 42584.5. Samples: 811487140. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 18:20:48,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 18:20:50,895][00497] Updated weights for policy 0, policy_version 56728 (0.0019) [2024-03-29 18:20:53,839][00126] Fps is (10 sec: 39322.3, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 929529856. Throughput: 0: 42092.0. Samples: 811737100. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 18:20:53,841][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 18:20:54,993][00497] Updated weights for policy 0, policy_version 56738 (0.0021) [2024-03-29 18:20:58,485][00497] Updated weights for policy 0, policy_version 56748 (0.0021) [2024-03-29 18:20:58,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 929775616. Throughput: 0: 42265.0. Samples: 811991500. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 18:20:58,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 18:21:02,489][00497] Updated weights for policy 0, policy_version 56758 (0.0025) [2024-03-29 18:21:03,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 929955840. Throughput: 0: 42412.0. Samples: 812116140. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 18:21:03,840][00126] Avg episode reward: [(0, '0.568')] [2024-03-29 18:21:04,172][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000056762_929988608.pth... [2024-03-29 18:21:04,485][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000056147_919912448.pth [2024-03-29 18:21:06,717][00497] Updated weights for policy 0, policy_version 56768 (0.0026) [2024-03-29 18:21:08,839][00126] Fps is (10 sec: 39321.0, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 930168832. Throughput: 0: 41903.5. Samples: 812364900. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 18:21:08,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 18:21:10,931][00497] Updated weights for policy 0, policy_version 56778 (0.0027) [2024-03-29 18:21:13,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 930381824. Throughput: 0: 42122.6. Samples: 812609160. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 18:21:13,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 18:21:14,198][00497] Updated weights for policy 0, policy_version 56788 (0.0035) [2024-03-29 18:21:18,285][00476] Signal inference workers to stop experience collection... (28850 times) [2024-03-29 18:21:18,327][00497] InferenceWorker_p0-w0: stopping experience collection (28850 times) [2024-03-29 18:21:18,362][00476] Signal inference workers to resume experience collection... (28850 times) [2024-03-29 18:21:18,365][00497] InferenceWorker_p0-w0: resuming experience collection (28850 times) [2024-03-29 18:21:18,368][00497] Updated weights for policy 0, policy_version 56798 (0.0028) [2024-03-29 18:21:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 930594816. Throughput: 0: 41985.6. Samples: 812735540. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 18:21:18,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 18:21:22,478][00497] Updated weights for policy 0, policy_version 56808 (0.0022) [2024-03-29 18:21:23,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 930791424. 
Throughput: 0: 41716.0. Samples: 812983180. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 18:21:23,840][00126] Avg episode reward: [(0, '0.645')] [2024-03-29 18:21:26,650][00497] Updated weights for policy 0, policy_version 56818 (0.0023) [2024-03-29 18:21:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 931004416. Throughput: 0: 41825.9. Samples: 813233640. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 18:21:28,840][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 18:21:29,895][00497] Updated weights for policy 0, policy_version 56828 (0.0024) [2024-03-29 18:21:33,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 931217408. Throughput: 0: 41560.5. Samples: 813357360. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 18:21:33,840][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 18:21:34,041][00497] Updated weights for policy 0, policy_version 56838 (0.0024) [2024-03-29 18:21:38,253][00497] Updated weights for policy 0, policy_version 56848 (0.0022) [2024-03-29 18:21:38,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 931414016. Throughput: 0: 41408.9. Samples: 813600500. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 18:21:38,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 18:21:42,281][00497] Updated weights for policy 0, policy_version 56858 (0.0027) [2024-03-29 18:21:43,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 931627008. Throughput: 0: 41520.8. Samples: 813859940. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 18:21:43,840][00126] Avg episode reward: [(0, '0.505')] [2024-03-29 18:21:45,724][00497] Updated weights for policy 0, policy_version 56868 (0.0022) [2024-03-29 18:21:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 931823616. Throughput: 0: 41567.3. Samples: 813986660. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 18:21:48,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 18:21:49,668][00497] Updated weights for policy 0, policy_version 56878 (0.0020) [2024-03-29 18:21:53,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 932036608. Throughput: 0: 41449.4. Samples: 814230120. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 18:21:53,840][00126] Avg episode reward: [(0, '0.526')] [2024-03-29 18:21:54,145][00497] Updated weights for policy 0, policy_version 56888 (0.0020) [2024-03-29 18:21:58,002][00476] Signal inference workers to stop experience collection... (28900 times) [2024-03-29 18:21:58,003][00476] Signal inference workers to resume experience collection... (28900 times) [2024-03-29 18:21:58,017][00497] Updated weights for policy 0, policy_version 56898 (0.0020) [2024-03-29 18:21:58,037][00497] InferenceWorker_p0-w0: stopping experience collection (28900 times) [2024-03-29 18:21:58,038][00497] InferenceWorker_p0-w0: resuming experience collection (28900 times) [2024-03-29 18:21:58,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 932249600. Throughput: 0: 41740.8. Samples: 814487500. 
Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 18:21:58,840][00126] Avg episode reward: [(0, '0.638')] [2024-03-29 18:22:01,275][00497] Updated weights for policy 0, policy_version 56908 (0.0021) [2024-03-29 18:22:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41765.4). Total num frames: 932446208. Throughput: 0: 41772.0. Samples: 814615280. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 18:22:03,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 18:22:05,400][00497] Updated weights for policy 0, policy_version 56918 (0.0024) [2024-03-29 18:22:08,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 932675584. Throughput: 0: 41742.7. Samples: 814861600. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 18:22:08,840][00126] Avg episode reward: [(0, '0.569')] [2024-03-29 18:22:09,798][00497] Updated weights for policy 0, policy_version 56928 (0.0024) [2024-03-29 18:22:13,551][00497] Updated weights for policy 0, policy_version 56938 (0.0023) [2024-03-29 18:22:13,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 932872192. Throughput: 0: 41923.7. Samples: 815120200. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 18:22:13,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 18:22:17,013][00497] Updated weights for policy 0, policy_version 56948 (0.0023) [2024-03-29 18:22:18,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 933085184. Throughput: 0: 41810.2. Samples: 815238820. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 18:22:18,840][00126] Avg episode reward: [(0, '0.645')] [2024-03-29 18:22:21,135][00497] Updated weights for policy 0, policy_version 56958 (0.0024) [2024-03-29 18:22:23,839][00126] Fps is (10 sec: 44235.8, 60 sec: 42052.1, 300 sec: 41876.4). Total num frames: 933314560. Throughput: 0: 41858.5. Samples: 815484140. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 18:22:23,840][00126] Avg episode reward: [(0, '0.607')] [2024-03-29 18:22:25,779][00497] Updated weights for policy 0, policy_version 56968 (0.0025) [2024-03-29 18:22:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 933494784. Throughput: 0: 41787.7. Samples: 815740380. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 18:22:28,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:22:29,083][00476] Signal inference workers to stop experience collection... (28950 times) [2024-03-29 18:22:29,137][00497] InferenceWorker_p0-w0: stopping experience collection (28950 times) [2024-03-29 18:22:29,250][00476] Signal inference workers to resume experience collection... (28950 times) [2024-03-29 18:22:29,251][00497] InferenceWorker_p0-w0: resuming experience collection (28950 times) [2024-03-29 18:22:29,254][00497] Updated weights for policy 0, policy_version 56978 (0.0023) [2024-03-29 18:22:32,914][00497] Updated weights for policy 0, policy_version 56988 (0.0031) [2024-03-29 18:22:33,839][00126] Fps is (10 sec: 40960.9, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 933724160. Throughput: 0: 41797.7. Samples: 815867560. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 18:22:33,840][00126] Avg episode reward: [(0, '0.565')] [2024-03-29 18:22:36,742][00497] Updated weights for policy 0, policy_version 56998 (0.0022) [2024-03-29 18:22:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41932.0). 
Total num frames: 933937152. Throughput: 0: 41866.4. Samples: 816114100. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 18:22:38,840][00126] Avg episode reward: [(0, '0.630')] [2024-03-29 18:22:41,107][00497] Updated weights for policy 0, policy_version 57008 (0.0020) [2024-03-29 18:22:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 934133760. Throughput: 0: 41950.8. Samples: 816375280. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 18:22:43,840][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 18:22:44,754][00497] Updated weights for policy 0, policy_version 57018 (0.0024) [2024-03-29 18:22:48,477][00497] Updated weights for policy 0, policy_version 57028 (0.0025) [2024-03-29 18:22:48,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 934363136. Throughput: 0: 41870.8. Samples: 816499460. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 18:22:48,840][00126] Avg episode reward: [(0, '0.636')] [2024-03-29 18:22:52,349][00497] Updated weights for policy 0, policy_version 57038 (0.0021) [2024-03-29 18:22:53,839][00126] Fps is (10 sec: 42597.4, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 934559744. Throughput: 0: 41836.7. Samples: 816744260. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 18:22:53,841][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 18:22:56,665][00497] Updated weights for policy 0, policy_version 57048 (0.0022) [2024-03-29 18:22:58,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 41820.9). Total num frames: 934756352. Throughput: 0: 41919.4. Samples: 817006580. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 18:22:58,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 18:23:00,420][00497] Updated weights for policy 0, policy_version 57058 (0.0028) [2024-03-29 18:23:03,839][00126] Fps is (10 sec: 42599.3, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 934985728. Throughput: 0: 41774.7. Samples: 817118680. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 18:23:03,841][00126] Avg episode reward: [(0, '0.637')] [2024-03-29 18:23:03,860][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000057067_934985728.pth... [2024-03-29 18:23:04,206][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000056455_924958720.pth [2024-03-29 18:23:04,476][00497] Updated weights for policy 0, policy_version 57068 (0.0019) [2024-03-29 18:23:04,500][00476] Signal inference workers to stop experience collection... (29000 times) [2024-03-29 18:23:04,535][00497] InferenceWorker_p0-w0: stopping experience collection (29000 times) [2024-03-29 18:23:04,725][00476] Signal inference workers to resume experience collection... (29000 times) [2024-03-29 18:23:04,726][00497] InferenceWorker_p0-w0: resuming experience collection (29000 times) [2024-03-29 18:23:08,232][00497] Updated weights for policy 0, policy_version 57078 (0.0028) [2024-03-29 18:23:08,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 935182336. Throughput: 0: 42024.1. Samples: 817375220. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 18:23:08,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 18:23:12,534][00497] Updated weights for policy 0, policy_version 57088 (0.0023) [2024-03-29 18:23:13,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 935378944. 
Throughput: 0: 41848.3. Samples: 817623560. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 18:23:13,840][00126] Avg episode reward: [(0, '0.617')] [2024-03-29 18:23:16,131][00497] Updated weights for policy 0, policy_version 57098 (0.0029) [2024-03-29 18:23:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 935591936. Throughput: 0: 41365.7. Samples: 817729020. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 18:23:18,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 18:23:20,225][00497] Updated weights for policy 0, policy_version 57108 (0.0026) [2024-03-29 18:23:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 935804928. Throughput: 0: 41672.3. Samples: 817989360. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 18:23:23,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:23:24,125][00497] Updated weights for policy 0, policy_version 57118 (0.0027) [2024-03-29 18:23:28,103][00497] Updated weights for policy 0, policy_version 57128 (0.0029) [2024-03-29 18:23:28,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 936001536. Throughput: 0: 41707.0. Samples: 818252100. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 18:23:28,840][00126] Avg episode reward: [(0, '0.654')] [2024-03-29 18:23:31,741][00497] Updated weights for policy 0, policy_version 57138 (0.0019) [2024-03-29 18:23:33,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 936230912. Throughput: 0: 41527.9. Samples: 818368220. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:23:33,840][00126] Avg episode reward: [(0, '0.637')] [2024-03-29 18:23:33,895][00476] Signal inference workers to stop experience collection... (29050 times) [2024-03-29 18:23:33,928][00497] InferenceWorker_p0-w0: stopping experience collection (29050 times) [2024-03-29 18:23:34,111][00476] Signal inference workers to resume experience collection... (29050 times) [2024-03-29 18:23:34,111][00497] InferenceWorker_p0-w0: resuming experience collection (29050 times) [2024-03-29 18:23:35,833][00497] Updated weights for policy 0, policy_version 57148 (0.0020) [2024-03-29 18:23:38,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 936427520. Throughput: 0: 41701.5. Samples: 818620820. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:23:38,841][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 18:23:39,667][00497] Updated weights for policy 0, policy_version 57158 (0.0023) [2024-03-29 18:23:43,713][00497] Updated weights for policy 0, policy_version 57168 (0.0024) [2024-03-29 18:23:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 936640512. Throughput: 0: 41482.7. Samples: 818873300. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:23:43,840][00126] Avg episode reward: [(0, '0.501')] [2024-03-29 18:23:47,287][00497] Updated weights for policy 0, policy_version 57178 (0.0025) [2024-03-29 18:23:48,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 936869888. Throughput: 0: 41877.6. Samples: 819003180. 
Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:23:48,840][00126] Avg episode reward: [(0, '0.629')] [2024-03-29 18:23:51,259][00497] Updated weights for policy 0, policy_version 57188 (0.0025) [2024-03-29 18:23:53,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 937066496. Throughput: 0: 41884.1. Samples: 819260000. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:23:53,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 18:23:55,136][00497] Updated weights for policy 0, policy_version 57198 (0.0021) [2024-03-29 18:23:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 937279488. Throughput: 0: 41800.9. Samples: 819504600. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:23:58,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 18:23:59,135][00497] Updated weights for policy 0, policy_version 57208 (0.0023) [2024-03-29 18:24:02,968][00497] Updated weights for policy 0, policy_version 57218 (0.0027) [2024-03-29 18:24:03,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 937492480. Throughput: 0: 42381.8. Samples: 819636200. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:24:03,840][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 18:24:06,841][00497] Updated weights for policy 0, policy_version 57228 (0.0024) [2024-03-29 18:24:08,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 937689088. Throughput: 0: 42005.9. Samples: 819879620. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 18:24:08,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 18:24:10,705][00476] Signal inference workers to stop experience collection... (29100 times) [2024-03-29 18:24:10,780][00497] InferenceWorker_p0-w0: stopping experience collection (29100 times) [2024-03-29 18:24:10,876][00476] Signal inference workers to resume experience collection... (29100 times) [2024-03-29 18:24:10,876][00497] InferenceWorker_p0-w0: resuming experience collection (29100 times) [2024-03-29 18:24:10,879][00497] Updated weights for policy 0, policy_version 57238 (0.0020) [2024-03-29 18:24:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 937918464. Throughput: 0: 41803.1. Samples: 820133240. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 18:24:13,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 18:24:14,618][00497] Updated weights for policy 0, policy_version 57248 (0.0017) [2024-03-29 18:24:18,409][00497] Updated weights for policy 0, policy_version 57258 (0.0027) [2024-03-29 18:24:18,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 938115072. Throughput: 0: 42253.9. Samples: 820269640. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 18:24:18,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 18:24:22,271][00497] Updated weights for policy 0, policy_version 57268 (0.0019) [2024-03-29 18:24:23,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 938311680. Throughput: 0: 42133.7. Samples: 820516840. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 18:24:23,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 18:24:26,207][00497] Updated weights for policy 0, policy_version 57278 (0.0019) [2024-03-29 18:24:28,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.4, 300 sec: 41987.5). 
Total num frames: 938557440. Throughput: 0: 42102.3. Samples: 820767900. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 18:24:28,840][00126] Avg episode reward: [(0, '0.637')] [2024-03-29 18:24:30,003][00497] Updated weights for policy 0, policy_version 57288 (0.0021) [2024-03-29 18:24:33,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 938754048. Throughput: 0: 42433.5. Samples: 820912680. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 18:24:33,840][00126] Avg episode reward: [(0, '0.600')] [2024-03-29 18:24:34,005][00497] Updated weights for policy 0, policy_version 57298 (0.0029) [2024-03-29 18:24:37,942][00497] Updated weights for policy 0, policy_version 57308 (0.0025) [2024-03-29 18:24:38,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 938967040. Throughput: 0: 41874.6. Samples: 821144360. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 18:24:38,840][00126] Avg episode reward: [(0, '0.537')] [2024-03-29 18:24:41,708][00497] Updated weights for policy 0, policy_version 57318 (0.0027) [2024-03-29 18:24:43,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 939196416. Throughput: 0: 42169.0. Samples: 821402200. Policy #0 lag: (min: 2.0, avg: 21.3, max: 43.0) [2024-03-29 18:24:43,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 18:24:45,553][00497] Updated weights for policy 0, policy_version 57328 (0.0019) [2024-03-29 18:24:46,300][00476] Signal inference workers to stop experience collection... (29150 times) [2024-03-29 18:24:46,340][00497] InferenceWorker_p0-w0: stopping experience collection (29150 times) [2024-03-29 18:24:46,376][00476] Signal inference workers to resume experience collection... (29150 times) [2024-03-29 18:24:46,378][00497] InferenceWorker_p0-w0: resuming experience collection (29150 times) [2024-03-29 18:24:48,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 939393024. Throughput: 0: 42398.8. Samples: 821544140. Policy #0 lag: (min: 2.0, avg: 21.3, max: 43.0) [2024-03-29 18:24:48,840][00126] Avg episode reward: [(0, '0.614')] [2024-03-29 18:24:49,230][00497] Updated weights for policy 0, policy_version 57338 (0.0023) [2024-03-29 18:24:53,384][00497] Updated weights for policy 0, policy_version 57348 (0.0018) [2024-03-29 18:24:53,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 939606016. Throughput: 0: 42506.7. Samples: 821792420. Policy #0 lag: (min: 2.0, avg: 21.3, max: 43.0) [2024-03-29 18:24:53,840][00126] Avg episode reward: [(0, '0.625')] [2024-03-29 18:24:57,032][00497] Updated weights for policy 0, policy_version 57358 (0.0024) [2024-03-29 18:24:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.5, 300 sec: 41987.5). Total num frames: 939819008. Throughput: 0: 42469.0. Samples: 822044340. Policy #0 lag: (min: 2.0, avg: 21.3, max: 43.0) [2024-03-29 18:24:58,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 18:25:00,856][00497] Updated weights for policy 0, policy_version 57368 (0.0023) [2024-03-29 18:25:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 940032000. Throughput: 0: 42358.7. Samples: 822175780. 
Policy #0 lag: (min: 2.0, avg: 21.3, max: 43.0) [2024-03-29 18:25:03,840][00126] Avg episode reward: [(0, '0.618')] [2024-03-29 18:25:04,087][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000057376_940048384.pth... [2024-03-29 18:25:04,416][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000056762_929988608.pth [2024-03-29 18:25:04,972][00497] Updated weights for policy 0, policy_version 57378 (0.0019) [2024-03-29 18:25:08,743][00497] Updated weights for policy 0, policy_version 57388 (0.0023) [2024-03-29 18:25:08,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 940244992. Throughput: 0: 42422.3. Samples: 822425840. Policy #0 lag: (min: 2.0, avg: 21.3, max: 43.0) [2024-03-29 18:25:08,840][00126] Avg episode reward: [(0, '0.607')] [2024-03-29 18:25:12,525][00497] Updated weights for policy 0, policy_version 57398 (0.0025) [2024-03-29 18:25:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 940457984. Throughput: 0: 42572.1. Samples: 822683640. Policy #0 lag: (min: 2.0, avg: 21.3, max: 43.0) [2024-03-29 18:25:13,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 18:25:15,552][00476] Signal inference workers to stop experience collection... (29200 times) [2024-03-29 18:25:15,624][00497] InferenceWorker_p0-w0: stopping experience collection (29200 times) [2024-03-29 18:25:15,634][00476] Signal inference workers to resume experience collection... (29200 times) [2024-03-29 18:25:15,654][00497] InferenceWorker_p0-w0: resuming experience collection (29200 times) [2024-03-29 18:25:16,202][00497] Updated weights for policy 0, policy_version 57408 (0.0028) [2024-03-29 18:25:18,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 940654592. Throughput: 0: 42070.1. Samples: 822805840. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 18:25:18,842][00126] Avg episode reward: [(0, '0.612')] [2024-03-29 18:25:20,273][00497] Updated weights for policy 0, policy_version 57418 (0.0019) [2024-03-29 18:25:23,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42871.5, 300 sec: 42043.0). Total num frames: 940883968. Throughput: 0: 42558.1. Samples: 823059480. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 18:25:23,840][00126] Avg episode reward: [(0, '0.502')] [2024-03-29 18:25:24,196][00497] Updated weights for policy 0, policy_version 57428 (0.0024) [2024-03-29 18:25:28,220][00497] Updated weights for policy 0, policy_version 57438 (0.0021) [2024-03-29 18:25:28,839][00126] Fps is (10 sec: 44237.6, 60 sec: 42325.5, 300 sec: 42043.0). Total num frames: 941096960. Throughput: 0: 42465.9. Samples: 823313160. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 18:25:28,840][00126] Avg episode reward: [(0, '0.443')] [2024-03-29 18:25:31,840][00497] Updated weights for policy 0, policy_version 57448 (0.0018) [2024-03-29 18:25:33,839][00126] Fps is (10 sec: 39322.1, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 941277184. Throughput: 0: 42124.4. Samples: 823439740. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 18:25:33,840][00126] Avg episode reward: [(0, '0.617')] [2024-03-29 18:25:36,039][00497] Updated weights for policy 0, policy_version 57458 (0.0023) [2024-03-29 18:25:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 941522944. Throughput: 0: 42209.8. Samples: 823691860. 
Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 18:25:38,840][00126] Avg episode reward: [(0, '0.580')] [2024-03-29 18:25:39,972][00497] Updated weights for policy 0, policy_version 57468 (0.0025) [2024-03-29 18:25:43,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 941703168. Throughput: 0: 42084.0. Samples: 823938120. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 18:25:43,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 18:25:43,908][00497] Updated weights for policy 0, policy_version 57478 (0.0017) [2024-03-29 18:25:47,647][00497] Updated weights for policy 0, policy_version 57488 (0.0024) [2024-03-29 18:25:48,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 941916160. Throughput: 0: 41876.9. Samples: 824060240. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 18:25:48,840][00126] Avg episode reward: [(0, '0.568')] [2024-03-29 18:25:50,667][00476] Signal inference workers to stop experience collection... (29250 times) [2024-03-29 18:25:50,670][00476] Signal inference workers to resume experience collection... (29250 times) [2024-03-29 18:25:50,716][00497] InferenceWorker_p0-w0: stopping experience collection (29250 times) [2024-03-29 18:25:50,716][00497] InferenceWorker_p0-w0: resuming experience collection (29250 times) [2024-03-29 18:25:51,574][00497] Updated weights for policy 0, policy_version 57498 (0.0027) [2024-03-29 18:25:53,839][00126] Fps is (10 sec: 44236.1, 60 sec: 42325.2, 300 sec: 41931.9). Total num frames: 942145536. Throughput: 0: 42257.3. Samples: 824327420. Policy #0 lag: (min: 2.0, avg: 21.3, max: 42.0) [2024-03-29 18:25:53,842][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 18:25:55,470][00497] Updated weights for policy 0, policy_version 57508 (0.0018) [2024-03-29 18:25:58,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 942342144. Throughput: 0: 42252.0. Samples: 824584980. Policy #0 lag: (min: 2.0, avg: 21.3, max: 42.0) [2024-03-29 18:25:58,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 18:25:59,254][00497] Updated weights for policy 0, policy_version 57518 (0.0023) [2024-03-29 18:26:03,147][00497] Updated weights for policy 0, policy_version 57528 (0.0023) [2024-03-29 18:26:03,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 942538752. Throughput: 0: 42037.8. Samples: 824697540. Policy #0 lag: (min: 2.0, avg: 21.3, max: 42.0) [2024-03-29 18:26:03,840][00126] Avg episode reward: [(0, '0.639')] [2024-03-29 18:26:07,205][00497] Updated weights for policy 0, policy_version 57538 (0.0020) [2024-03-29 18:26:08,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.4, 300 sec: 41987.5). Total num frames: 942768128. Throughput: 0: 42154.8. Samples: 824956440. Policy #0 lag: (min: 2.0, avg: 21.3, max: 42.0) [2024-03-29 18:26:08,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 18:26:10,971][00497] Updated weights for policy 0, policy_version 57548 (0.0028) [2024-03-29 18:26:13,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41932.0). Total num frames: 942964736. Throughput: 0: 42000.4. Samples: 825203180. Policy #0 lag: (min: 2.0, avg: 21.3, max: 42.0) [2024-03-29 18:26:13,840][00126] Avg episode reward: [(0, '0.621')] [2024-03-29 18:26:14,970][00497] Updated weights for policy 0, policy_version 57558 (0.0026) [2024-03-29 18:26:16,829][00476] Signal inference workers to stop experience collection... 
(29300 times) [2024-03-29 18:26:16,870][00497] InferenceWorker_p0-w0: stopping experience collection (29300 times) [2024-03-29 18:26:16,910][00476] Signal inference workers to resume experience collection... (29300 times) [2024-03-29 18:26:16,912][00497] InferenceWorker_p0-w0: resuming experience collection (29300 times) [2024-03-29 18:26:18,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 943177728. Throughput: 0: 41980.8. Samples: 825328880. Policy #0 lag: (min: 2.0, avg: 21.3, max: 42.0) [2024-03-29 18:26:18,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 18:26:19,065][00497] Updated weights for policy 0, policy_version 57568 (0.0026) [2024-03-29 18:26:22,895][00497] Updated weights for policy 0, policy_version 57578 (0.0030) [2024-03-29 18:26:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 943390720. Throughput: 0: 42040.4. Samples: 825583680. Policy #0 lag: (min: 2.0, avg: 21.3, max: 42.0) [2024-03-29 18:26:23,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 18:26:26,353][00497] Updated weights for policy 0, policy_version 57588 (0.0022) [2024-03-29 18:26:28,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 943603712. Throughput: 0: 42258.2. Samples: 825839740. Policy #0 lag: (min: 2.0, avg: 21.3, max: 42.0) [2024-03-29 18:26:28,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 18:26:30,291][00497] Updated weights for policy 0, policy_version 57598 (0.0023) [2024-03-29 18:26:33,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 943816704. Throughput: 0: 42126.3. Samples: 825955920. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:26:33,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 18:26:34,521][00497] Updated weights for policy 0, policy_version 57608 (0.0019) [2024-03-29 18:26:38,571][00497] Updated weights for policy 0, policy_version 57618 (0.0024) [2024-03-29 18:26:38,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 944013312. Throughput: 0: 41699.3. Samples: 826203880. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:26:38,840][00126] Avg episode reward: [(0, '0.458')] [2024-03-29 18:26:42,125][00497] Updated weights for policy 0, policy_version 57628 (0.0022) [2024-03-29 18:26:43,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 944226304. Throughput: 0: 41859.5. Samples: 826468660. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:26:43,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 18:26:46,093][00497] Updated weights for policy 0, policy_version 57638 (0.0021) [2024-03-29 18:26:48,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 944439296. Throughput: 0: 41908.5. Samples: 826583420. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:26:48,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 18:26:50,589][00497] Updated weights for policy 0, policy_version 57648 (0.0028) [2024-03-29 18:26:51,629][00476] Signal inference workers to stop experience collection... (29350 times) [2024-03-29 18:26:51,670][00497] InferenceWorker_p0-w0: stopping experience collection (29350 times) [2024-03-29 18:26:51,864][00476] Signal inference workers to resume experience collection... 
(29350 times) [2024-03-29 18:26:51,865][00497] InferenceWorker_p0-w0: resuming experience collection (29350 times) [2024-03-29 18:26:53,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 944635904. Throughput: 0: 41804.4. Samples: 826837640. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:26:53,840][00126] Avg episode reward: [(0, '0.596')] [2024-03-29 18:26:54,252][00497] Updated weights for policy 0, policy_version 57658 (0.0030) [2024-03-29 18:26:58,162][00497] Updated weights for policy 0, policy_version 57668 (0.0024) [2024-03-29 18:26:58,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 944848896. Throughput: 0: 41767.6. Samples: 827082720. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:26:58,840][00126] Avg episode reward: [(0, '0.644')] [2024-03-29 18:27:01,897][00497] Updated weights for policy 0, policy_version 57678 (0.0019) [2024-03-29 18:27:03,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 41987.4). Total num frames: 945061888. Throughput: 0: 41874.7. Samples: 827213240. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:27:03,840][00126] Avg episode reward: [(0, '0.637')] [2024-03-29 18:27:04,058][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000057683_945078272.pth... [2024-03-29 18:27:04,388][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000057067_934985728.pth [2024-03-29 18:27:06,664][00497] Updated weights for policy 0, policy_version 57688 (0.0020) [2024-03-29 18:27:08,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 945274880. Throughput: 0: 41858.1. Samples: 827467300. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:27:08,841][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 18:27:10,381][00497] Updated weights for policy 0, policy_version 57698 (0.0035) [2024-03-29 18:27:13,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 945471488. Throughput: 0: 41340.0. Samples: 827700040. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:27:13,840][00126] Avg episode reward: [(0, '0.653')] [2024-03-29 18:27:13,895][00497] Updated weights for policy 0, policy_version 57708 (0.0029) [2024-03-29 18:27:17,690][00497] Updated weights for policy 0, policy_version 57718 (0.0024) [2024-03-29 18:27:18,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.3, 300 sec: 41932.0). Total num frames: 945684480. Throughput: 0: 41821.7. Samples: 827837900. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:27:18,840][00126] Avg episode reward: [(0, '0.598')] [2024-03-29 18:27:22,332][00497] Updated weights for policy 0, policy_version 57728 (0.0020) [2024-03-29 18:27:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 945897472. Throughput: 0: 41800.3. Samples: 828084900. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:27:23,840][00126] Avg episode reward: [(0, '0.555')] [2024-03-29 18:27:26,022][00497] Updated weights for policy 0, policy_version 57738 (0.0022) [2024-03-29 18:27:26,689][00476] Signal inference workers to stop experience collection... (29400 times) [2024-03-29 18:27:26,766][00497] InferenceWorker_p0-w0: stopping experience collection (29400 times) [2024-03-29 18:27:26,776][00476] Signal inference workers to resume experience collection... 
(29400 times) [2024-03-29 18:27:26,795][00497] InferenceWorker_p0-w0: resuming experience collection (29400 times) [2024-03-29 18:27:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 946094080. Throughput: 0: 41049.3. Samples: 828315880. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:27:28,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 18:27:29,735][00497] Updated weights for policy 0, policy_version 57748 (0.0027) [2024-03-29 18:27:33,425][00497] Updated weights for policy 0, policy_version 57758 (0.0024) [2024-03-29 18:27:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.1, 300 sec: 41987.4). Total num frames: 946323456. Throughput: 0: 41692.4. Samples: 828459580. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:27:33,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 18:27:37,767][00497] Updated weights for policy 0, policy_version 57768 (0.0030) [2024-03-29 18:27:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 946503680. Throughput: 0: 41520.0. Samples: 828706040. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 18:27:38,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 18:27:41,596][00497] Updated weights for policy 0, policy_version 57778 (0.0028) [2024-03-29 18:27:43,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 946733056. Throughput: 0: 41667.5. Samples: 828957760. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 18:27:43,840][00126] Avg episode reward: [(0, '0.555')] [2024-03-29 18:27:45,680][00497] Updated weights for policy 0, policy_version 57788 (0.0023) [2024-03-29 18:27:48,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 946929664. Throughput: 0: 41560.9. Samples: 829083480. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 18:27:48,840][00126] Avg episode reward: [(0, '0.601')] [2024-03-29 18:27:49,224][00497] Updated weights for policy 0, policy_version 57798 (0.0025) [2024-03-29 18:27:53,502][00497] Updated weights for policy 0, policy_version 57808 (0.0022) [2024-03-29 18:27:53,839][00126] Fps is (10 sec: 39320.8, 60 sec: 41506.0, 300 sec: 41931.9). Total num frames: 947126272. Throughput: 0: 41431.1. Samples: 829331700. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 18:27:53,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 18:27:57,125][00497] Updated weights for policy 0, policy_version 57818 (0.0029) [2024-03-29 18:27:57,922][00476] Signal inference workers to stop experience collection... (29450 times) [2024-03-29 18:27:57,959][00497] InferenceWorker_p0-w0: stopping experience collection (29450 times) [2024-03-29 18:27:58,146][00476] Signal inference workers to resume experience collection... (29450 times) [2024-03-29 18:27:58,147][00497] InferenceWorker_p0-w0: resuming experience collection (29450 times) [2024-03-29 18:27:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 947372032. Throughput: 0: 42172.9. Samples: 829597820. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 18:27:58,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 18:28:00,852][00497] Updated weights for policy 0, policy_version 57828 (0.0024) [2024-03-29 18:28:03,839][00126] Fps is (10 sec: 44237.9, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 947568640. Throughput: 0: 41709.9. Samples: 829714840. 
Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 18:28:03,840][00126] Avg episode reward: [(0, '0.599')] [2024-03-29 18:28:04,662][00497] Updated weights for policy 0, policy_version 57838 (0.0027) [2024-03-29 18:28:08,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 947765248. Throughput: 0: 41809.0. Samples: 829966300. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 18:28:08,841][00126] Avg episode reward: [(0, '0.658')] [2024-03-29 18:28:09,055][00497] Updated weights for policy 0, policy_version 57848 (0.0023) [2024-03-29 18:28:12,612][00497] Updated weights for policy 0, policy_version 57858 (0.0027) [2024-03-29 18:28:13,839][00126] Fps is (10 sec: 44235.9, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 948011008. Throughput: 0: 42658.6. Samples: 830235520. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 18:28:13,840][00126] Avg episode reward: [(0, '0.612')] [2024-03-29 18:28:16,273][00497] Updated weights for policy 0, policy_version 57868 (0.0020) [2024-03-29 18:28:18,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 948191232. Throughput: 0: 42034.7. Samples: 830351140. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:28:18,840][00126] Avg episode reward: [(0, '0.640')] [2024-03-29 18:28:20,205][00497] Updated weights for policy 0, policy_version 57878 (0.0025) [2024-03-29 18:28:23,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 948420608. Throughput: 0: 42125.7. Samples: 830601700. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:28:23,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 18:28:24,640][00497] Updated weights for policy 0, policy_version 57888 (0.0022) [2024-03-29 18:28:28,392][00497] Updated weights for policy 0, policy_version 57898 (0.0029) [2024-03-29 18:28:28,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 948617216. Throughput: 0: 42136.8. Samples: 830853920. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:28:28,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:28:29,871][00476] Signal inference workers to stop experience collection... (29500 times) [2024-03-29 18:28:29,953][00476] Signal inference workers to resume experience collection... (29500 times) [2024-03-29 18:28:29,954][00497] InferenceWorker_p0-w0: stopping experience collection (29500 times) [2024-03-29 18:28:29,985][00497] InferenceWorker_p0-w0: resuming experience collection (29500 times) [2024-03-29 18:28:32,169][00497] Updated weights for policy 0, policy_version 57908 (0.0022) [2024-03-29 18:28:33,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 948813824. Throughput: 0: 41844.9. Samples: 830966500. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:28:33,840][00126] Avg episode reward: [(0, '0.623')] [2024-03-29 18:28:36,023][00497] Updated weights for policy 0, policy_version 57918 (0.0019) [2024-03-29 18:28:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 949059584. Throughput: 0: 42288.1. Samples: 831234660. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:28:38,842][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 18:28:40,221][00497] Updated weights for policy 0, policy_version 57928 (0.0033) [2024-03-29 18:28:43,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.2, 300 sec: 41931.9). 
Total num frames: 949239808. Throughput: 0: 41966.2. Samples: 831486300. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:28:43,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 18:28:44,024][00497] Updated weights for policy 0, policy_version 57938 (0.0022) [2024-03-29 18:28:47,868][00497] Updated weights for policy 0, policy_version 57948 (0.0026) [2024-03-29 18:28:48,839][00126] Fps is (10 sec: 36045.2, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 949420032. Throughput: 0: 41768.8. Samples: 831594440. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 18:28:48,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 18:28:51,857][00497] Updated weights for policy 0, policy_version 57958 (0.0019) [2024-03-29 18:28:53,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.5, 300 sec: 41987.5). Total num frames: 949665792. Throughput: 0: 41956.0. Samples: 831854320. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 18:28:53,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:28:56,080][00497] Updated weights for policy 0, policy_version 57968 (0.0032) [2024-03-29 18:28:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 949846016. Throughput: 0: 41592.9. Samples: 832107200. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 18:28:58,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 18:28:59,784][00497] Updated weights for policy 0, policy_version 57978 (0.0019) [2024-03-29 18:29:00,280][00476] Signal inference workers to stop experience collection... (29550 times) [2024-03-29 18:29:00,312][00497] InferenceWorker_p0-w0: stopping experience collection (29550 times) [2024-03-29 18:29:00,462][00476] Signal inference workers to resume experience collection... (29550 times) [2024-03-29 18:29:00,462][00497] InferenceWorker_p0-w0: resuming experience collection (29550 times) [2024-03-29 18:29:03,433][00497] Updated weights for policy 0, policy_version 57988 (0.0024) [2024-03-29 18:29:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 950075392. Throughput: 0: 41605.0. Samples: 832223360. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 18:29:03,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 18:29:03,859][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000057988_950075392.pth... [2024-03-29 18:29:04,181][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000057376_940048384.pth [2024-03-29 18:29:07,484][00497] Updated weights for policy 0, policy_version 57998 (0.0022) [2024-03-29 18:29:08,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 950304768. Throughput: 0: 41757.9. Samples: 832480800. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 18:29:08,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 18:29:11,771][00497] Updated weights for policy 0, policy_version 58008 (0.0022) [2024-03-29 18:29:13,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40960.1, 300 sec: 41876.4). Total num frames: 950468608. Throughput: 0: 41728.0. Samples: 832731680. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 18:29:13,840][00126] Avg episode reward: [(0, '0.617')] [2024-03-29 18:29:15,613][00497] Updated weights for policy 0, policy_version 58018 (0.0023) [2024-03-29 18:29:18,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 950714368. 
Throughput: 0: 41807.7. Samples: 832847840. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 18:29:18,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 18:29:19,625][00497] Updated weights for policy 0, policy_version 58028 (0.0026) [2024-03-29 18:29:23,592][00497] Updated weights for policy 0, policy_version 58038 (0.0023) [2024-03-29 18:29:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41233.2, 300 sec: 41820.9). Total num frames: 950894592. Throughput: 0: 41249.0. Samples: 833090860. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 18:29:23,840][00126] Avg episode reward: [(0, '0.568')] [2024-03-29 18:29:27,689][00497] Updated weights for policy 0, policy_version 58048 (0.0020) [2024-03-29 18:29:28,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41233.1, 300 sec: 41820.8). Total num frames: 951091200. Throughput: 0: 41288.0. Samples: 833344260. Policy #0 lag: (min: 0.0, avg: 21.7, max: 43.0) [2024-03-29 18:29:28,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 18:29:31,560][00497] Updated weights for policy 0, policy_version 58058 (0.0027) [2024-03-29 18:29:32,306][00476] Signal inference workers to stop experience collection... (29600 times) [2024-03-29 18:29:32,306][00476] Signal inference workers to resume experience collection... (29600 times) [2024-03-29 18:29:32,349][00497] InferenceWorker_p0-w0: stopping experience collection (29600 times) [2024-03-29 18:29:32,349][00497] InferenceWorker_p0-w0: resuming experience collection (29600 times) [2024-03-29 18:29:33,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 951320576. Throughput: 0: 41534.2. Samples: 833463480. Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 18:29:33,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 18:29:35,435][00497] Updated weights for policy 0, policy_version 58068 (0.0026) [2024-03-29 18:29:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 40960.0, 300 sec: 41765.3). Total num frames: 951517184. Throughput: 0: 41282.6. Samples: 833712040. Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 18:29:38,840][00126] Avg episode reward: [(0, '0.647')] [2024-03-29 18:29:39,588][00497] Updated weights for policy 0, policy_version 58078 (0.0025) [2024-03-29 18:29:43,478][00497] Updated weights for policy 0, policy_version 58088 (0.0021) [2024-03-29 18:29:43,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 951713792. Throughput: 0: 41003.6. Samples: 833952360. Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 18:29:43,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 18:29:47,291][00497] Updated weights for policy 0, policy_version 58098 (0.0022) [2024-03-29 18:29:48,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 41820.8). Total num frames: 951943168. Throughput: 0: 41563.6. Samples: 834093720. Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 18:29:48,840][00126] Avg episode reward: [(0, '0.603')] [2024-03-29 18:29:51,398][00497] Updated weights for policy 0, policy_version 58108 (0.0023) [2024-03-29 18:29:53,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 952139776. Throughput: 0: 41465.8. Samples: 834346760. 
Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 18:29:53,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 18:29:55,457][00497] Updated weights for policy 0, policy_version 58118 (0.0032) [2024-03-29 18:29:58,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 952352768. Throughput: 0: 40771.0. Samples: 834566380. Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 18:29:58,840][00126] Avg episode reward: [(0, '0.488')] [2024-03-29 18:29:59,468][00497] Updated weights for policy 0, policy_version 58128 (0.0021) [2024-03-29 18:30:03,334][00497] Updated weights for policy 0, policy_version 58138 (0.0027) [2024-03-29 18:30:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 952549376. Throughput: 0: 41403.0. Samples: 834710980. Policy #0 lag: (min: 0.0, avg: 19.8, max: 43.0) [2024-03-29 18:30:03,841][00126] Avg episode reward: [(0, '0.509')] [2024-03-29 18:30:07,082][00497] Updated weights for policy 0, policy_version 58148 (0.0017) [2024-03-29 18:30:08,105][00476] Signal inference workers to stop experience collection... (29650 times) [2024-03-29 18:30:08,147][00497] InferenceWorker_p0-w0: stopping experience collection (29650 times) [2024-03-29 18:30:08,264][00476] Signal inference workers to resume experience collection... (29650 times) [2024-03-29 18:30:08,264][00497] InferenceWorker_p0-w0: resuming experience collection (29650 times) [2024-03-29 18:30:08,839][00126] Fps is (10 sec: 40960.7, 60 sec: 40960.1, 300 sec: 41709.8). Total num frames: 952762368. Throughput: 0: 41488.9. Samples: 834957860. Policy #0 lag: (min: 0.0, avg: 22.6, max: 45.0) [2024-03-29 18:30:08,840][00126] Avg episode reward: [(0, '0.598')] [2024-03-29 18:30:11,150][00497] Updated weights for policy 0, policy_version 58158 (0.0022) [2024-03-29 18:30:13,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.1, 300 sec: 41820.8). Total num frames: 952991744. Throughput: 0: 41171.4. Samples: 835196980. Policy #0 lag: (min: 0.0, avg: 22.6, max: 45.0) [2024-03-29 18:30:13,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 18:30:14,951][00497] Updated weights for policy 0, policy_version 58168 (0.0030) [2024-03-29 18:30:18,809][00497] Updated weights for policy 0, policy_version 58178 (0.0023) [2024-03-29 18:30:18,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 953188352. Throughput: 0: 41637.7. Samples: 835337180. Policy #0 lag: (min: 0.0, avg: 22.6, max: 45.0) [2024-03-29 18:30:18,840][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 18:30:22,667][00497] Updated weights for policy 0, policy_version 58188 (0.0023) [2024-03-29 18:30:23,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 953368576. Throughput: 0: 41381.8. Samples: 835574220. Policy #0 lag: (min: 0.0, avg: 22.6, max: 45.0) [2024-03-29 18:30:23,840][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 18:30:26,772][00497] Updated weights for policy 0, policy_version 58198 (0.0025) [2024-03-29 18:30:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 953614336. Throughput: 0: 41627.1. Samples: 835825580. Policy #0 lag: (min: 0.0, avg: 22.6, max: 45.0) [2024-03-29 18:30:28,840][00126] Avg episode reward: [(0, '0.648')] [2024-03-29 18:30:30,574][00497] Updated weights for policy 0, policy_version 58208 (0.0024) [2024-03-29 18:30:33,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41598.7). 
Total num frames: 953794560. Throughput: 0: 41506.2. Samples: 835961500. Policy #0 lag: (min: 0.0, avg: 22.6, max: 45.0) [2024-03-29 18:30:33,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 18:30:34,715][00497] Updated weights for policy 0, policy_version 58218 (0.0022) [2024-03-29 18:30:38,627][00497] Updated weights for policy 0, policy_version 58228 (0.0031) [2024-03-29 18:30:38,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 954007552. Throughput: 0: 40995.0. Samples: 836191540. Policy #0 lag: (min: 0.0, avg: 22.6, max: 45.0) [2024-03-29 18:30:38,841][00126] Avg episode reward: [(0, '0.635')] [2024-03-29 18:30:41,366][00476] Signal inference workers to stop experience collection... (29700 times) [2024-03-29 18:30:41,381][00497] InferenceWorker_p0-w0: stopping experience collection (29700 times) [2024-03-29 18:30:41,580][00476] Signal inference workers to resume experience collection... (29700 times) [2024-03-29 18:30:41,581][00497] InferenceWorker_p0-w0: resuming experience collection (29700 times) [2024-03-29 18:30:42,922][00497] Updated weights for policy 0, policy_version 58238 (0.0020) [2024-03-29 18:30:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 954204160. Throughput: 0: 41786.4. Samples: 836446760. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 18:30:43,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 18:30:46,587][00497] Updated weights for policy 0, policy_version 58248 (0.0032) [2024-03-29 18:30:48,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 954417152. Throughput: 0: 41308.6. Samples: 836569860. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 18:30:48,841][00126] Avg episode reward: [(0, '0.613')] [2024-03-29 18:30:50,643][00497] Updated weights for policy 0, policy_version 58258 (0.0024) [2024-03-29 18:30:53,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 954630144. Throughput: 0: 41192.3. Samples: 836811520. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 18:30:53,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 18:30:54,742][00497] Updated weights for policy 0, policy_version 58268 (0.0029) [2024-03-29 18:30:58,375][00497] Updated weights for policy 0, policy_version 58278 (0.0022) [2024-03-29 18:30:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.3, 300 sec: 41709.8). Total num frames: 954843136. Throughput: 0: 41704.2. Samples: 837073660. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 18:30:58,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 18:31:02,357][00497] Updated weights for policy 0, policy_version 58288 (0.0025) [2024-03-29 18:31:03,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 955056128. Throughput: 0: 41256.0. Samples: 837193700. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 18:31:03,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 18:31:03,859][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000058292_955056128.pth... [2024-03-29 18:31:04,170][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000057683_945078272.pth [2024-03-29 18:31:06,228][00497] Updated weights for policy 0, policy_version 58298 (0.0021) [2024-03-29 18:31:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 955269120. 
Throughput: 0: 41729.8. Samples: 837452060. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 18:31:08,840][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 18:31:10,339][00497] Updated weights for policy 0, policy_version 58308 (0.0022) [2024-03-29 18:31:11,829][00476] Signal inference workers to stop experience collection... (29750 times) [2024-03-29 18:31:11,870][00497] InferenceWorker_p0-w0: stopping experience collection (29750 times) [2024-03-29 18:31:12,054][00476] Signal inference workers to resume experience collection... (29750 times) [2024-03-29 18:31:12,055][00497] InferenceWorker_p0-w0: resuming experience collection (29750 times) [2024-03-29 18:31:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 955465728. Throughput: 0: 41579.6. Samples: 837696660. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 18:31:13,840][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 18:31:14,044][00497] Updated weights for policy 0, policy_version 58318 (0.0026) [2024-03-29 18:31:18,082][00497] Updated weights for policy 0, policy_version 58328 (0.0021) [2024-03-29 18:31:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 955678720. Throughput: 0: 41235.1. Samples: 837817080. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 18:31:18,840][00126] Avg episode reward: [(0, '0.600')] [2024-03-29 18:31:21,976][00497] Updated weights for policy 0, policy_version 58338 (0.0030) [2024-03-29 18:31:23,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 955908096. Throughput: 0: 41986.3. Samples: 838080920. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 18:31:23,841][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 18:31:26,009][00497] Updated weights for policy 0, policy_version 58348 (0.0021) [2024-03-29 18:31:28,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41233.2, 300 sec: 41598.7). Total num frames: 956088320. Throughput: 0: 41736.4. Samples: 838324900. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 18:31:28,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 18:31:29,641][00497] Updated weights for policy 0, policy_version 58358 (0.0019) [2024-03-29 18:31:33,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 956284928. Throughput: 0: 41625.3. Samples: 838443000. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 18:31:33,840][00126] Avg episode reward: [(0, '0.603')] [2024-03-29 18:31:34,038][00497] Updated weights for policy 0, policy_version 58368 (0.0026) [2024-03-29 18:31:37,858][00497] Updated weights for policy 0, policy_version 58378 (0.0020) [2024-03-29 18:31:38,839][00126] Fps is (10 sec: 42597.5, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 956514304. Throughput: 0: 42078.6. Samples: 838705060. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 18:31:38,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 18:31:41,927][00497] Updated weights for policy 0, policy_version 58388 (0.0025) [2024-03-29 18:31:43,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41506.0, 300 sec: 41543.1). Total num frames: 956694528. Throughput: 0: 41702.1. Samples: 838950260. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 18:31:43,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:31:44,004][00476] Signal inference workers to stop experience collection... 
(29800 times) [2024-03-29 18:31:44,034][00497] InferenceWorker_p0-w0: stopping experience collection (29800 times) [2024-03-29 18:31:44,190][00476] Signal inference workers to resume experience collection... (29800 times) [2024-03-29 18:31:44,191][00497] InferenceWorker_p0-w0: resuming experience collection (29800 times) [2024-03-29 18:31:45,546][00497] Updated weights for policy 0, policy_version 58398 (0.0030) [2024-03-29 18:31:48,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 956923904. Throughput: 0: 41664.1. Samples: 839068580. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 18:31:48,840][00126] Avg episode reward: [(0, '0.625')] [2024-03-29 18:31:49,742][00497] Updated weights for policy 0, policy_version 58408 (0.0030) [2024-03-29 18:31:53,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.0, 300 sec: 41543.1). Total num frames: 957104128. Throughput: 0: 41347.0. Samples: 839312680. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 18:31:53,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 18:31:53,868][00497] Updated weights for policy 0, policy_version 58418 (0.0027) [2024-03-29 18:31:57,889][00497] Updated weights for policy 0, policy_version 58428 (0.0026) [2024-03-29 18:31:58,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.0, 300 sec: 41543.2). Total num frames: 957317120. Throughput: 0: 41609.9. Samples: 839569100. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 18:31:58,841][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 18:32:01,548][00497] Updated weights for policy 0, policy_version 58438 (0.0023) [2024-03-29 18:32:03,839][00126] Fps is (10 sec: 45876.0, 60 sec: 41779.3, 300 sec: 41654.3). Total num frames: 957562880. Throughput: 0: 41480.5. Samples: 839683700. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 18:32:03,840][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 18:32:05,438][00497] Updated weights for policy 0, policy_version 58448 (0.0025) [2024-03-29 18:32:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 957726720. Throughput: 0: 41028.5. Samples: 839927200. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 18:32:08,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 18:32:09,696][00497] Updated weights for policy 0, policy_version 58458 (0.0026) [2024-03-29 18:32:13,611][00497] Updated weights for policy 0, policy_version 58468 (0.0023) [2024-03-29 18:32:13,839][00126] Fps is (10 sec: 37683.0, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 957939712. Throughput: 0: 41290.6. Samples: 840182980. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 18:32:13,840][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 18:32:17,607][00497] Updated weights for policy 0, policy_version 58478 (0.0021) [2024-03-29 18:32:17,723][00476] Signal inference workers to stop experience collection... (29850 times) [2024-03-29 18:32:17,766][00497] InferenceWorker_p0-w0: stopping experience collection (29850 times) [2024-03-29 18:32:17,889][00476] Signal inference workers to resume experience collection... (29850 times) [2024-03-29 18:32:17,890][00497] InferenceWorker_p0-w0: resuming experience collection (29850 times) [2024-03-29 18:32:18,839][00126] Fps is (10 sec: 44236.1, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 958169088. Throughput: 0: 41570.1. Samples: 840313660. 
Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 18:32:18,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 18:32:21,400][00497] Updated weights for policy 0, policy_version 58488 (0.0021) [2024-03-29 18:32:23,839][00126] Fps is (10 sec: 40960.1, 60 sec: 40686.9, 300 sec: 41543.2). Total num frames: 958349312. Throughput: 0: 40979.2. Samples: 840549120. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 18:32:23,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 18:32:25,522][00497] Updated weights for policy 0, policy_version 58498 (0.0023) [2024-03-29 18:32:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 958578688. Throughput: 0: 41129.0. Samples: 840801060. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 18:32:28,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 18:32:29,524][00497] Updated weights for policy 0, policy_version 58508 (0.0024) [2024-03-29 18:32:33,270][00497] Updated weights for policy 0, policy_version 58518 (0.0019) [2024-03-29 18:32:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 958775296. Throughput: 0: 41512.0. Samples: 840936620. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 18:32:33,840][00126] Avg episode reward: [(0, '0.631')] [2024-03-29 18:32:36,983][00497] Updated weights for policy 0, policy_version 58528 (0.0025) [2024-03-29 18:32:38,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.2, 300 sec: 41543.2). Total num frames: 958988288. Throughput: 0: 41483.2. Samples: 841179420. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 18:32:38,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 18:32:41,122][00497] Updated weights for policy 0, policy_version 58538 (0.0025) [2024-03-29 18:32:43,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42052.4, 300 sec: 41654.2). Total num frames: 959217664. Throughput: 0: 41223.1. Samples: 841424140. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 18:32:43,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 18:32:45,130][00497] Updated weights for policy 0, policy_version 58548 (0.0036) [2024-03-29 18:32:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 959397888. Throughput: 0: 41911.1. Samples: 841569700. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 18:32:48,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 18:32:48,939][00497] Updated weights for policy 0, policy_version 58558 (0.0037) [2024-03-29 18:32:51,537][00476] Signal inference workers to stop experience collection... (29900 times) [2024-03-29 18:32:51,617][00497] InferenceWorker_p0-w0: stopping experience collection (29900 times) [2024-03-29 18:32:51,617][00476] Signal inference workers to resume experience collection... (29900 times) [2024-03-29 18:32:51,642][00497] InferenceWorker_p0-w0: resuming experience collection (29900 times) [2024-03-29 18:32:52,591][00497] Updated weights for policy 0, policy_version 58568 (0.0028) [2024-03-29 18:32:53,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.3, 300 sec: 41543.2). Total num frames: 959627264. Throughput: 0: 41622.5. Samples: 841800220. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 18:32:53,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 18:32:56,801][00497] Updated weights for policy 0, policy_version 58578 (0.0023) [2024-03-29 18:32:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.2, 300 sec: 41598.7). 
Total num frames: 959840256. Throughput: 0: 41555.1. Samples: 842052960. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 18:32:58,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 18:33:00,826][00497] Updated weights for policy 0, policy_version 58588 (0.0023) [2024-03-29 18:33:03,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 960036864. Throughput: 0: 41861.8. Samples: 842197440. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 18:33:03,841][00126] Avg episode reward: [(0, '0.660')] [2024-03-29 18:33:04,037][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000058597_960053248.pth... [2024-03-29 18:33:04,357][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000057988_950075392.pth [2024-03-29 18:33:04,737][00497] Updated weights for policy 0, policy_version 58598 (0.0021) [2024-03-29 18:33:08,187][00497] Updated weights for policy 0, policy_version 58608 (0.0019) [2024-03-29 18:33:08,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 41543.2). Total num frames: 960266240. Throughput: 0: 41712.1. Samples: 842426160. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 18:33:08,840][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 18:33:12,552][00497] Updated weights for policy 0, policy_version 58618 (0.0029) [2024-03-29 18:33:13,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 960462848. Throughput: 0: 41682.7. Samples: 842676780. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 18:33:13,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 18:33:16,648][00497] Updated weights for policy 0, policy_version 58628 (0.0020) [2024-03-29 18:33:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.3, 300 sec: 41487.7). Total num frames: 960659456. Throughput: 0: 41534.8. Samples: 842805680. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 18:33:18,840][00126] Avg episode reward: [(0, '0.519')] [2024-03-29 18:33:20,325][00497] Updated weights for policy 0, policy_version 58638 (0.0025) [2024-03-29 18:33:23,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.3, 300 sec: 41543.2). Total num frames: 960872448. Throughput: 0: 41627.1. Samples: 843052640. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 18:33:23,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 18:33:23,864][00497] Updated weights for policy 0, policy_version 58648 (0.0029) [2024-03-29 18:33:27,538][00476] Signal inference workers to stop experience collection... (29950 times) [2024-03-29 18:33:27,617][00476] Signal inference workers to resume experience collection... (29950 times) [2024-03-29 18:33:27,619][00497] InferenceWorker_p0-w0: stopping experience collection (29950 times) [2024-03-29 18:33:27,643][00497] InferenceWorker_p0-w0: resuming experience collection (29950 times) [2024-03-29 18:33:28,223][00497] Updated weights for policy 0, policy_version 58658 (0.0023) [2024-03-29 18:33:28,839][00126] Fps is (10 sec: 42597.6, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 961085440. Throughput: 0: 42027.5. Samples: 843315380. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 18:33:28,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:33:32,299][00497] Updated weights for policy 0, policy_version 58668 (0.0021) [2024-03-29 18:33:33,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41432.1). Total num frames: 961282048. 
Throughput: 0: 41476.8. Samples: 843436160. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 18:33:33,840][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 18:33:35,772][00497] Updated weights for policy 0, policy_version 58678 (0.0029) [2024-03-29 18:33:38,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 961511424. Throughput: 0: 41956.9. Samples: 843688280. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 18:33:38,841][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 18:33:39,238][00497] Updated weights for policy 0, policy_version 58688 (0.0020) [2024-03-29 18:33:43,751][00497] Updated weights for policy 0, policy_version 58698 (0.0025) [2024-03-29 18:33:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 961708032. Throughput: 0: 42173.7. Samples: 843950780. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:33:43,840][00126] Avg episode reward: [(0, '0.636')] [2024-03-29 18:33:47,546][00497] Updated weights for policy 0, policy_version 58708 (0.0022) [2024-03-29 18:33:48,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 961904640. Throughput: 0: 41569.4. Samples: 844068060. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:33:48,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 18:33:51,688][00497] Updated weights for policy 0, policy_version 58718 (0.0029) [2024-03-29 18:33:53,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.3, 300 sec: 41654.3). Total num frames: 962134016. Throughput: 0: 42148.4. Samples: 844322840. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:33:53,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 18:33:54,906][00497] Updated weights for policy 0, policy_version 58728 (0.0027) [2024-03-29 18:33:57,438][00476] Signal inference workers to stop experience collection... (30000 times) [2024-03-29 18:33:57,438][00476] Signal inference workers to resume experience collection... (30000 times) [2024-03-29 18:33:57,463][00497] InferenceWorker_p0-w0: stopping experience collection (30000 times) [2024-03-29 18:33:57,463][00497] InferenceWorker_p0-w0: resuming experience collection (30000 times) [2024-03-29 18:33:58,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 962330624. Throughput: 0: 42181.3. Samples: 844574940. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:33:58,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 18:33:59,263][00497] Updated weights for policy 0, policy_version 58738 (0.0027) [2024-03-29 18:34:03,150][00497] Updated weights for policy 0, policy_version 58748 (0.0019) [2024-03-29 18:34:03,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 41432.1). Total num frames: 962527232. Throughput: 0: 41891.5. Samples: 844690800. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:34:03,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 18:34:07,279][00497] Updated weights for policy 0, policy_version 58758 (0.0021) [2024-03-29 18:34:08,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 962772992. Throughput: 0: 42288.5. Samples: 844955620. 
Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:34:08,842][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 18:34:10,555][00497] Updated weights for policy 0, policy_version 58768 (0.0022) [2024-03-29 18:34:13,839][00126] Fps is (10 sec: 44236.1, 60 sec: 41779.1, 300 sec: 41543.1). Total num frames: 962969600. Throughput: 0: 42059.9. Samples: 845208080. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:34:13,840][00126] Avg episode reward: [(0, '0.603')] [2024-03-29 18:34:14,922][00497] Updated weights for policy 0, policy_version 58778 (0.0025) [2024-03-29 18:34:18,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.1, 300 sec: 41598.7). Total num frames: 963166208. Throughput: 0: 41728.5. Samples: 845313940. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 18:34:18,840][00126] Avg episode reward: [(0, '0.612')] [2024-03-29 18:34:18,893][00497] Updated weights for policy 0, policy_version 58788 (0.0019) [2024-03-29 18:34:22,958][00497] Updated weights for policy 0, policy_version 58798 (0.0019) [2024-03-29 18:34:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 963395584. Throughput: 0: 41979.5. Samples: 845577360. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 18:34:23,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 18:34:26,092][00497] Updated weights for policy 0, policy_version 58808 (0.0031) [2024-03-29 18:34:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 963592192. Throughput: 0: 41797.4. Samples: 845831660. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 18:34:28,840][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 18:34:29,999][00476] Signal inference workers to stop experience collection... (30050 times) [2024-03-29 18:34:30,033][00497] InferenceWorker_p0-w0: stopping experience collection (30050 times) [2024-03-29 18:34:30,213][00476] Signal inference workers to resume experience collection... (30050 times) [2024-03-29 18:34:30,213][00497] InferenceWorker_p0-w0: resuming experience collection (30050 times) [2024-03-29 18:34:30,493][00497] Updated weights for policy 0, policy_version 58818 (0.0026) [2024-03-29 18:34:33,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 963821568. Throughput: 0: 41931.0. Samples: 845954960. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 18:34:33,840][00126] Avg episode reward: [(0, '0.609')] [2024-03-29 18:34:34,383][00497] Updated weights for policy 0, policy_version 58828 (0.0028) [2024-03-29 18:34:38,642][00497] Updated weights for policy 0, policy_version 58838 (0.0022) [2024-03-29 18:34:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 964001792. Throughput: 0: 41852.8. Samples: 846206220. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 18:34:38,840][00126] Avg episode reward: [(0, '0.647')] [2024-03-29 18:34:41,892][00497] Updated weights for policy 0, policy_version 58848 (0.0019) [2024-03-29 18:34:43,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 964214784. Throughput: 0: 41762.7. Samples: 846454260. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 18:34:43,840][00126] Avg episode reward: [(0, '0.522')] [2024-03-29 18:34:46,138][00497] Updated weights for policy 0, policy_version 58858 (0.0022) [2024-03-29 18:34:48,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42598.3, 300 sec: 41765.3). 
Total num frames: 964460544. Throughput: 0: 42074.6. Samples: 846584160. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 18:34:48,840][00126] Avg episode reward: [(0, '0.515')] [2024-03-29 18:34:50,049][00497] Updated weights for policy 0, policy_version 58868 (0.0026) [2024-03-29 18:34:53,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 964640768. Throughput: 0: 42075.1. Samples: 846849000. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 18:34:53,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 18:34:54,314][00497] Updated weights for policy 0, policy_version 58878 (0.0028) [2024-03-29 18:34:57,375][00497] Updated weights for policy 0, policy_version 58888 (0.0027) [2024-03-29 18:34:58,839][00126] Fps is (10 sec: 39322.2, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 964853760. Throughput: 0: 41570.4. Samples: 847078740. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 18:34:58,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 18:35:01,952][00497] Updated weights for policy 0, policy_version 58898 (0.0027) [2024-03-29 18:35:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 965066752. Throughput: 0: 42143.1. Samples: 847210380. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 18:35:03,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 18:35:04,174][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000058905_965099520.pth... [2024-03-29 18:35:04,501][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000058292_955056128.pth [2024-03-29 18:35:04,562][00476] Signal inference workers to stop experience collection... (30100 times) [2024-03-29 18:35:04,598][00497] InferenceWorker_p0-w0: stopping experience collection (30100 times) [2024-03-29 18:35:04,774][00476] Signal inference workers to resume experience collection... (30100 times) [2024-03-29 18:35:04,774][00497] InferenceWorker_p0-w0: resuming experience collection (30100 times) [2024-03-29 18:35:05,935][00497] Updated weights for policy 0, policy_version 58908 (0.0022) [2024-03-29 18:35:08,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 965263360. Throughput: 0: 42196.5. Samples: 847476200. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 18:35:08,840][00126] Avg episode reward: [(0, '0.609')] [2024-03-29 18:35:10,206][00497] Updated weights for policy 0, policy_version 58918 (0.0027) [2024-03-29 18:35:13,305][00497] Updated weights for policy 0, policy_version 58928 (0.0031) [2024-03-29 18:35:13,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 965476352. Throughput: 0: 41219.9. Samples: 847686560. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 18:35:13,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 18:35:17,735][00497] Updated weights for policy 0, policy_version 58938 (0.0022) [2024-03-29 18:35:18,839][00126] Fps is (10 sec: 42599.2, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 965689344. Throughput: 0: 41681.9. Samples: 847830640. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 18:35:18,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:35:21,565][00497] Updated weights for policy 0, policy_version 58948 (0.0025) [2024-03-29 18:35:23,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41233.2, 300 sec: 41543.2). Total num frames: 965869568. 
Throughput: 0: 41870.7. Samples: 848090400. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 18:35:23,840][00126] Avg episode reward: [(0, '0.608')] [2024-03-29 18:35:26,058][00497] Updated weights for policy 0, policy_version 58958 (0.0023) [2024-03-29 18:35:28,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 966098944. Throughput: 0: 41178.6. Samples: 848307300. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 18:35:28,840][00126] Avg episode reward: [(0, '0.625')] [2024-03-29 18:35:29,199][00497] Updated weights for policy 0, policy_version 58968 (0.0026) [2024-03-29 18:35:33,674][00497] Updated weights for policy 0, policy_version 58978 (0.0018) [2024-03-29 18:35:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.1, 300 sec: 41654.3). Total num frames: 966295552. Throughput: 0: 41249.5. Samples: 848440380. Policy #0 lag: (min: 1.0, avg: 19.7, max: 40.0) [2024-03-29 18:35:33,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 18:35:37,686][00497] Updated weights for policy 0, policy_version 58988 (0.0024) [2024-03-29 18:35:38,736][00476] Signal inference workers to stop experience collection... (30150 times) [2024-03-29 18:35:38,765][00497] InferenceWorker_p0-w0: stopping experience collection (30150 times) [2024-03-29 18:35:38,839][00126] Fps is (10 sec: 36045.2, 60 sec: 40960.1, 300 sec: 41543.2). Total num frames: 966459392. Throughput: 0: 40784.5. Samples: 848684300. Policy #0 lag: (min: 1.0, avg: 19.7, max: 40.0) [2024-03-29 18:35:38,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 18:35:38,947][00476] Signal inference workers to resume experience collection... (30150 times) [2024-03-29 18:35:38,948][00497] InferenceWorker_p0-w0: resuming experience collection (30150 times) [2024-03-29 18:35:42,261][00497] Updated weights for policy 0, policy_version 58998 (0.0025) [2024-03-29 18:35:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 966705152. Throughput: 0: 41040.9. Samples: 848925580. Policy #0 lag: (min: 1.0, avg: 19.7, max: 40.0) [2024-03-29 18:35:43,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 18:35:45,536][00497] Updated weights for policy 0, policy_version 59008 (0.0023) [2024-03-29 18:35:48,839][00126] Fps is (10 sec: 42598.1, 60 sec: 40413.9, 300 sec: 41543.2). Total num frames: 966885376. Throughput: 0: 40836.9. Samples: 849048040. Policy #0 lag: (min: 1.0, avg: 19.7, max: 40.0) [2024-03-29 18:35:48,841][00126] Avg episode reward: [(0, '0.640')] [2024-03-29 18:35:49,941][00497] Updated weights for policy 0, policy_version 59018 (0.0032) [2024-03-29 18:35:53,839][00126] Fps is (10 sec: 39321.0, 60 sec: 40959.9, 300 sec: 41543.1). Total num frames: 967098368. Throughput: 0: 40289.3. Samples: 849289220. Policy #0 lag: (min: 1.0, avg: 19.7, max: 40.0) [2024-03-29 18:35:53,840][00126] Avg episode reward: [(0, '0.513')] [2024-03-29 18:35:54,010][00497] Updated weights for policy 0, policy_version 59028 (0.0018) [2024-03-29 18:35:58,103][00497] Updated weights for policy 0, policy_version 59038 (0.0023) [2024-03-29 18:35:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 40959.9, 300 sec: 41543.2). Total num frames: 967311360. Throughput: 0: 41300.9. Samples: 849545100. 
Policy #0 lag: (min: 1.0, avg: 19.7, max: 40.0) [2024-03-29 18:35:58,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 18:36:01,485][00497] Updated weights for policy 0, policy_version 59048 (0.0019) [2024-03-29 18:36:03,839][00126] Fps is (10 sec: 40960.7, 60 sec: 40687.0, 300 sec: 41487.6). Total num frames: 967507968. Throughput: 0: 40624.4. Samples: 849658740. Policy #0 lag: (min: 1.0, avg: 19.7, max: 40.0) [2024-03-29 18:36:03,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 18:36:05,906][00497] Updated weights for policy 0, policy_version 59058 (0.0025) [2024-03-29 18:36:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 967737344. Throughput: 0: 40594.2. Samples: 849917140. Policy #0 lag: (min: 1.0, avg: 19.7, max: 40.0) [2024-03-29 18:36:08,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 18:36:09,827][00497] Updated weights for policy 0, policy_version 59068 (0.0026) [2024-03-29 18:36:10,577][00476] Signal inference workers to stop experience collection... (30200 times) [2024-03-29 18:36:10,612][00497] InferenceWorker_p0-w0: stopping experience collection (30200 times) [2024-03-29 18:36:10,800][00476] Signal inference workers to resume experience collection... (30200 times) [2024-03-29 18:36:10,800][00497] InferenceWorker_p0-w0: resuming experience collection (30200 times) [2024-03-29 18:36:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 40687.0, 300 sec: 41487.6). Total num frames: 967917568. Throughput: 0: 41190.3. Samples: 850160860. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 18:36:13,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 18:36:13,964][00497] Updated weights for policy 0, policy_version 59078 (0.0023) [2024-03-29 18:36:17,202][00497] Updated weights for policy 0, policy_version 59088 (0.0024) [2024-03-29 18:36:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40959.9, 300 sec: 41487.6). Total num frames: 968146944. Throughput: 0: 40883.9. Samples: 850280160. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 18:36:18,840][00126] Avg episode reward: [(0, '0.650')] [2024-03-29 18:36:21,551][00497] Updated weights for policy 0, policy_version 59098 (0.0020) [2024-03-29 18:36:23,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41506.0, 300 sec: 41598.7). Total num frames: 968359936. Throughput: 0: 41284.3. Samples: 850542100. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 18:36:23,841][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 18:36:25,493][00497] Updated weights for policy 0, policy_version 59108 (0.0017) [2024-03-29 18:36:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 968572928. Throughput: 0: 41407.5. Samples: 850788920. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 18:36:28,840][00126] Avg episode reward: [(0, '0.541')] [2024-03-29 18:36:29,752][00497] Updated weights for policy 0, policy_version 59118 (0.0034) [2024-03-29 18:36:33,298][00497] Updated weights for policy 0, policy_version 59128 (0.0024) [2024-03-29 18:36:33,839][00126] Fps is (10 sec: 39322.4, 60 sec: 40960.0, 300 sec: 41487.7). Total num frames: 968753152. Throughput: 0: 41242.8. Samples: 850903960. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 18:36:33,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 18:36:37,367][00497] Updated weights for policy 0, policy_version 59138 (0.0021) [2024-03-29 18:36:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 41654.3). 
Total num frames: 968982528. Throughput: 0: 41747.7. Samples: 851167860. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 18:36:38,840][00126] Avg episode reward: [(0, '0.627')] [2024-03-29 18:36:41,058][00497] Updated weights for policy 0, policy_version 59148 (0.0021) [2024-03-29 18:36:42,757][00476] Signal inference workers to stop experience collection... (30250 times) [2024-03-29 18:36:42,836][00497] InferenceWorker_p0-w0: stopping experience collection (30250 times) [2024-03-29 18:36:42,924][00476] Signal inference workers to resume experience collection... (30250 times) [2024-03-29 18:36:42,925][00497] InferenceWorker_p0-w0: resuming experience collection (30250 times) [2024-03-29 18:36:43,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41233.0, 300 sec: 41543.2). Total num frames: 969179136. Throughput: 0: 41626.6. Samples: 851418300. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 18:36:43,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 18:36:45,363][00497] Updated weights for policy 0, policy_version 59158 (0.0021) [2024-03-29 18:36:48,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 41654.3). Total num frames: 969392128. Throughput: 0: 41831.5. Samples: 851541160. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 18:36:48,840][00126] Avg episode reward: [(0, '0.637')] [2024-03-29 18:36:49,055][00497] Updated weights for policy 0, policy_version 59168 (0.0025) [2024-03-29 18:36:53,051][00497] Updated weights for policy 0, policy_version 59178 (0.0017) [2024-03-29 18:36:53,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 969605120. Throughput: 0: 41678.6. Samples: 851792680. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 18:36:53,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 18:36:56,797][00497] Updated weights for policy 0, policy_version 59188 (0.0019) [2024-03-29 18:36:58,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 969801728. Throughput: 0: 42131.0. Samples: 852056760. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 18:36:58,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 18:37:01,086][00497] Updated weights for policy 0, policy_version 59198 (0.0038) [2024-03-29 18:37:03,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 970031104. Throughput: 0: 42126.6. Samples: 852175860. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 18:37:03,840][00126] Avg episode reward: [(0, '0.555')] [2024-03-29 18:37:04,134][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000059207_970047488.pth... [2024-03-29 18:37:04,470][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000058597_960053248.pth [2024-03-29 18:37:04,816][00497] Updated weights for policy 0, policy_version 59208 (0.0022) [2024-03-29 18:37:08,829][00497] Updated weights for policy 0, policy_version 59218 (0.0023) [2024-03-29 18:37:08,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 970227712. Throughput: 0: 41505.9. Samples: 852409860. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 18:37:08,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 18:37:12,806][00497] Updated weights for policy 0, policy_version 59228 (0.0023) [2024-03-29 18:37:13,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 970407936. 
Throughput: 0: 41806.7. Samples: 852670220. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 18:37:13,840][00126] Avg episode reward: [(0, '0.655')] [2024-03-29 18:37:16,898][00497] Updated weights for policy 0, policy_version 59238 (0.0025) [2024-03-29 18:37:18,814][00476] Signal inference workers to stop experience collection... (30300 times) [2024-03-29 18:37:18,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 970653696. Throughput: 0: 41982.0. Samples: 852793160. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 18:37:18,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 18:37:18,849][00497] InferenceWorker_p0-w0: stopping experience collection (30300 times) [2024-03-29 18:37:19,044][00476] Signal inference workers to resume experience collection... (30300 times) [2024-03-29 18:37:19,045][00497] InferenceWorker_p0-w0: resuming experience collection (30300 times) [2024-03-29 18:37:20,475][00497] Updated weights for policy 0, policy_version 59248 (0.0022) [2024-03-29 18:37:23,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 970850304. Throughput: 0: 41476.8. Samples: 853034320. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 18:37:23,840][00126] Avg episode reward: [(0, '0.642')] [2024-03-29 18:37:24,652][00497] Updated weights for policy 0, policy_version 59258 (0.0026) [2024-03-29 18:37:28,511][00497] Updated weights for policy 0, policy_version 59268 (0.0025) [2024-03-29 18:37:28,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 971046912. Throughput: 0: 41625.8. Samples: 853291460. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:37:28,841][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 18:37:32,609][00497] Updated weights for policy 0, policy_version 59278 (0.0022) [2024-03-29 18:37:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 971276288. Throughput: 0: 41671.1. Samples: 853416360. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:37:33,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 18:37:36,277][00497] Updated weights for policy 0, policy_version 59288 (0.0022) [2024-03-29 18:37:38,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41506.0, 300 sec: 41543.1). Total num frames: 971472896. Throughput: 0: 41353.8. Samples: 853653600. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:37:38,840][00126] Avg episode reward: [(0, '0.673')] [2024-03-29 18:37:40,236][00497] Updated weights for policy 0, policy_version 59298 (0.0017) [2024-03-29 18:37:43,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 971685888. Throughput: 0: 41511.2. Samples: 853924760. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:37:43,840][00126] Avg episode reward: [(0, '0.613')] [2024-03-29 18:37:43,925][00497] Updated weights for policy 0, policy_version 59308 (0.0018) [2024-03-29 18:37:48,020][00497] Updated weights for policy 0, policy_version 59318 (0.0029) [2024-03-29 18:37:48,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 971898880. Throughput: 0: 41552.1. Samples: 854045700. 
Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:37:48,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 18:37:51,405][00497] Updated weights for policy 0, policy_version 59328 (0.0030) [2024-03-29 18:37:53,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 972111872. Throughput: 0: 42065.7. Samples: 854302820. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:37:53,840][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 18:37:55,091][00476] Signal inference workers to stop experience collection... (30350 times) [2024-03-29 18:37:55,166][00497] InferenceWorker_p0-w0: stopping experience collection (30350 times) [2024-03-29 18:37:55,171][00476] Signal inference workers to resume experience collection... (30350 times) [2024-03-29 18:37:55,192][00497] InferenceWorker_p0-w0: resuming experience collection (30350 times) [2024-03-29 18:37:55,474][00497] Updated weights for policy 0, policy_version 59338 (0.0026) [2024-03-29 18:37:58,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 972341248. Throughput: 0: 42096.0. Samples: 854564540. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:37:58,840][00126] Avg episode reward: [(0, '0.643')] [2024-03-29 18:37:59,372][00497] Updated weights for policy 0, policy_version 59348 (0.0023) [2024-03-29 18:38:03,363][00497] Updated weights for policy 0, policy_version 59358 (0.0024) [2024-03-29 18:38:03,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 972537856. Throughput: 0: 42164.6. Samples: 854690560. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 18:38:03,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 18:38:06,690][00497] Updated weights for policy 0, policy_version 59368 (0.0022) [2024-03-29 18:38:08,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 972750848. Throughput: 0: 42404.4. Samples: 854942520. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 18:38:08,840][00126] Avg episode reward: [(0, '0.663')] [2024-03-29 18:38:10,968][00497] Updated weights for policy 0, policy_version 59378 (0.0022) [2024-03-29 18:38:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 41709.8). Total num frames: 972963840. Throughput: 0: 42471.2. Samples: 855202660. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 18:38:13,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 18:38:14,868][00497] Updated weights for policy 0, policy_version 59388 (0.0023) [2024-03-29 18:38:18,838][00497] Updated weights for policy 0, policy_version 59398 (0.0025) [2024-03-29 18:38:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 973176832. Throughput: 0: 42719.5. Samples: 855338740. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 18:38:18,840][00126] Avg episode reward: [(0, '0.569')] [2024-03-29 18:38:22,458][00497] Updated weights for policy 0, policy_version 59408 (0.0037) [2024-03-29 18:38:23,840][00126] Fps is (10 sec: 40958.2, 60 sec: 42052.0, 300 sec: 41654.2). Total num frames: 973373440. Throughput: 0: 42547.3. Samples: 855568240. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 18:38:23,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 18:38:26,506][00497] Updated weights for policy 0, policy_version 59418 (0.0022) [2024-03-29 18:38:28,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.4, 300 sec: 41709.8). 
Total num frames: 973586432. Throughput: 0: 42389.8. Samples: 855832300. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 18:38:28,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 18:38:30,460][00497] Updated weights for policy 0, policy_version 59428 (0.0024) [2024-03-29 18:38:31,510][00476] Signal inference workers to stop experience collection... (30400 times) [2024-03-29 18:38:31,578][00497] InferenceWorker_p0-w0: stopping experience collection (30400 times) [2024-03-29 18:38:31,673][00476] Signal inference workers to resume experience collection... (30400 times) [2024-03-29 18:38:31,674][00497] InferenceWorker_p0-w0: resuming experience collection (30400 times) [2024-03-29 18:38:33,839][00126] Fps is (10 sec: 42600.2, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 973799424. Throughput: 0: 42676.0. Samples: 855966120. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 18:38:33,840][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 18:38:34,456][00497] Updated weights for policy 0, policy_version 59438 (0.0028) [2024-03-29 18:38:38,156][00497] Updated weights for policy 0, policy_version 59448 (0.0026) [2024-03-29 18:38:38,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 974012416. Throughput: 0: 42054.2. Samples: 856195260. Policy #0 lag: (min: 0.0, avg: 21.3, max: 40.0) [2024-03-29 18:38:38,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 18:38:42,117][00497] Updated weights for policy 0, policy_version 59458 (0.0026) [2024-03-29 18:38:43,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42325.2, 300 sec: 41765.3). Total num frames: 974225408. Throughput: 0: 42322.1. Samples: 856469040. Policy #0 lag: (min: 0.0, avg: 21.3, max: 40.0) [2024-03-29 18:38:43,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 18:38:46,147][00497] Updated weights for policy 0, policy_version 59468 (0.0026) [2024-03-29 18:38:48,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 974422016. Throughput: 0: 42060.0. Samples: 856583260. Policy #0 lag: (min: 0.0, avg: 21.3, max: 40.0) [2024-03-29 18:38:48,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 18:38:50,100][00497] Updated weights for policy 0, policy_version 59478 (0.0021) [2024-03-29 18:38:53,592][00497] Updated weights for policy 0, policy_version 59488 (0.0021) [2024-03-29 18:38:53,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.2, 300 sec: 41765.3). Total num frames: 974651392. Throughput: 0: 41769.7. Samples: 856822160. Policy #0 lag: (min: 0.0, avg: 21.3, max: 40.0) [2024-03-29 18:38:53,840][00126] Avg episode reward: [(0, '0.596')] [2024-03-29 18:38:57,739][00497] Updated weights for policy 0, policy_version 59498 (0.0024) [2024-03-29 18:38:58,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 974848000. Throughput: 0: 42098.6. Samples: 857097100. Policy #0 lag: (min: 0.0, avg: 21.3, max: 40.0) [2024-03-29 18:38:58,840][00126] Avg episode reward: [(0, '0.658')] [2024-03-29 18:39:01,999][00497] Updated weights for policy 0, policy_version 59508 (0.0022) [2024-03-29 18:39:03,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 975060992. Throughput: 0: 41590.2. Samples: 857210300. 
Policy #0 lag: (min: 0.0, avg: 21.3, max: 40.0)
[2024-03-29 18:39:03,840][00126] Avg episode reward: [(0, '0.564')]
[2024-03-29 18:39:04,126][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000059514_975077376.pth...
[2024-03-29 18:39:04,454][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000058905_965099520.pth
[2024-03-29 18:39:04,508][00476] Signal inference workers to stop experience collection... (30450 times)
[2024-03-29 18:39:04,531][00497] InferenceWorker_p0-w0: stopping experience collection (30450 times)
[2024-03-29 18:39:04,721][00476] Signal inference workers to resume experience collection... (30450 times)
[2024-03-29 18:39:04,721][00497] InferenceWorker_p0-w0: resuming experience collection (30450 times)
[2024-03-29 18:39:05,681][00497] Updated weights for policy 0, policy_version 59518 (0.0019)
[2024-03-29 18:39:08,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 975290368. Throughput: 0: 41890.1. Samples: 857453280. Policy #0 lag: (min: 0.0, avg: 21.3, max: 40.0)
[2024-03-29 18:39:08,841][00126] Avg episode reward: [(0, '0.587')]
[2024-03-29 18:39:08,971][00497] Updated weights for policy 0, policy_version 59528 (0.0031)
[2024-03-29 18:39:13,490][00497] Updated weights for policy 0, policy_version 59538 (0.0029)
[2024-03-29 18:39:13,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 975470592. Throughput: 0: 41899.8. Samples: 857717800. Policy #0 lag: (min: 0.0, avg: 21.3, max: 40.0)
[2024-03-29 18:39:13,840][00126] Avg episode reward: [(0, '0.566')]
[2024-03-29 18:39:17,843][00497] Updated weights for policy 0, policy_version 59548 (0.0026)
[2024-03-29 18:39:18,839][00126] Fps is (10 sec: 36045.2, 60 sec: 41233.2, 300 sec: 41543.2). Total num frames: 975650816. Throughput: 0: 41594.2. Samples: 857837860. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0)
[2024-03-29 18:39:18,840][00126] Avg episode reward: [(0, '0.581')]
[2024-03-29 18:39:21,477][00497] Updated weights for policy 0, policy_version 59558 (0.0023)
[2024-03-29 18:39:23,839][00126] Fps is (10 sec: 44237.8, 60 sec: 42325.6, 300 sec: 41765.3). Total num frames: 975912960. Throughput: 0: 41826.8. Samples: 858077460. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0)
[2024-03-29 18:39:23,840][00126] Avg episode reward: [(0, '0.624')]
[2024-03-29 18:39:24,935][00497] Updated weights for policy 0, policy_version 59568 (0.0019)
[2024-03-29 18:39:28,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 976109568. Throughput: 0: 41600.5. Samples: 858341060. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0)
[2024-03-29 18:39:28,840][00126] Avg episode reward: [(0, '0.600')]
[2024-03-29 18:39:29,248][00497] Updated weights for policy 0, policy_version 59578 (0.0018)
[2024-03-29 18:39:33,400][00497] Updated weights for policy 0, policy_version 59588 (0.0024)
[2024-03-29 18:39:33,839][00126] Fps is (10 sec: 37682.5, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 976289792. Throughput: 0: 41696.2. Samples: 858459600. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0)
[2024-03-29 18:39:33,840][00126] Avg episode reward: [(0, '0.550')]
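Two recurring entry types above can be made concrete with short sketches. The "Saving .../checkpoint_p0/checkpoint_000059514_975077376.pth" / "Removing .../checkpoint_000058905_965099520.pth" pair at 18:39:04 is the learner rotating its checkpoint files; the two numbers embedded in the file name line up with the policy-version and total-frame counters reported in the surrounding entries. A minimal inspection sketch in Python, assuming the .pth file is an ordinary torch-serialized dictionary (the exact keys depend on the trainer version):

    # Illustrative sketch, not produced by the trainer. Assumes the checkpoint is a
    # plain torch-serialized dict, as Sample Factory-style learners typically write.
    import torch

    path = ("/workspace/metta/train_dir/b.a20.20x20_40x40.norm/"
            "checkpoint_p0/checkpoint_000059514_975077376.pth")
    checkpoint = torch.load(path, map_location="cpu")

    # Print what the learner stored (model weights, optimizer state, progress counters, ...).
    for key, value in checkpoint.items():
        print(key, type(value).__name__)

The "Fps is (10 sec / 60 sec / 300 sec)" figures can likewise be cross-checked against the "Total num frames" counter, which in these entries advances in multiples of 16384 frames. Assuming the 10-second figure is simply the frame delta over the roughly 10 s reporting window (the trainer divides by the measured elapsed time, so the last digits differ slightly):

    # Illustrative sketch: reproduce the "10 sec" FPS reported at 18:39:18 from the
    # two frame counters logged above; the 10.0 s window length is a nominal assumption.
    frames_at_18_39_08 = 975_290_368
    frames_at_18_39_18 = 975_650_816

    fps_10s = (frames_at_18_39_18 - frames_at_18_39_08) / 10.0
    print(fps_10s)  # 36044.8, in line with the logged "10 sec: 36045.2"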
[2024-03-29 18:39:36,504][00476] Signal inference workers to stop experience collection... (30500 times)
[2024-03-29 18:39:36,574][00497] InferenceWorker_p0-w0: stopping experience collection (30500 times)
[2024-03-29 18:39:36,606][00476] Signal inference workers to resume experience collection... (30500 times)
[2024-03-29 18:39:36,608][00497] InferenceWorker_p0-w0: resuming experience collection (30500 times)
[2024-03-29 18:39:37,247][00497] Updated weights for policy 0, policy_version 59598 (0.0030)
[2024-03-29 18:39:38,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 976535552. Throughput: 0: 41843.7. Samples: 858705120. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0)
[2024-03-29 18:39:38,841][00126] Avg episode reward: [(0, '0.576')]
[2024-03-29 18:39:40,712][00497] Updated weights for policy 0, policy_version 59608 (0.0022)
[2024-03-29 18:39:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 976715776. Throughput: 0: 41371.5. Samples: 858958820. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0)
[2024-03-29 18:39:43,840][00126] Avg episode reward: [(0, '0.638')]
[2024-03-29 18:39:45,040][00497] Updated weights for policy 0, policy_version 59618 (0.0018)
[2024-03-29 18:39:48,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 976928768. Throughput: 0: 41616.0. Samples: 859083020. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0)
[2024-03-29 18:39:48,840][00126] Avg episode reward: [(0, '0.542')]
[2024-03-29 18:39:49,044][00497] Updated weights for policy 0, policy_version 59628 (0.0035)
[2024-03-29 18:39:52,879][00497] Updated weights for policy 0, policy_version 59638 (0.0024)
[2024-03-29 18:39:53,839][00126] Fps is (10 sec: 44237.5, 60 sec: 41779.4, 300 sec: 41709.8). Total num frames: 977158144. Throughput: 0: 41790.3. Samples: 859333840. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0)
[2024-03-29 18:39:53,840][00126] Avg episode reward: [(0, '0.525')]
[2024-03-29 18:39:56,291][00497] Updated weights for policy 0, policy_version 59648 (0.0022)
[2024-03-29 18:39:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 977354752. Throughput: 0: 41429.0. Samples: 859582100. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0)
[2024-03-29 18:39:58,841][00126] Avg episode reward: [(0, '0.553')]
[2024-03-29 18:40:00,644][00497] Updated weights for policy 0, policy_version 59658 (0.0020)
[2024-03-29 18:40:03,839][00126] Fps is (10 sec: 40959.2, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 977567744. Throughput: 0: 41685.2. Samples: 859713700. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0)
[2024-03-29 18:40:03,840][00126] Avg episode reward: [(0, '0.576')]
[2024-03-29 18:40:04,745][00497] Updated weights for policy 0, policy_version 59668 (0.0021)
[2024-03-29 18:40:07,500][00476] Signal inference workers to stop experience collection... (30550 times)
[2024-03-29 18:40:07,500][00476] Signal inference workers to resume experience collection... (30550 times)
[2024-03-29 18:40:07,547][00497] InferenceWorker_p0-w0: stopping experience collection (30550 times)
[2024-03-29 18:40:07,548][00497] InferenceWorker_p0-w0: resuming experience collection (30550 times)
[2024-03-29 18:40:08,494][00497] Updated weights for policy 0, policy_version 59678 (0.0028)
[2024-03-29 18:40:08,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 977780736. Throughput: 0: 41914.7. Samples: 859963620. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0)
[2024-03-29 18:40:08,840][00126] Avg episode reward: [(0, '0.514')]
[2024-03-29 18:40:12,001][00497] Updated weights for policy 0, policy_version 59688 (0.0023)
[2024-03-29 18:40:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41654.2).
Total num frames: 977977344. Throughput: 0: 41628.4. Samples: 860214340. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:40:13,840][00126] Avg episode reward: [(0, '0.609')] [2024-03-29 18:40:16,451][00497] Updated weights for policy 0, policy_version 59698 (0.0020) [2024-03-29 18:40:18,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 978190336. Throughput: 0: 41541.0. Samples: 860328940. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:40:18,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 18:40:20,584][00497] Updated weights for policy 0, policy_version 59708 (0.0024) [2024-03-29 18:40:23,839][00126] Fps is (10 sec: 39322.2, 60 sec: 40960.0, 300 sec: 41598.7). Total num frames: 978370560. Throughput: 0: 42105.3. Samples: 860599860. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:40:23,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 18:40:24,475][00497] Updated weights for policy 0, policy_version 59718 (0.0033) [2024-03-29 18:40:27,934][00497] Updated weights for policy 0, policy_version 59728 (0.0021) [2024-03-29 18:40:28,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 978583552. Throughput: 0: 41365.8. Samples: 860820280. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 18:40:28,840][00126] Avg episode reward: [(0, '0.521')] [2024-03-29 18:40:32,334][00497] Updated weights for policy 0, policy_version 59738 (0.0031) [2024-03-29 18:40:33,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 978796544. Throughput: 0: 41602.6. Samples: 860955140. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:40:33,840][00126] Avg episode reward: [(0, '0.624')] [2024-03-29 18:40:36,216][00497] Updated weights for policy 0, policy_version 59748 (0.0019) [2024-03-29 18:40:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40959.9, 300 sec: 41654.2). Total num frames: 978993152. Throughput: 0: 41890.1. Samples: 861218900. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:40:38,840][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 18:40:40,237][00497] Updated weights for policy 0, policy_version 59758 (0.0024) [2024-03-29 18:40:40,459][00476] Signal inference workers to stop experience collection... (30600 times) [2024-03-29 18:40:40,533][00476] Signal inference workers to resume experience collection... (30600 times) [2024-03-29 18:40:40,535][00497] InferenceWorker_p0-w0: stopping experience collection (30600 times) [2024-03-29 18:40:40,573][00497] InferenceWorker_p0-w0: resuming experience collection (30600 times) [2024-03-29 18:40:43,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 979222528. Throughput: 0: 41153.0. Samples: 861433980. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:40:43,840][00126] Avg episode reward: [(0, '0.607')] [2024-03-29 18:40:43,889][00497] Updated weights for policy 0, policy_version 59768 (0.0026) [2024-03-29 18:40:48,255][00497] Updated weights for policy 0, policy_version 59778 (0.0027) [2024-03-29 18:40:48,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 979419136. Throughput: 0: 41153.3. Samples: 861565600. 
Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:40:48,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 18:40:52,141][00497] Updated weights for policy 0, policy_version 59788 (0.0031) [2024-03-29 18:40:53,839][00126] Fps is (10 sec: 37682.8, 60 sec: 40686.8, 300 sec: 41654.2). Total num frames: 979599360. Throughput: 0: 41442.0. Samples: 861828520. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:40:53,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 18:40:56,057][00497] Updated weights for policy 0, policy_version 59798 (0.0025) [2024-03-29 18:40:58,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41506.2, 300 sec: 41820.8). Total num frames: 979845120. Throughput: 0: 40840.5. Samples: 862052160. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:40:58,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 18:40:59,543][00497] Updated weights for policy 0, policy_version 59808 (0.0032) [2024-03-29 18:41:03,839][00126] Fps is (10 sec: 44237.0, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 980041728. Throughput: 0: 41295.1. Samples: 862187220. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:41:03,840][00126] Avg episode reward: [(0, '0.644')] [2024-03-29 18:41:04,082][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000059818_980058112.pth... [2024-03-29 18:41:04,084][00497] Updated weights for policy 0, policy_version 59818 (0.0022) [2024-03-29 18:41:04,397][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000059207_970047488.pth [2024-03-29 18:41:08,116][00497] Updated weights for policy 0, policy_version 59828 (0.0019) [2024-03-29 18:41:08,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40686.9, 300 sec: 41709.8). Total num frames: 980221952. Throughput: 0: 40992.0. Samples: 862444500. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 18:41:08,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:41:12,095][00497] Updated weights for policy 0, policy_version 59838 (0.0021) [2024-03-29 18:41:12,735][00476] Signal inference workers to stop experience collection... (30650 times) [2024-03-29 18:41:12,774][00497] InferenceWorker_p0-w0: stopping experience collection (30650 times) [2024-03-29 18:41:12,967][00476] Signal inference workers to resume experience collection... (30650 times) [2024-03-29 18:41:12,968][00497] InferenceWorker_p0-w0: resuming experience collection (30650 times) [2024-03-29 18:41:13,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 980484096. Throughput: 0: 41300.1. Samples: 862678780. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 18:41:13,840][00126] Avg episode reward: [(0, '0.676')] [2024-03-29 18:41:15,676][00497] Updated weights for policy 0, policy_version 59848 (0.0022) [2024-03-29 18:41:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 40960.0, 300 sec: 41654.3). Total num frames: 980647936. Throughput: 0: 41083.2. Samples: 862803880. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 18:41:18,840][00126] Avg episode reward: [(0, '0.644')] [2024-03-29 18:41:20,088][00497] Updated weights for policy 0, policy_version 59858 (0.0026) [2024-03-29 18:41:23,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 980860928. Throughput: 0: 40720.6. Samples: 863051320. 
Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 18:41:23,841][00126] Avg episode reward: [(0, '0.612')] [2024-03-29 18:41:24,003][00497] Updated weights for policy 0, policy_version 59868 (0.0022) [2024-03-29 18:41:28,044][00497] Updated weights for policy 0, policy_version 59878 (0.0025) [2024-03-29 18:41:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 981073920. Throughput: 0: 41426.7. Samples: 863298180. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 18:41:28,840][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 18:41:31,783][00497] Updated weights for policy 0, policy_version 59888 (0.0025) [2024-03-29 18:41:33,839][00126] Fps is (10 sec: 42597.5, 60 sec: 41506.1, 300 sec: 41709.7). Total num frames: 981286912. Throughput: 0: 41044.0. Samples: 863412580. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 18:41:33,840][00126] Avg episode reward: [(0, '0.601')] [2024-03-29 18:41:36,072][00497] Updated weights for policy 0, policy_version 59898 (0.0021) [2024-03-29 18:41:38,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 981467136. Throughput: 0: 40883.6. Samples: 863668280. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 18:41:38,840][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 18:41:39,932][00497] Updated weights for policy 0, policy_version 59908 (0.0020) [2024-03-29 18:41:43,839][00126] Fps is (10 sec: 39321.7, 60 sec: 40959.9, 300 sec: 41654.2). Total num frames: 981680128. Throughput: 0: 41315.9. Samples: 863911380. Policy #0 lag: (min: 1.0, avg: 20.9, max: 44.0) [2024-03-29 18:41:43,841][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 18:41:44,097][00497] Updated weights for policy 0, policy_version 59918 (0.0031) [2024-03-29 18:41:45,883][00476] Signal inference workers to stop experience collection... (30700 times) [2024-03-29 18:41:45,922][00497] InferenceWorker_p0-w0: stopping experience collection (30700 times) [2024-03-29 18:41:46,111][00476] Signal inference workers to resume experience collection... (30700 times) [2024-03-29 18:41:46,111][00497] InferenceWorker_p0-w0: resuming experience collection (30700 times) [2024-03-29 18:41:48,330][00497] Updated weights for policy 0, policy_version 59928 (0.0027) [2024-03-29 18:41:48,839][00126] Fps is (10 sec: 40960.2, 60 sec: 40960.1, 300 sec: 41598.7). Total num frames: 981876736. Throughput: 0: 40659.6. Samples: 864016900. Policy #0 lag: (min: 1.0, avg: 20.9, max: 44.0) [2024-03-29 18:41:48,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 18:41:52,143][00497] Updated weights for policy 0, policy_version 59938 (0.0028) [2024-03-29 18:41:53,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 982089728. Throughput: 0: 40840.9. Samples: 864282340. Policy #0 lag: (min: 1.0, avg: 20.9, max: 44.0) [2024-03-29 18:41:53,840][00126] Avg episode reward: [(0, '0.626')] [2024-03-29 18:41:55,959][00497] Updated weights for policy 0, policy_version 59948 (0.0025) [2024-03-29 18:41:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40686.9, 300 sec: 41543.2). Total num frames: 982286336. Throughput: 0: 41471.9. Samples: 864545020. 
Policy #0 lag: (min: 1.0, avg: 20.9, max: 44.0) [2024-03-29 18:41:58,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 18:41:59,872][00497] Updated weights for policy 0, policy_version 59958 (0.0029) [2024-03-29 18:42:03,801][00497] Updated weights for policy 0, policy_version 59968 (0.0032) [2024-03-29 18:42:03,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 982515712. Throughput: 0: 40980.9. Samples: 864648020. Policy #0 lag: (min: 1.0, avg: 20.9, max: 44.0) [2024-03-29 18:42:03,840][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 18:42:07,820][00497] Updated weights for policy 0, policy_version 59978 (0.0030) [2024-03-29 18:42:08,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 982728704. Throughput: 0: 41394.1. Samples: 864914060. Policy #0 lag: (min: 1.0, avg: 20.9, max: 44.0) [2024-03-29 18:42:08,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 18:42:11,904][00497] Updated weights for policy 0, policy_version 59988 (0.0023) [2024-03-29 18:42:13,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40140.8, 300 sec: 41487.6). Total num frames: 982892544. Throughput: 0: 41815.5. Samples: 865179880. Policy #0 lag: (min: 1.0, avg: 20.9, max: 44.0) [2024-03-29 18:42:13,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 18:42:15,839][00497] Updated weights for policy 0, policy_version 59998 (0.0022) [2024-03-29 18:42:18,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 983121920. Throughput: 0: 41378.4. Samples: 865274600. Policy #0 lag: (min: 1.0, avg: 20.9, max: 44.0) [2024-03-29 18:42:18,840][00126] Avg episode reward: [(0, '0.602')] [2024-03-29 18:42:18,858][00476] Signal inference workers to stop experience collection... (30750 times) [2024-03-29 18:42:18,911][00497] InferenceWorker_p0-w0: stopping experience collection (30750 times) [2024-03-29 18:42:18,947][00476] Signal inference workers to resume experience collection... (30750 times) [2024-03-29 18:42:18,949][00497] InferenceWorker_p0-w0: resuming experience collection (30750 times) [2024-03-29 18:42:19,795][00497] Updated weights for policy 0, policy_version 60008 (0.0035) [2024-03-29 18:42:23,826][00497] Updated weights for policy 0, policy_version 60018 (0.0028) [2024-03-29 18:42:23,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41233.1, 300 sec: 41654.3). Total num frames: 983334912. Throughput: 0: 41300.1. Samples: 865526780. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 18:42:23,840][00126] Avg episode reward: [(0, '0.596')] [2024-03-29 18:42:27,711][00497] Updated weights for policy 0, policy_version 60028 (0.0022) [2024-03-29 18:42:28,839][00126] Fps is (10 sec: 39320.8, 60 sec: 40686.8, 300 sec: 41487.6). Total num frames: 983515136. Throughput: 0: 41585.8. Samples: 865782740. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 18:42:28,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 18:42:31,712][00497] Updated weights for policy 0, policy_version 60038 (0.0025) [2024-03-29 18:42:33,839][00126] Fps is (10 sec: 44235.6, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 983777280. Throughput: 0: 41633.2. Samples: 865890400. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 18:42:33,841][00126] Avg episode reward: [(0, '0.643')] [2024-03-29 18:42:35,599][00497] Updated weights for policy 0, policy_version 60048 (0.0017) [2024-03-29 18:42:38,839][00126] Fps is (10 sec: 44237.8, 60 sec: 41506.2, 300 sec: 41598.7). 
Total num frames: 983957504. Throughput: 0: 41290.7. Samples: 866140420. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 18:42:38,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 18:42:39,740][00497] Updated weights for policy 0, policy_version 60058 (0.0022) [2024-03-29 18:42:43,560][00497] Updated weights for policy 0, policy_version 60068 (0.0026) [2024-03-29 18:42:43,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41233.1, 300 sec: 41543.1). Total num frames: 984154112. Throughput: 0: 41156.9. Samples: 866397080. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 18:42:43,840][00126] Avg episode reward: [(0, '0.607')] [2024-03-29 18:42:47,493][00497] Updated weights for policy 0, policy_version 60078 (0.0020) [2024-03-29 18:42:48,135][00476] Signal inference workers to stop experience collection... (30800 times) [2024-03-29 18:42:48,175][00497] InferenceWorker_p0-w0: stopping experience collection (30800 times) [2024-03-29 18:42:48,367][00476] Signal inference workers to resume experience collection... (30800 times) [2024-03-29 18:42:48,367][00497] InferenceWorker_p0-w0: resuming experience collection (30800 times) [2024-03-29 18:42:48,839][00126] Fps is (10 sec: 42597.5, 60 sec: 41779.1, 300 sec: 41598.7). Total num frames: 984383488. Throughput: 0: 41771.4. Samples: 866527740. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 18:42:48,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 18:42:51,332][00497] Updated weights for policy 0, policy_version 60088 (0.0023) [2024-03-29 18:42:53,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 984563712. Throughput: 0: 41082.8. Samples: 866762780. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 18:42:53,840][00126] Avg episode reward: [(0, '0.623')] [2024-03-29 18:42:55,517][00497] Updated weights for policy 0, policy_version 60098 (0.0019) [2024-03-29 18:42:58,839][00126] Fps is (10 sec: 39322.5, 60 sec: 41506.2, 300 sec: 41487.6). Total num frames: 984776704. Throughput: 0: 40903.6. Samples: 867020540. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 18:42:58,840][00126] Avg episode reward: [(0, '0.622')] [2024-03-29 18:42:59,389][00497] Updated weights for policy 0, policy_version 60108 (0.0018) [2024-03-29 18:43:03,233][00497] Updated weights for policy 0, policy_version 60118 (0.0028) [2024-03-29 18:43:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 40960.1, 300 sec: 41432.1). Total num frames: 984973312. Throughput: 0: 41832.0. Samples: 867157040. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 18:43:03,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 18:43:04,157][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000060120_985006080.pth... [2024-03-29 18:43:04,496][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000059514_975077376.pth [2024-03-29 18:43:07,359][00497] Updated weights for policy 0, policy_version 60128 (0.0027) [2024-03-29 18:43:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 985202688. Throughput: 0: 41000.8. Samples: 867371820. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 18:43:08,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 18:43:11,611][00497] Updated weights for policy 0, policy_version 60138 (0.0020) [2024-03-29 18:43:13,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41376.6). Total num frames: 985382912. 
Throughput: 0: 40778.4. Samples: 867617760. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 18:43:13,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 18:43:15,872][00497] Updated weights for policy 0, policy_version 60148 (0.0019) [2024-03-29 18:43:18,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41487.7). Total num frames: 985612288. Throughput: 0: 41490.0. Samples: 867757440. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 18:43:18,840][00126] Avg episode reward: [(0, '0.512')] [2024-03-29 18:43:19,421][00497] Updated weights for policy 0, policy_version 60158 (0.0033) [2024-03-29 18:43:23,396][00497] Updated weights for policy 0, policy_version 60168 (0.0023) [2024-03-29 18:43:23,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41232.9, 300 sec: 41432.1). Total num frames: 985808896. Throughput: 0: 40878.5. Samples: 867979960. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 18:43:23,841][00126] Avg episode reward: [(0, '0.648')] [2024-03-29 18:43:24,201][00476] Signal inference workers to stop experience collection... (30850 times) [2024-03-29 18:43:24,235][00497] InferenceWorker_p0-w0: stopping experience collection (30850 times) [2024-03-29 18:43:24,382][00476] Signal inference workers to resume experience collection... (30850 times) [2024-03-29 18:43:24,383][00497] InferenceWorker_p0-w0: resuming experience collection (30850 times) [2024-03-29 18:43:27,328][00497] Updated weights for policy 0, policy_version 60178 (0.0025) [2024-03-29 18:43:28,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.3, 300 sec: 41432.1). Total num frames: 986021888. Throughput: 0: 41156.4. Samples: 868249120. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 18:43:28,840][00126] Avg episode reward: [(0, '0.503')] [2024-03-29 18:43:31,596][00497] Updated weights for policy 0, policy_version 60188 (0.0031) [2024-03-29 18:43:33,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40687.0, 300 sec: 41376.5). Total num frames: 986218496. Throughput: 0: 41161.4. Samples: 868380000. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 18:43:33,842][00126] Avg episode reward: [(0, '0.639')] [2024-03-29 18:43:35,128][00497] Updated weights for policy 0, policy_version 60198 (0.0035) [2024-03-29 18:43:38,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.0, 300 sec: 41376.6). Total num frames: 986431488. Throughput: 0: 40896.4. Samples: 868603120. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 18:43:38,840][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 18:43:39,193][00497] Updated weights for policy 0, policy_version 60208 (0.0019) [2024-03-29 18:43:43,236][00497] Updated weights for policy 0, policy_version 60218 (0.0025) [2024-03-29 18:43:43,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.0, 300 sec: 41376.5). Total num frames: 986628096. Throughput: 0: 40899.8. Samples: 868861040. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 18:43:43,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 18:43:47,522][00497] Updated weights for policy 0, policy_version 60228 (0.0024) [2024-03-29 18:43:48,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40414.0, 300 sec: 41210.0). Total num frames: 986808320. Throughput: 0: 40915.5. Samples: 868998240. 
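The recurring "Fps is (10 sec / 60 sec / 300 sec)" entries above report the same cumulative frame counter averaged over three windows. The 10-second values are quantized in steps of roughly 1638.4 (for example 40960.0, 42598.4, 44236.8), which is consistent with the counter advancing in whole policy updates of 16,384 frames each. Below is a minimal sketch of how such multi-window figures could be derived from (timestamp, total_num_frames) samples; the class and method names are illustrative and not taken from the trainer's code.

import time
from collections import deque

class FpsTracker:
    """Rolling FPS over several averaging windows, in the style of the
    'Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)' lines. Illustrative only."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_num_frames)

    def record(self, total_num_frames, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, total_num_frames))
        # keep only as much history as the largest window needs
        while self.samples and now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self):
        if len(self.samples) < 2:
            return {w: 0.0 for w in self.windows}
        now, frames_now = self.samples[-1]
        result = {}
        for w in self.windows:
            # oldest retained sample that still falls inside this window
            t_past, frames_past = next(
                ((t, f) for t, f in self.samples if now - t <= w), self.samples[0]
            )
            dt = max(now - t_past, 1e-9)
            result[w] = (frames_now - frames_past) / dt
        return result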
Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 18:43:48,840][00126] Avg episode reward: [(0, '0.642')] [2024-03-29 18:43:50,936][00497] Updated weights for policy 0, policy_version 60238 (0.0027) [2024-03-29 18:43:53,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41506.1, 300 sec: 41376.6). Total num frames: 987054080. Throughput: 0: 41251.6. Samples: 869228140. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 18:43:53,840][00126] Avg episode reward: [(0, '0.527')] [2024-03-29 18:43:54,834][00497] Updated weights for policy 0, policy_version 60248 (0.0023) [2024-03-29 18:43:54,854][00476] Signal inference workers to stop experience collection... (30900 times) [2024-03-29 18:43:54,854][00476] Signal inference workers to resume experience collection... (30900 times) [2024-03-29 18:43:54,878][00497] InferenceWorker_p0-w0: stopping experience collection (30900 times) [2024-03-29 18:43:54,878][00497] InferenceWorker_p0-w0: resuming experience collection (30900 times) [2024-03-29 18:43:58,839][00126] Fps is (10 sec: 44236.0, 60 sec: 41232.9, 300 sec: 41321.0). Total num frames: 987250688. Throughput: 0: 41580.3. Samples: 869488880. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 18:43:58,841][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 18:43:58,863][00497] Updated weights for policy 0, policy_version 60258 (0.0030) [2024-03-29 18:44:03,338][00497] Updated weights for policy 0, policy_version 60268 (0.0028) [2024-03-29 18:44:03,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40960.0, 300 sec: 41154.4). Total num frames: 987430912. Throughput: 0: 41328.0. Samples: 869617200. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 18:44:03,840][00126] Avg episode reward: [(0, '0.629')] [2024-03-29 18:44:06,621][00497] Updated weights for policy 0, policy_version 60278 (0.0030) [2024-03-29 18:44:08,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41506.1, 300 sec: 41432.1). Total num frames: 987693056. Throughput: 0: 41892.1. Samples: 869865100. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 18:44:08,841][00126] Avg episode reward: [(0, '0.618')] [2024-03-29 18:44:10,789][00497] Updated weights for policy 0, policy_version 60288 (0.0021) [2024-03-29 18:44:13,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41506.1, 300 sec: 41432.1). Total num frames: 987873280. Throughput: 0: 41255.2. Samples: 870105600. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 18:44:13,840][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 18:44:14,929][00497] Updated weights for policy 0, policy_version 60298 (0.0021) [2024-03-29 18:44:18,839][00126] Fps is (10 sec: 37682.6, 60 sec: 40959.9, 300 sec: 41209.9). Total num frames: 988069888. Throughput: 0: 41156.4. Samples: 870232040. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 18:44:18,840][00126] Avg episode reward: [(0, '0.684')] [2024-03-29 18:44:19,611][00497] Updated weights for policy 0, policy_version 60308 (0.0024) [2024-03-29 18:44:22,543][00497] Updated weights for policy 0, policy_version 60318 (0.0027) [2024-03-29 18:44:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41233.1, 300 sec: 41265.5). Total num frames: 988282880. Throughput: 0: 41445.3. Samples: 870468160. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 18:44:23,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:44:26,714][00497] Updated weights for policy 0, policy_version 60328 (0.0024) [2024-03-29 18:44:27,583][00476] Signal inference workers to stop experience collection... 
(30950 times) [2024-03-29 18:44:27,663][00497] InferenceWorker_p0-w0: stopping experience collection (30950 times) [2024-03-29 18:44:27,664][00476] Signal inference workers to resume experience collection... (30950 times) [2024-03-29 18:44:27,689][00497] InferenceWorker_p0-w0: resuming experience collection (30950 times) [2024-03-29 18:44:28,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41233.1, 300 sec: 41376.6). Total num frames: 988495872. Throughput: 0: 41240.5. Samples: 870716860. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 18:44:28,840][00126] Avg episode reward: [(0, '0.653')] [2024-03-29 18:44:30,839][00497] Updated weights for policy 0, policy_version 60338 (0.0023) [2024-03-29 18:44:33,839][00126] Fps is (10 sec: 37682.9, 60 sec: 40687.0, 300 sec: 41098.8). Total num frames: 988659712. Throughput: 0: 41058.1. Samples: 870845860. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 18:44:33,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 18:44:35,308][00497] Updated weights for policy 0, policy_version 60348 (0.0029) [2024-03-29 18:44:38,409][00497] Updated weights for policy 0, policy_version 60358 (0.0034) [2024-03-29 18:44:38,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41321.0). Total num frames: 988905472. Throughput: 0: 41447.1. Samples: 871093260. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 18:44:38,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 18:44:42,743][00497] Updated weights for policy 0, policy_version 60368 (0.0024) [2024-03-29 18:44:43,839][00126] Fps is (10 sec: 45875.0, 60 sec: 41506.2, 300 sec: 41321.0). Total num frames: 989118464. Throughput: 0: 41132.5. Samples: 871339840. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 18:44:43,840][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 18:44:46,700][00497] Updated weights for policy 0, policy_version 60378 (0.0025) [2024-03-29 18:44:48,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.1, 300 sec: 41209.9). Total num frames: 989315072. Throughput: 0: 40989.7. Samples: 871461740. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 18:44:48,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 18:44:51,218][00497] Updated weights for policy 0, policy_version 60388 (0.0022) [2024-03-29 18:44:53,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41233.1, 300 sec: 41265.5). Total num frames: 989528064. Throughput: 0: 41351.6. Samples: 871725920. Policy #0 lag: (min: 2.0, avg: 19.1, max: 42.0) [2024-03-29 18:44:53,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 18:44:54,230][00497] Updated weights for policy 0, policy_version 60398 (0.0038) [2024-03-29 18:44:56,185][00476] Signal inference workers to stop experience collection... (31000 times) [2024-03-29 18:44:56,185][00476] Signal inference workers to resume experience collection... (31000 times) [2024-03-29 18:44:56,283][00497] InferenceWorker_p0-w0: stopping experience collection (31000 times) [2024-03-29 18:44:56,284][00497] InferenceWorker_p0-w0: resuming experience collection (31000 times) [2024-03-29 18:44:58,506][00497] Updated weights for policy 0, policy_version 60408 (0.0023) [2024-03-29 18:44:58,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.2, 300 sec: 41265.5). Total num frames: 989741056. Throughput: 0: 41315.6. Samples: 871964800. 
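The "Policy #0 lag: (min/avg/max)" figures summarize, for the experience currently being consumed, how many policy versions behind the learner's newest version that experience is; with the learner around version 60,3xx here, individual rollouts trail by anywhere from 0-2 up to roughly 44 versions. One plausible way to aggregate such numbers is sketched below; the helper is purely illustrative.

def policy_lag_stats(latest_version, versions_in_batch):
    """Min / avg / max difference between the learner's newest policy_version and
    the versions that produced the trajectories in a batch. Illustrative helper."""
    lags = [latest_version - v for v in versions_in_batch]
    return {"min": float(min(lags)), "avg": sum(lags) / len(lags), "max": float(max(lags))}

# e.g. policy_lag_stats(60258, [60257, 60214, 60240, 60245])
# -> {'min': 1.0, 'avg': 19.0, 'max': 44.0}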
Policy #0 lag: (min: 2.0, avg: 19.1, max: 42.0) [2024-03-29 18:44:58,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 18:45:02,380][00497] Updated weights for policy 0, policy_version 60418 (0.0024) [2024-03-29 18:45:03,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 41265.4). Total num frames: 989954048. Throughput: 0: 41137.0. Samples: 872083200. Policy #0 lag: (min: 2.0, avg: 19.1, max: 42.0) [2024-03-29 18:45:03,840][00126] Avg episode reward: [(0, '0.639')] [2024-03-29 18:45:03,990][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000060423_989970432.pth... [2024-03-29 18:45:04,302][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000059818_980058112.pth [2024-03-29 18:45:06,782][00497] Updated weights for policy 0, policy_version 60428 (0.0022) [2024-03-29 18:45:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40960.0, 300 sec: 41265.5). Total num frames: 990150656. Throughput: 0: 41847.5. Samples: 872351300. Policy #0 lag: (min: 2.0, avg: 19.1, max: 42.0) [2024-03-29 18:45:08,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 18:45:09,987][00497] Updated weights for policy 0, policy_version 60438 (0.0027) [2024-03-29 18:45:13,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41265.5). Total num frames: 990363648. Throughput: 0: 41503.5. Samples: 872584520. Policy #0 lag: (min: 2.0, avg: 19.1, max: 42.0) [2024-03-29 18:45:13,840][00126] Avg episode reward: [(0, '0.628')] [2024-03-29 18:45:14,214][00497] Updated weights for policy 0, policy_version 60448 (0.0028) [2024-03-29 18:45:18,127][00497] Updated weights for policy 0, policy_version 60458 (0.0023) [2024-03-29 18:45:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.3, 300 sec: 41376.5). Total num frames: 990576640. Throughput: 0: 41419.7. Samples: 872709740. Policy #0 lag: (min: 2.0, avg: 19.1, max: 42.0) [2024-03-29 18:45:18,840][00126] Avg episode reward: [(0, '0.634')] [2024-03-29 18:45:22,457][00497] Updated weights for policy 0, policy_version 60468 (0.0023) [2024-03-29 18:45:23,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41321.0). Total num frames: 990773248. Throughput: 0: 41946.1. Samples: 872980840. Policy #0 lag: (min: 2.0, avg: 19.1, max: 42.0) [2024-03-29 18:45:23,840][00126] Avg episode reward: [(0, '0.618')] [2024-03-29 18:45:25,500][00497] Updated weights for policy 0, policy_version 60478 (0.0025) [2024-03-29 18:45:26,045][00476] Signal inference workers to stop experience collection... (31050 times) [2024-03-29 18:45:26,124][00476] Signal inference workers to resume experience collection... (31050 times) [2024-03-29 18:45:26,127][00497] InferenceWorker_p0-w0: stopping experience collection (31050 times) [2024-03-29 18:45:26,155][00497] InferenceWorker_p0-w0: resuming experience collection (31050 times) [2024-03-29 18:45:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 41376.6). Total num frames: 991002624. Throughput: 0: 41710.4. Samples: 873216800. Policy #0 lag: (min: 2.0, avg: 22.7, max: 44.0) [2024-03-29 18:45:28,840][00126] Avg episode reward: [(0, '0.637')] [2024-03-29 18:45:29,707][00497] Updated weights for policy 0, policy_version 60488 (0.0029) [2024-03-29 18:45:33,631][00497] Updated weights for policy 0, policy_version 60498 (0.0027) [2024-03-29 18:45:33,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.4, 300 sec: 41376.6). Total num frames: 991199232. Throughput: 0: 41880.6. Samples: 873346360. 
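Each "Saving .../checkpoint_..." entry above is paired a moment later with a "Removing ..." entry for an older file, so only a few recent checkpoints stay on disk. The two numbers embedded in the file name are the policy version and the cumulative environment frame count, and in this section their ratio is constant (for example 989970432 / 60423 = 16384), i.e. 16,384 frames per policy update. A small rotation sketch in that spirit follows; the helper name and the keep_last count are assumptions, not the trainer's actual logic.

from pathlib import Path

import torch  # the run's checkpoints are .pth files

def save_and_rotate(checkpoint_dir, policy_version, env_frames, state, keep_last=2):
    """Write checkpoint_<version>_<frames>.pth and delete older files so only the
    newest `keep_last` remain (hypothetical helper; file-name pattern copied from the log)."""
    ckpt_dir = Path(checkpoint_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    print(f"Saving {path}...")
    torch.save(state, path)
    # the fixed-width version field makes lexicographic order match numeric order
    for old in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep_last]:
        print(f"Removing {old}")
        old.unlink()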
Policy #0 lag: (min: 2.0, avg: 22.7, max: 44.0) [2024-03-29 18:45:33,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 18:45:38,077][00497] Updated weights for policy 0, policy_version 60508 (0.0024) [2024-03-29 18:45:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 41265.5). Total num frames: 991395840. Throughput: 0: 42029.8. Samples: 873617260. Policy #0 lag: (min: 2.0, avg: 22.7, max: 44.0) [2024-03-29 18:45:38,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:45:41,086][00497] Updated weights for policy 0, policy_version 60518 (0.0024) [2024-03-29 18:45:43,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 41432.1). Total num frames: 991641600. Throughput: 0: 41929.4. Samples: 873851620. Policy #0 lag: (min: 2.0, avg: 22.7, max: 44.0) [2024-03-29 18:45:43,840][00126] Avg episode reward: [(0, '0.553')] [2024-03-29 18:45:45,339][00497] Updated weights for policy 0, policy_version 60528 (0.0032) [2024-03-29 18:45:48,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.2, 300 sec: 41432.1). Total num frames: 991821824. Throughput: 0: 42136.0. Samples: 873979320. Policy #0 lag: (min: 2.0, avg: 22.7, max: 44.0) [2024-03-29 18:45:48,840][00126] Avg episode reward: [(0, '0.604')] [2024-03-29 18:45:49,175][00497] Updated weights for policy 0, policy_version 60538 (0.0021) [2024-03-29 18:45:53,767][00497] Updated weights for policy 0, policy_version 60548 (0.0021) [2024-03-29 18:45:53,839][00126] Fps is (10 sec: 37682.6, 60 sec: 41506.0, 300 sec: 41265.4). Total num frames: 992018432. Throughput: 0: 41955.9. Samples: 874239320. Policy #0 lag: (min: 2.0, avg: 22.7, max: 44.0) [2024-03-29 18:45:53,840][00126] Avg episode reward: [(0, '0.513')] [2024-03-29 18:45:55,482][00476] Signal inference workers to stop experience collection... (31100 times) [2024-03-29 18:45:55,482][00476] Signal inference workers to resume experience collection... (31100 times) [2024-03-29 18:45:55,522][00497] InferenceWorker_p0-w0: stopping experience collection (31100 times) [2024-03-29 18:45:55,522][00497] InferenceWorker_p0-w0: resuming experience collection (31100 times) [2024-03-29 18:45:56,713][00497] Updated weights for policy 0, policy_version 60558 (0.0040) [2024-03-29 18:45:58,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41432.1). Total num frames: 992264192. Throughput: 0: 42069.3. Samples: 874477640. Policy #0 lag: (min: 2.0, avg: 22.7, max: 44.0) [2024-03-29 18:45:58,840][00126] Avg episode reward: [(0, '0.539')] [2024-03-29 18:46:00,746][00497] Updated weights for policy 0, policy_version 60568 (0.0019) [2024-03-29 18:46:03,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42052.2, 300 sec: 41543.1). Total num frames: 992477184. Throughput: 0: 42329.6. Samples: 874614580. Policy #0 lag: (min: 2.0, avg: 22.7, max: 44.0) [2024-03-29 18:46:03,841][00126] Avg episode reward: [(0, '0.614')] [2024-03-29 18:46:04,614][00497] Updated weights for policy 0, policy_version 60578 (0.0026) [2024-03-29 18:46:08,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41779.2, 300 sec: 41265.5). Total num frames: 992657408. Throughput: 0: 42127.7. Samples: 874876580. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 18:46:08,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 18:46:09,022][00497] Updated weights for policy 0, policy_version 60588 (0.0024) [2024-03-29 18:46:12,272][00497] Updated weights for policy 0, policy_version 60598 (0.0040) [2024-03-29 18:46:13,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.4, 300 sec: 41543.2). 
Total num frames: 992903168. Throughput: 0: 42220.4. Samples: 875116720. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 18:46:13,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 18:46:16,271][00497] Updated weights for policy 0, policy_version 60608 (0.0021) [2024-03-29 18:46:18,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.3, 300 sec: 41543.2). Total num frames: 993116160. Throughput: 0: 42328.9. Samples: 875251160. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 18:46:18,840][00126] Avg episode reward: [(0, '0.610')] [2024-03-29 18:46:20,145][00497] Updated weights for policy 0, policy_version 60618 (0.0038) [2024-03-29 18:46:23,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 41432.1). Total num frames: 993296384. Throughput: 0: 41921.7. Samples: 875503740. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 18:46:23,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 18:46:24,669][00497] Updated weights for policy 0, policy_version 60628 (0.0027) [2024-03-29 18:46:26,994][00476] Signal inference workers to stop experience collection... (31150 times) [2024-03-29 18:46:26,994][00476] Signal inference workers to resume experience collection... (31150 times) [2024-03-29 18:46:27,040][00497] InferenceWorker_p0-w0: stopping experience collection (31150 times) [2024-03-29 18:46:27,040][00497] InferenceWorker_p0-w0: resuming experience collection (31150 times) [2024-03-29 18:46:27,660][00497] Updated weights for policy 0, policy_version 60638 (0.0028) [2024-03-29 18:46:28,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42325.2, 300 sec: 41543.2). Total num frames: 993542144. Throughput: 0: 42238.5. Samples: 875752360. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 18:46:28,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 18:46:31,585][00497] Updated weights for policy 0, policy_version 60648 (0.0025) [2024-03-29 18:46:33,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42598.4, 300 sec: 41654.3). Total num frames: 993755136. Throughput: 0: 42394.3. Samples: 875887060. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 18:46:33,841][00126] Avg episode reward: [(0, '0.570')] [2024-03-29 18:46:35,524][00497] Updated weights for policy 0, policy_version 60658 (0.0022) [2024-03-29 18:46:38,839][00126] Fps is (10 sec: 37683.8, 60 sec: 42052.3, 300 sec: 41487.6). Total num frames: 993918976. Throughput: 0: 42133.5. Samples: 876135320. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 18:46:38,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 18:46:39,942][00497] Updated weights for policy 0, policy_version 60668 (0.0025) [2024-03-29 18:46:43,308][00497] Updated weights for policy 0, policy_version 60678 (0.0028) [2024-03-29 18:46:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 994164736. Throughput: 0: 42327.7. Samples: 876382380. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 18:46:43,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 18:46:47,269][00497] Updated weights for policy 0, policy_version 60688 (0.0025) [2024-03-29 18:46:48,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 41654.2). Total num frames: 994377728. Throughput: 0: 42122.4. Samples: 876510080. 
Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 18:46:48,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 18:46:51,306][00497] Updated weights for policy 0, policy_version 60698 (0.0024) [2024-03-29 18:46:53,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42598.5, 300 sec: 41654.2). Total num frames: 994574336. Throughput: 0: 41953.7. Samples: 876764500. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 18:46:53,840][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 18:46:55,828][00497] Updated weights for policy 0, policy_version 60708 (0.0024) [2024-03-29 18:46:58,313][00476] Signal inference workers to stop experience collection... (31200 times) [2024-03-29 18:46:58,390][00497] InferenceWorker_p0-w0: stopping experience collection (31200 times) [2024-03-29 18:46:58,403][00476] Signal inference workers to resume experience collection... (31200 times) [2024-03-29 18:46:58,426][00497] InferenceWorker_p0-w0: resuming experience collection (31200 times) [2024-03-29 18:46:58,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.4, 300 sec: 41598.7). Total num frames: 994787328. Throughput: 0: 41888.1. Samples: 877001680. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 18:46:58,840][00126] Avg episode reward: [(0, '0.592')] [2024-03-29 18:46:59,037][00497] Updated weights for policy 0, policy_version 60718 (0.0027) [2024-03-29 18:47:02,877][00497] Updated weights for policy 0, policy_version 60728 (0.0028) [2024-03-29 18:47:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 995000320. Throughput: 0: 41931.9. Samples: 877138100. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 18:47:03,840][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 18:47:04,160][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000060732_995033088.pth... [2024-03-29 18:47:04,520][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000060120_985006080.pth [2024-03-29 18:47:06,826][00497] Updated weights for policy 0, policy_version 60738 (0.0029) [2024-03-29 18:47:08,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.3, 300 sec: 41765.3). Total num frames: 995213312. Throughput: 0: 41976.0. Samples: 877392660. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 18:47:08,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 18:47:11,469][00497] Updated weights for policy 0, policy_version 60748 (0.0030) [2024-03-29 18:47:13,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 995409920. Throughput: 0: 41946.0. Samples: 877639920. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 18:47:13,840][00126] Avg episode reward: [(0, '0.615')] [2024-03-29 18:47:14,519][00497] Updated weights for policy 0, policy_version 60758 (0.0029) [2024-03-29 18:47:18,321][00497] Updated weights for policy 0, policy_version 60768 (0.0020) [2024-03-29 18:47:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 995639296. Throughput: 0: 41878.2. Samples: 877771580. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 18:47:18,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 18:47:22,350][00497] Updated weights for policy 0, policy_version 60778 (0.0031) [2024-03-29 18:47:23,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42598.4, 300 sec: 41820.9). Total num frames: 995852288. Throughput: 0: 42249.7. Samples: 878036560. 
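The inference worker's "Updated weights for policy 0, policy_version N (0.00xx)" entries arrive every few seconds, at every tenth policy version, and the parenthesized number looks like the time in seconds spent refreshing the weights. How the new parameters actually reach the inference process is not visible in this log; the snippet below only illustrates timing such a refresh, under the assumption that both sides hold torch tensors in shared memory.

import time

import torch

def refresh_inference_weights(policy_id, policy_version, shared_params, new_params):
    """Copy the learner's latest parameters into the inference worker's buffers and
    log a line in the style seen above. Hypothetical helper; assumes matching shapes."""
    start = time.perf_counter()
    for name, tensor in new_params.items():
        shared_params[name].copy_(tensor)  # in-place copy keeps the shared storage valid
    elapsed = time.perf_counter() - start
    print(f"Updated weights for policy {policy_id}, policy_version {policy_version} ({elapsed:.4f})")

# e.g.
# shared = {"w": torch.zeros(4)}
# refresh_inference_weights(0, 60738, shared, {"w": torch.ones(4)})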
Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 18:47:23,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 18:47:26,921][00497] Updated weights for policy 0, policy_version 60788 (0.0025) [2024-03-29 18:47:28,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.4, 300 sec: 41654.3). Total num frames: 996065280. Throughput: 0: 42154.7. Samples: 878279340. Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 18:47:28,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 18:47:29,935][00497] Updated weights for policy 0, policy_version 60798 (0.0030) [2024-03-29 18:47:30,310][00476] Signal inference workers to stop experience collection... (31250 times) [2024-03-29 18:47:30,386][00476] Signal inference workers to resume experience collection... (31250 times) [2024-03-29 18:47:30,389][00497] InferenceWorker_p0-w0: stopping experience collection (31250 times) [2024-03-29 18:47:30,412][00497] InferenceWorker_p0-w0: resuming experience collection (31250 times) [2024-03-29 18:47:33,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 996261888. Throughput: 0: 41938.7. Samples: 878397320. Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 18:47:33,840][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 18:47:33,958][00497] Updated weights for policy 0, policy_version 60808 (0.0022) [2024-03-29 18:47:38,145][00497] Updated weights for policy 0, policy_version 60818 (0.0021) [2024-03-29 18:47:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42598.4, 300 sec: 41765.3). Total num frames: 996474880. Throughput: 0: 41880.9. Samples: 878649140. Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 18:47:38,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 18:47:42,851][00497] Updated weights for policy 0, policy_version 60828 (0.0023) [2024-03-29 18:47:43,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 996655104. Throughput: 0: 42381.3. Samples: 878908840. Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 18:47:43,840][00126] Avg episode reward: [(0, '0.587')] [2024-03-29 18:47:45,764][00497] Updated weights for policy 0, policy_version 60838 (0.0022) [2024-03-29 18:47:48,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 996868096. Throughput: 0: 41707.2. Samples: 879014920. Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 18:47:48,840][00126] Avg episode reward: [(0, '0.608')] [2024-03-29 18:47:49,893][00497] Updated weights for policy 0, policy_version 60848 (0.0019) [2024-03-29 18:47:53,758][00497] Updated weights for policy 0, policy_version 60858 (0.0019) [2024-03-29 18:47:53,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 997097472. Throughput: 0: 41818.7. Samples: 879274500. Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 18:47:53,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 18:47:58,615][00497] Updated weights for policy 0, policy_version 60868 (0.0022) [2024-03-29 18:47:58,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 997261312. Throughput: 0: 42274.2. Samples: 879542260. Policy #0 lag: (min: 1.0, avg: 17.4, max: 41.0) [2024-03-29 18:47:58,840][00126] Avg episode reward: [(0, '0.612')] [2024-03-29 18:48:00,669][00476] Signal inference workers to stop experience collection... 
(31300 times) [2024-03-29 18:48:00,686][00497] InferenceWorker_p0-w0: stopping experience collection (31300 times) [2024-03-29 18:48:00,884][00476] Signal inference workers to resume experience collection... (31300 times) [2024-03-29 18:48:00,885][00497] InferenceWorker_p0-w0: resuming experience collection (31300 times) [2024-03-29 18:48:01,666][00497] Updated weights for policy 0, policy_version 60878 (0.0029) [2024-03-29 18:48:03,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 997490688. Throughput: 0: 41334.5. Samples: 879631640. Policy #0 lag: (min: 1.0, avg: 17.4, max: 41.0) [2024-03-29 18:48:03,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 18:48:05,657][00497] Updated weights for policy 0, policy_version 60888 (0.0020) [2024-03-29 18:48:08,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 997703680. Throughput: 0: 41168.9. Samples: 879889160. Policy #0 lag: (min: 1.0, avg: 17.4, max: 41.0) [2024-03-29 18:48:08,842][00126] Avg episode reward: [(0, '0.624')] [2024-03-29 18:48:09,718][00497] Updated weights for policy 0, policy_version 60898 (0.0027) [2024-03-29 18:48:13,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 997883904. Throughput: 0: 41555.9. Samples: 880149360. Policy #0 lag: (min: 1.0, avg: 17.4, max: 41.0) [2024-03-29 18:48:13,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 18:48:14,588][00497] Updated weights for policy 0, policy_version 60908 (0.0026) [2024-03-29 18:48:17,596][00497] Updated weights for policy 0, policy_version 60918 (0.0029) [2024-03-29 18:48:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 998129664. Throughput: 0: 41646.7. Samples: 880271420. Policy #0 lag: (min: 1.0, avg: 17.4, max: 41.0) [2024-03-29 18:48:18,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 18:48:21,392][00497] Updated weights for policy 0, policy_version 60928 (0.0018) [2024-03-29 18:48:23,839][00126] Fps is (10 sec: 44237.0, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 998326272. Throughput: 0: 41614.7. Samples: 880521800. Policy #0 lag: (min: 1.0, avg: 17.4, max: 41.0) [2024-03-29 18:48:23,840][00126] Avg episode reward: [(0, '0.610')] [2024-03-29 18:48:25,379][00497] Updated weights for policy 0, policy_version 60938 (0.0020) [2024-03-29 18:48:28,839][00126] Fps is (10 sec: 39321.1, 60 sec: 40959.9, 300 sec: 41709.8). Total num frames: 998522880. Throughput: 0: 41483.9. Samples: 880775620. Policy #0 lag: (min: 1.0, avg: 17.4, max: 41.0) [2024-03-29 18:48:28,840][00126] Avg episode reward: [(0, '0.577')] [2024-03-29 18:48:30,144][00497] Updated weights for policy 0, policy_version 60948 (0.0022) [2024-03-29 18:48:31,879][00476] Signal inference workers to stop experience collection... (31350 times) [2024-03-29 18:48:31,914][00497] InferenceWorker_p0-w0: stopping experience collection (31350 times) [2024-03-29 18:48:32,096][00476] Signal inference workers to resume experience collection... (31350 times) [2024-03-29 18:48:32,097][00497] InferenceWorker_p0-w0: resuming experience collection (31350 times) [2024-03-29 18:48:32,964][00497] Updated weights for policy 0, policy_version 60958 (0.0021) [2024-03-29 18:48:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 998768640. Throughput: 0: 41989.3. Samples: 880904440. 
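Each "Signal inference workers to stop experience collection... (N times)" entry from the learner is acknowledged within milliseconds by the inference worker's "stopping ..." line, followed by the matching resume pair; the counter N advances by 50 between logged occurrences, so the message is apparently printed only every 50th cycle. A toy version of such a pause/resume gate is sketched below; the Event-based plumbing is an assumption, not the trainer's actual mechanism.

import threading

class ExperienceGate:
    """Minimal pause/resume handshake between a learner and inference workers,
    in the spirit of the stop/resume pairs above. Illustrative only."""

    def __init__(self, log_every=50):
        self._collecting = threading.Event()
        self._collecting.set()      # collection allowed by default
        self.cycles = 0
        self.log_every = log_every  # mirrors the counter advancing by 50 per logged line

    # learner side
    def stop(self):
        self._collecting.clear()
        self.cycles += 1
        if self.cycles % self.log_every == 0:
            print(f"Signal inference workers to stop experience collection... ({self.cycles} times)")

    def resume(self):
        self._collecting.set()
        if self.cycles % self.log_every == 0:
            print(f"Signal inference workers to resume experience collection... ({self.cycles} times)")

    # inference-worker side
    def wait_until_collecting(self, timeout=None):
        return self._collecting.wait(timeout)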
Policy #0 lag: (min: 1.0, avg: 17.4, max: 41.0) [2024-03-29 18:48:33,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 18:48:37,181][00497] Updated weights for policy 0, policy_version 60968 (0.0019) [2024-03-29 18:48:38,839][00126] Fps is (10 sec: 45876.1, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 998981632. Throughput: 0: 41527.7. Samples: 881143240. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:48:38,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 18:48:41,188][00497] Updated weights for policy 0, policy_version 60978 (0.0019) [2024-03-29 18:48:43,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 999161856. Throughput: 0: 41126.1. Samples: 881392940. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:48:43,841][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 18:48:45,866][00497] Updated weights for policy 0, policy_version 60988 (0.0022) [2024-03-29 18:48:48,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 999374848. Throughput: 0: 42302.0. Samples: 881535220. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:48:48,840][00126] Avg episode reward: [(0, '0.659')] [2024-03-29 18:48:48,983][00497] Updated weights for policy 0, policy_version 60998 (0.0027) [2024-03-29 18:48:52,938][00497] Updated weights for policy 0, policy_version 61008 (0.0022) [2024-03-29 18:48:53,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 999587840. Throughput: 0: 41713.0. Samples: 881766240. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:48:53,840][00126] Avg episode reward: [(0, '0.621')] [2024-03-29 18:48:56,746][00497] Updated weights for policy 0, policy_version 61018 (0.0023) [2024-03-29 18:48:58,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.2, 300 sec: 41931.9). Total num frames: 999800832. Throughput: 0: 41566.6. Samples: 882019860. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:48:58,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:49:01,384][00497] Updated weights for policy 0, policy_version 61028 (0.0019) [2024-03-29 18:49:03,219][00476] Signal inference workers to stop experience collection... (31400 times) [2024-03-29 18:49:03,259][00497] InferenceWorker_p0-w0: stopping experience collection (31400 times) [2024-03-29 18:49:03,448][00476] Signal inference workers to resume experience collection... (31400 times) [2024-03-29 18:49:03,449][00497] InferenceWorker_p0-w0: resuming experience collection (31400 times) [2024-03-29 18:49:03,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 1000013824. Throughput: 0: 42130.6. Samples: 882167300. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:49:03,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 18:49:03,996][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000061037_1000030208.pth... [2024-03-29 18:49:04,295][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000060423_989970432.pth [2024-03-29 18:49:04,635][00497] Updated weights for policy 0, policy_version 61038 (0.0028) [2024-03-29 18:49:08,530][00497] Updated weights for policy 0, policy_version 61048 (0.0019) [2024-03-29 18:49:08,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 1000210432. Throughput: 0: 41601.3. Samples: 882393860. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:49:08,840][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 18:49:12,344][00497] Updated weights for policy 0, policy_version 61058 (0.0031) [2024-03-29 18:49:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 1000423424. Throughput: 0: 41570.7. Samples: 882646300. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 18:49:13,840][00126] Avg episode reward: [(0, '0.569')] [2024-03-29 18:49:17,015][00497] Updated weights for policy 0, policy_version 61068 (0.0026) [2024-03-29 18:49:18,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 1000620032. Throughput: 0: 41968.5. Samples: 882793020. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0) [2024-03-29 18:49:18,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 18:49:20,087][00497] Updated weights for policy 0, policy_version 61078 (0.0022) [2024-03-29 18:49:23,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.1, 300 sec: 41876.4). Total num frames: 1000849408. Throughput: 0: 41974.4. Samples: 883032100. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0) [2024-03-29 18:49:23,840][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 18:49:24,144][00497] Updated weights for policy 0, policy_version 61088 (0.0021) [2024-03-29 18:49:27,712][00497] Updated weights for policy 0, policy_version 61098 (0.0023) [2024-03-29 18:49:28,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42598.5, 300 sec: 42098.6). Total num frames: 1001078784. Throughput: 0: 42082.7. Samples: 883286660. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0) [2024-03-29 18:49:28,840][00126] Avg episode reward: [(0, '0.632')] [2024-03-29 18:49:32,426][00497] Updated weights for policy 0, policy_version 61108 (0.0018) [2024-03-29 18:49:33,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 1001259008. Throughput: 0: 42012.0. Samples: 883425760. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0) [2024-03-29 18:49:33,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 18:49:35,501][00497] Updated weights for policy 0, policy_version 61118 (0.0030) [2024-03-29 18:49:35,822][00476] Signal inference workers to stop experience collection... (31450 times) [2024-03-29 18:49:35,861][00497] InferenceWorker_p0-w0: stopping experience collection (31450 times) [2024-03-29 18:49:36,021][00476] Signal inference workers to resume experience collection... (31450 times) [2024-03-29 18:49:36,021][00497] InferenceWorker_p0-w0: resuming experience collection (31450 times) [2024-03-29 18:49:38,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 1001472000. Throughput: 0: 42024.9. Samples: 883657360. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0) [2024-03-29 18:49:38,840][00126] Avg episode reward: [(0, '0.506')] [2024-03-29 18:49:39,744][00497] Updated weights for policy 0, policy_version 61128 (0.0023) [2024-03-29 18:49:43,491][00497] Updated weights for policy 0, policy_version 61138 (0.0029) [2024-03-29 18:49:43,839][00126] Fps is (10 sec: 42597.3, 60 sec: 42052.1, 300 sec: 41931.9). Total num frames: 1001684992. Throughput: 0: 41922.6. Samples: 883906380. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0) [2024-03-29 18:49:43,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:49:48,292][00497] Updated weights for policy 0, policy_version 61148 (0.0018) [2024-03-29 18:49:48,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 41820.8). 
Total num frames: 1001865216. Throughput: 0: 41468.9. Samples: 884033400. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0) [2024-03-29 18:49:48,840][00126] Avg episode reward: [(0, '0.608')] [2024-03-29 18:49:51,744][00497] Updated weights for policy 0, policy_version 61158 (0.0027) [2024-03-29 18:49:53,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 1002094592. Throughput: 0: 41712.8. Samples: 884270940. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:49:53,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:49:55,679][00497] Updated weights for policy 0, policy_version 61168 (0.0023) [2024-03-29 18:49:58,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41506.3, 300 sec: 41820.9). Total num frames: 1002291200. Throughput: 0: 41890.4. Samples: 884531360. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:49:58,840][00126] Avg episode reward: [(0, '0.620')] [2024-03-29 18:49:59,460][00497] Updated weights for policy 0, policy_version 61178 (0.0026) [2024-03-29 18:50:03,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41233.0, 300 sec: 41820.8). Total num frames: 1002487808. Throughput: 0: 41348.2. Samples: 884653700. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:50:03,841][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 18:50:04,074][00497] Updated weights for policy 0, policy_version 61188 (0.0018) [2024-03-29 18:50:07,312][00497] Updated weights for policy 0, policy_version 61198 (0.0021) [2024-03-29 18:50:08,839][00126] Fps is (10 sec: 44236.0, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 1002733568. Throughput: 0: 41464.5. Samples: 884898000. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:50:08,840][00126] Avg episode reward: [(0, '0.632')] [2024-03-29 18:50:10,325][00476] Signal inference workers to stop experience collection... (31500 times) [2024-03-29 18:50:10,365][00497] InferenceWorker_p0-w0: stopping experience collection (31500 times) [2024-03-29 18:50:10,551][00476] Signal inference workers to resume experience collection... (31500 times) [2024-03-29 18:50:10,585][00497] InferenceWorker_p0-w0: resuming experience collection (31500 times) [2024-03-29 18:50:11,297][00497] Updated weights for policy 0, policy_version 61208 (0.0025) [2024-03-29 18:50:13,839][00126] Fps is (10 sec: 42599.3, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 1002913792. Throughput: 0: 41387.1. Samples: 885149080. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:50:13,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 18:50:15,134][00497] Updated weights for policy 0, policy_version 61218 (0.0024) [2024-03-29 18:50:18,839][00126] Fps is (10 sec: 37683.0, 60 sec: 41506.0, 300 sec: 41820.8). Total num frames: 1003110400. Throughput: 0: 41128.7. Samples: 885276560. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:50:18,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 18:50:19,709][00497] Updated weights for policy 0, policy_version 61228 (0.0029) [2024-03-29 18:50:22,802][00497] Updated weights for policy 0, policy_version 61238 (0.0019) [2024-03-29 18:50:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 1003356160. Throughput: 0: 41727.5. Samples: 885535100. 
Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:50:23,840][00126] Avg episode reward: [(0, '0.631')] [2024-03-29 18:50:27,103][00497] Updated weights for policy 0, policy_version 61248 (0.0027) [2024-03-29 18:50:28,839][00126] Fps is (10 sec: 45875.1, 60 sec: 41506.0, 300 sec: 41931.9). Total num frames: 1003569152. Throughput: 0: 41789.4. Samples: 885786900. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 18:50:28,840][00126] Avg episode reward: [(0, '0.628')] [2024-03-29 18:50:30,724][00497] Updated weights for policy 0, policy_version 61258 (0.0027) [2024-03-29 18:50:33,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 1003749376. Throughput: 0: 41616.1. Samples: 885906120. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 18:50:33,841][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 18:50:35,466][00497] Updated weights for policy 0, policy_version 61268 (0.0018) [2024-03-29 18:50:38,527][00497] Updated weights for policy 0, policy_version 61278 (0.0026) [2024-03-29 18:50:38,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 1003995136. Throughput: 0: 42272.6. Samples: 886173200. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 18:50:38,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:50:42,883][00497] Updated weights for policy 0, policy_version 61288 (0.0021) [2024-03-29 18:50:43,279][00476] Signal inference workers to stop experience collection... (31550 times) [2024-03-29 18:50:43,337][00497] InferenceWorker_p0-w0: stopping experience collection (31550 times) [2024-03-29 18:50:43,445][00476] Signal inference workers to resume experience collection... (31550 times) [2024-03-29 18:50:43,445][00497] InferenceWorker_p0-w0: resuming experience collection (31550 times) [2024-03-29 18:50:43,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.4, 300 sec: 41931.9). Total num frames: 1004191744. Throughput: 0: 41848.8. Samples: 886414560. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 18:50:43,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 18:50:46,622][00497] Updated weights for policy 0, policy_version 61298 (0.0023) [2024-03-29 18:50:48,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 1004371968. Throughput: 0: 41779.6. Samples: 886533780. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 18:50:48,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 18:50:51,295][00497] Updated weights for policy 0, policy_version 61308 (0.0020) [2024-03-29 18:50:53,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 1004601344. Throughput: 0: 41997.5. Samples: 886787880. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 18:50:53,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 18:50:54,362][00497] Updated weights for policy 0, policy_version 61318 (0.0026) [2024-03-29 18:50:58,737][00497] Updated weights for policy 0, policy_version 61328 (0.0022) [2024-03-29 18:50:58,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 1004797952. Throughput: 0: 41742.2. Samples: 887027480. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 18:50:58,840][00126] Avg episode reward: [(0, '0.564')] [2024-03-29 18:51:02,371][00497] Updated weights for policy 0, policy_version 61338 (0.0028) [2024-03-29 18:51:03,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.4, 300 sec: 41876.4). 
Total num frames: 1005010944. Throughput: 0: 41788.5. Samples: 887157040. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 18:51:03,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 18:51:03,858][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000061341_1005010944.pth... [2024-03-29 18:51:04,172][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000060732_995033088.pth [2024-03-29 18:51:06,849][00497] Updated weights for policy 0, policy_version 61348 (0.0023) [2024-03-29 18:51:08,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 1005223936. Throughput: 0: 41899.9. Samples: 887420600. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:51:08,840][00126] Avg episode reward: [(0, '0.599')] [2024-03-29 18:51:09,905][00497] Updated weights for policy 0, policy_version 61358 (0.0023) [2024-03-29 18:51:13,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 1005436928. Throughput: 0: 41614.4. Samples: 887659540. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:51:13,840][00126] Avg episode reward: [(0, '0.662')] [2024-03-29 18:51:14,187][00497] Updated weights for policy 0, policy_version 61368 (0.0023) [2024-03-29 18:51:15,307][00476] Signal inference workers to stop experience collection... (31600 times) [2024-03-29 18:51:15,308][00476] Signal inference workers to resume experience collection... (31600 times) [2024-03-29 18:51:15,353][00497] InferenceWorker_p0-w0: stopping experience collection (31600 times) [2024-03-29 18:51:15,354][00497] InferenceWorker_p0-w0: resuming experience collection (31600 times) [2024-03-29 18:51:17,891][00497] Updated weights for policy 0, policy_version 61378 (0.0026) [2024-03-29 18:51:18,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 1005649920. Throughput: 0: 41686.1. Samples: 887782000. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:51:18,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 18:51:22,556][00497] Updated weights for policy 0, policy_version 61388 (0.0024) [2024-03-29 18:51:23,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41506.0, 300 sec: 41709.8). Total num frames: 1005846528. Throughput: 0: 41794.1. Samples: 888053940. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:51:23,840][00126] Avg episode reward: [(0, '0.614')] [2024-03-29 18:51:25,518][00497] Updated weights for policy 0, policy_version 61398 (0.0026) [2024-03-29 18:51:28,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 1006075904. Throughput: 0: 41758.5. Samples: 888293700. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:51:28,841][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 18:51:29,767][00497] Updated weights for policy 0, policy_version 61408 (0.0024) [2024-03-29 18:51:33,615][00497] Updated weights for policy 0, policy_version 61418 (0.0026) [2024-03-29 18:51:33,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 1006272512. Throughput: 0: 41978.8. Samples: 888422820. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:51:33,840][00126] Avg episode reward: [(0, '0.599')] [2024-03-29 18:51:38,096][00497] Updated weights for policy 0, policy_version 61428 (0.0024) [2024-03-29 18:51:38,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 1006469120. 
Throughput: 0: 42112.7. Samples: 888682960. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:51:38,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 18:51:41,126][00497] Updated weights for policy 0, policy_version 61438 (0.0020) [2024-03-29 18:51:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 1006698496. Throughput: 0: 42028.1. Samples: 888918740. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:51:43,840][00126] Avg episode reward: [(0, '0.598')] [2024-03-29 18:51:45,461][00497] Updated weights for policy 0, policy_version 61448 (0.0030) [2024-03-29 18:51:48,839][00126] Fps is (10 sec: 44237.8, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 1006911488. Throughput: 0: 42048.1. Samples: 889049200. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 18:51:48,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 18:51:49,266][00497] Updated weights for policy 0, policy_version 61458 (0.0017) [2024-03-29 18:51:49,297][00476] Signal inference workers to stop experience collection... (31650 times) [2024-03-29 18:51:49,340][00497] InferenceWorker_p0-w0: stopping experience collection (31650 times) [2024-03-29 18:51:49,518][00476] Signal inference workers to resume experience collection... (31650 times) [2024-03-29 18:51:49,518][00497] InferenceWorker_p0-w0: resuming experience collection (31650 times) [2024-03-29 18:51:53,573][00497] Updated weights for policy 0, policy_version 61468 (0.0031) [2024-03-29 18:51:53,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41506.0, 300 sec: 41709.8). Total num frames: 1007091712. Throughput: 0: 41996.0. Samples: 889310420. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 18:51:53,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 18:51:56,918][00497] Updated weights for policy 0, policy_version 61478 (0.0022) [2024-03-29 18:51:58,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 1007321088. Throughput: 0: 41931.5. Samples: 889546460. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 18:51:58,840][00126] Avg episode reward: [(0, '0.592')] [2024-03-29 18:52:01,299][00497] Updated weights for policy 0, policy_version 61488 (0.0032) [2024-03-29 18:52:03,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 1007534080. Throughput: 0: 42060.9. Samples: 889674740. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 18:52:03,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 18:52:04,955][00497] Updated weights for policy 0, policy_version 61498 (0.0019) [2024-03-29 18:52:08,841][00126] Fps is (10 sec: 37676.8, 60 sec: 41231.9, 300 sec: 41654.0). Total num frames: 1007697920. Throughput: 0: 41669.6. Samples: 889929140. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 18:52:08,841][00126] Avg episode reward: [(0, '0.580')] [2024-03-29 18:52:09,470][00497] Updated weights for policy 0, policy_version 61508 (0.0019) [2024-03-29 18:52:12,724][00497] Updated weights for policy 0, policy_version 61518 (0.0030) [2024-03-29 18:52:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 1007943680. Throughput: 0: 41581.4. Samples: 890164860. 
Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 18:52:13,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 18:52:16,976][00497] Updated weights for policy 0, policy_version 61528 (0.0020) [2024-03-29 18:52:18,839][00126] Fps is (10 sec: 44244.6, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 1008140288. Throughput: 0: 41406.7. Samples: 890286120. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 18:52:18,840][00126] Avg episode reward: [(0, '0.627')] [2024-03-29 18:52:20,613][00497] Updated weights for policy 0, policy_version 61538 (0.0021) [2024-03-29 18:52:23,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 1008336896. Throughput: 0: 41179.1. Samples: 890536020. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 18:52:23,840][00126] Avg episode reward: [(0, '0.616')] [2024-03-29 18:52:25,407][00497] Updated weights for policy 0, policy_version 61548 (0.0018) [2024-03-29 18:52:26,119][00476] Signal inference workers to stop experience collection... (31700 times) [2024-03-29 18:52:26,160][00497] InferenceWorker_p0-w0: stopping experience collection (31700 times) [2024-03-29 18:52:26,311][00476] Signal inference workers to resume experience collection... (31700 times) [2024-03-29 18:52:26,312][00497] InferenceWorker_p0-w0: resuming experience collection (31700 times) [2024-03-29 18:52:28,497][00497] Updated weights for policy 0, policy_version 61558 (0.0025) [2024-03-29 18:52:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 1008566272. Throughput: 0: 41624.0. Samples: 890791820. Policy #0 lag: (min: 0.0, avg: 18.2, max: 42.0) [2024-03-29 18:52:28,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 18:52:32,739][00497] Updated weights for policy 0, policy_version 61568 (0.0028) [2024-03-29 18:52:33,839][00126] Fps is (10 sec: 44237.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 1008779264. Throughput: 0: 41425.3. Samples: 890913340. Policy #0 lag: (min: 0.0, avg: 18.2, max: 42.0) [2024-03-29 18:52:33,840][00126] Avg episode reward: [(0, '0.613')] [2024-03-29 18:52:36,321][00497] Updated weights for policy 0, policy_version 61578 (0.0022) [2024-03-29 18:52:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.4, 300 sec: 41765.3). Total num frames: 1008975872. Throughput: 0: 41332.2. Samples: 891170360. Policy #0 lag: (min: 0.0, avg: 18.2, max: 42.0) [2024-03-29 18:52:38,840][00126] Avg episode reward: [(0, '0.520')] [2024-03-29 18:52:41,167][00497] Updated weights for policy 0, policy_version 61588 (0.0027) [2024-03-29 18:52:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 1009188864. Throughput: 0: 41684.9. Samples: 891422280. Policy #0 lag: (min: 0.0, avg: 18.2, max: 42.0) [2024-03-29 18:52:43,841][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 18:52:44,174][00497] Updated weights for policy 0, policy_version 61598 (0.0021) [2024-03-29 18:52:48,526][00497] Updated weights for policy 0, policy_version 61608 (0.0028) [2024-03-29 18:52:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 1009401856. Throughput: 0: 41455.7. Samples: 891540240. Policy #0 lag: (min: 0.0, avg: 18.2, max: 42.0) [2024-03-29 18:52:48,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 18:52:52,262][00497] Updated weights for policy 0, policy_version 61618 (0.0024) [2024-03-29 18:52:53,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41876.4). 
Total num frames: 1009614848. Throughput: 0: 41450.4. Samples: 891794340. Policy #0 lag: (min: 0.0, avg: 18.2, max: 42.0) [2024-03-29 18:52:53,840][00126] Avg episode reward: [(0, '0.610')] [2024-03-29 18:52:56,678][00497] Updated weights for policy 0, policy_version 61628 (0.0026) [2024-03-29 18:52:58,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 1009811456. Throughput: 0: 42180.5. Samples: 892062980. Policy #0 lag: (min: 0.0, avg: 18.2, max: 42.0) [2024-03-29 18:52:58,840][00126] Avg episode reward: [(0, '0.551')] [2024-03-29 18:52:58,962][00476] Signal inference workers to stop experience collection... (31750 times) [2024-03-29 18:52:58,963][00476] Signal inference workers to resume experience collection... (31750 times) [2024-03-29 18:52:59,004][00497] InferenceWorker_p0-w0: stopping experience collection (31750 times) [2024-03-29 18:52:59,004][00497] InferenceWorker_p0-w0: resuming experience collection (31750 times) [2024-03-29 18:52:59,827][00497] Updated weights for policy 0, policy_version 61638 (0.0020) [2024-03-29 18:53:03,753][00497] Updated weights for policy 0, policy_version 61648 (0.0018) [2024-03-29 18:53:03,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 1010040832. Throughput: 0: 42001.2. Samples: 892176180. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 18:53:03,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 18:53:04,149][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000061649_1010057216.pth... [2024-03-29 18:53:04,475][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000061037_1000030208.pth [2024-03-29 18:53:07,577][00497] Updated weights for policy 0, policy_version 61658 (0.0026) [2024-03-29 18:53:08,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42599.7, 300 sec: 41931.9). Total num frames: 1010253824. Throughput: 0: 42100.6. Samples: 892430540. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 18:53:08,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 18:53:12,230][00497] Updated weights for policy 0, policy_version 61668 (0.0019) [2024-03-29 18:53:13,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 1010434048. Throughput: 0: 42218.2. Samples: 892691640. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 18:53:13,840][00126] Avg episode reward: [(0, '0.623')] [2024-03-29 18:53:15,552][00497] Updated weights for policy 0, policy_version 61678 (0.0029) [2024-03-29 18:53:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 1010679808. Throughput: 0: 41947.6. Samples: 892800980. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 18:53:18,841][00126] Avg episode reward: [(0, '0.620')] [2024-03-29 18:53:19,284][00497] Updated weights for policy 0, policy_version 61688 (0.0037) [2024-03-29 18:53:23,240][00497] Updated weights for policy 0, policy_version 61698 (0.0023) [2024-03-29 18:53:23,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 1010876416. Throughput: 0: 42118.5. Samples: 893065700. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 18:53:23,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 18:53:27,745][00497] Updated weights for policy 0, policy_version 61708 (0.0024) [2024-03-29 18:53:28,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 1011073024. 
Throughput: 0: 42291.6. Samples: 893325400. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 18:53:28,840][00126] Avg episode reward: [(0, '0.572')] [2024-03-29 18:53:30,936][00497] Updated weights for policy 0, policy_version 61718 (0.0020) [2024-03-29 18:53:32,648][00476] Signal inference workers to stop experience collection... (31800 times) [2024-03-29 18:53:32,649][00476] Signal inference workers to resume experience collection... (31800 times) [2024-03-29 18:53:32,694][00497] InferenceWorker_p0-w0: stopping experience collection (31800 times) [2024-03-29 18:53:32,694][00497] InferenceWorker_p0-w0: resuming experience collection (31800 times) [2024-03-29 18:53:33,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.4, 300 sec: 41820.8). Total num frames: 1011318784. Throughput: 0: 42256.8. Samples: 893441800. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 18:53:33,840][00126] Avg episode reward: [(0, '0.668')] [2024-03-29 18:53:34,797][00497] Updated weights for policy 0, policy_version 61728 (0.0024) [2024-03-29 18:53:38,634][00497] Updated weights for policy 0, policy_version 61738 (0.0024) [2024-03-29 18:53:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 1011515392. Throughput: 0: 42412.9. Samples: 893702920. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 18:53:38,840][00126] Avg episode reward: [(0, '0.536')] [2024-03-29 18:53:43,249][00497] Updated weights for policy 0, policy_version 61748 (0.0025) [2024-03-29 18:53:43,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 1011712000. Throughput: 0: 42194.8. Samples: 893961740. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:53:43,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 18:53:46,630][00497] Updated weights for policy 0, policy_version 61758 (0.0022) [2024-03-29 18:53:48,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 1011941376. Throughput: 0: 41943.2. Samples: 894063620. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:53:48,840][00126] Avg episode reward: [(0, '0.617')] [2024-03-29 18:53:50,713][00497] Updated weights for policy 0, policy_version 61768 (0.0019) [2024-03-29 18:53:53,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 1012121600. Throughput: 0: 41802.1. Samples: 894311640. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:53:53,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 18:53:54,832][00497] Updated weights for policy 0, policy_version 61778 (0.0023) [2024-03-29 18:53:58,839][00126] Fps is (10 sec: 36045.0, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 1012301824. Throughput: 0: 41757.8. Samples: 894570740. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:53:58,840][00126] Avg episode reward: [(0, '0.609')] [2024-03-29 18:53:59,117][00497] Updated weights for policy 0, policy_version 61788 (0.0027) [2024-03-29 18:54:02,560][00497] Updated weights for policy 0, policy_version 61798 (0.0029) [2024-03-29 18:54:03,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 1012531200. Throughput: 0: 42107.5. Samples: 894695820. 
Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:54:03,840][00126] Avg episode reward: [(0, '0.669')] [2024-03-29 18:54:06,556][00497] Updated weights for policy 0, policy_version 61808 (0.0019) [2024-03-29 18:54:07,435][00476] Signal inference workers to stop experience collection... (31850 times) [2024-03-29 18:54:07,473][00497] InferenceWorker_p0-w0: stopping experience collection (31850 times) [2024-03-29 18:54:07,523][00476] Signal inference workers to resume experience collection... (31850 times) [2024-03-29 18:54:07,523][00497] InferenceWorker_p0-w0: resuming experience collection (31850 times) [2024-03-29 18:54:08,839][00126] Fps is (10 sec: 44235.9, 60 sec: 41506.0, 300 sec: 41765.3). Total num frames: 1012744192. Throughput: 0: 41579.1. Samples: 894936760. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:54:08,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 18:54:10,733][00497] Updated weights for policy 0, policy_version 61818 (0.0032) [2024-03-29 18:54:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 1012940800. Throughput: 0: 41528.9. Samples: 895194200. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:54:13,841][00126] Avg episode reward: [(0, '0.617')] [2024-03-29 18:54:14,849][00497] Updated weights for policy 0, policy_version 61828 (0.0019) [2024-03-29 18:54:18,547][00497] Updated weights for policy 0, policy_version 61838 (0.0023) [2024-03-29 18:54:18,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 1013153792. Throughput: 0: 41459.1. Samples: 895307460. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 18:54:18,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 18:54:22,508][00497] Updated weights for policy 0, policy_version 61848 (0.0023) [2024-03-29 18:54:23,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 1013366784. Throughput: 0: 40911.5. Samples: 895543940. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 18:54:23,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:54:26,502][00497] Updated weights for policy 0, policy_version 61858 (0.0027) [2024-03-29 18:54:28,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 1013547008. Throughput: 0: 40852.4. Samples: 895800100. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 18:54:28,840][00126] Avg episode reward: [(0, '0.674')] [2024-03-29 18:54:30,758][00497] Updated weights for policy 0, policy_version 61868 (0.0030) [2024-03-29 18:54:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 40960.0, 300 sec: 41709.8). Total num frames: 1013776384. Throughput: 0: 41632.4. Samples: 895937080. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 18:54:33,840][00126] Avg episode reward: [(0, '0.600')] [2024-03-29 18:54:34,244][00497] Updated weights for policy 0, policy_version 61878 (0.0023) [2024-03-29 18:54:38,187][00497] Updated weights for policy 0, policy_version 61888 (0.0022) [2024-03-29 18:54:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 1013989376. Throughput: 0: 41357.4. Samples: 896172720. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 18:54:38,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 18:54:40,710][00476] Signal inference workers to stop experience collection... (31900 times) [2024-03-29 18:54:40,711][00476] Signal inference workers to resume experience collection... 
(31900 times) [2024-03-29 18:54:40,750][00497] InferenceWorker_p0-w0: stopping experience collection (31900 times) [2024-03-29 18:54:40,750][00497] InferenceWorker_p0-w0: resuming experience collection (31900 times) [2024-03-29 18:54:42,154][00497] Updated weights for policy 0, policy_version 61898 (0.0024) [2024-03-29 18:54:43,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 1014202368. Throughput: 0: 41191.9. Samples: 896424380. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 18:54:43,840][00126] Avg episode reward: [(0, '0.497')] [2024-03-29 18:54:46,534][00497] Updated weights for policy 0, policy_version 61908 (0.0021) [2024-03-29 18:54:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40686.9, 300 sec: 41654.2). Total num frames: 1014382592. Throughput: 0: 41443.9. Samples: 896560800. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 18:54:48,840][00126] Avg episode reward: [(0, '0.513')] [2024-03-29 18:54:50,028][00497] Updated weights for policy 0, policy_version 61918 (0.0022) [2024-03-29 18:54:53,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 1014628352. Throughput: 0: 41363.6. Samples: 896798120. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 18:54:53,835][00497] Updated weights for policy 0, policy_version 61928 (0.0031) [2024-03-29 18:54:53,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 18:54:57,902][00497] Updated weights for policy 0, policy_version 61938 (0.0027) [2024-03-29 18:54:58,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.2, 300 sec: 41820.9). Total num frames: 1014824960. Throughput: 0: 41328.0. Samples: 897053960. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:54:58,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 18:55:01,977][00497] Updated weights for policy 0, policy_version 61948 (0.0025) [2024-03-29 18:55:03,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 1015021568. Throughput: 0: 41807.9. Samples: 897188820. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:55:03,840][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 18:55:03,860][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000061952_1015021568.pth... [2024-03-29 18:55:04,164][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000061341_1005010944.pth [2024-03-29 18:55:05,807][00497] Updated weights for policy 0, policy_version 61958 (0.0025) [2024-03-29 18:55:08,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 1015234560. Throughput: 0: 41668.4. Samples: 897419020. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:55:08,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 18:55:09,753][00497] Updated weights for policy 0, policy_version 61968 (0.0018) [2024-03-29 18:55:11,341][00476] Signal inference workers to stop experience collection... (31950 times) [2024-03-29 18:55:11,402][00497] InferenceWorker_p0-w0: stopping experience collection (31950 times) [2024-03-29 18:55:11,437][00476] Signal inference workers to resume experience collection... (31950 times) [2024-03-29 18:55:11,439][00497] InferenceWorker_p0-w0: resuming experience collection (31950 times) [2024-03-29 18:55:13,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.0, 300 sec: 41765.3). Total num frames: 1015431168. Throughput: 0: 41509.2. Samples: 897668020. 
Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:55:13,840][00126] Avg episode reward: [(0, '0.493')] [2024-03-29 18:55:14,025][00497] Updated weights for policy 0, policy_version 61978 (0.0022) [2024-03-29 18:55:17,926][00497] Updated weights for policy 0, policy_version 61988 (0.0027) [2024-03-29 18:55:18,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 1015644160. Throughput: 0: 41483.5. Samples: 897803840. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:55:18,840][00126] Avg episode reward: [(0, '0.603')] [2024-03-29 18:55:21,582][00497] Updated weights for policy 0, policy_version 61998 (0.0028) [2024-03-29 18:55:23,839][00126] Fps is (10 sec: 44237.7, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 1015873536. Throughput: 0: 41714.8. Samples: 898049880. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:55:23,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 18:55:25,478][00497] Updated weights for policy 0, policy_version 62008 (0.0019) [2024-03-29 18:55:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 1016053760. Throughput: 0: 41671.2. Samples: 898299580. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:55:28,841][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 18:55:29,431][00497] Updated weights for policy 0, policy_version 62018 (0.0026) [2024-03-29 18:55:33,599][00497] Updated weights for policy 0, policy_version 62028 (0.0019) [2024-03-29 18:55:33,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 1016266752. Throughput: 0: 41543.2. Samples: 898430240. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 18:55:33,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 18:55:37,054][00497] Updated weights for policy 0, policy_version 62038 (0.0025) [2024-03-29 18:55:38,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 1016512512. Throughput: 0: 41760.0. Samples: 898677320. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 18:55:38,840][00126] Avg episode reward: [(0, '0.623')] [2024-03-29 18:55:41,291][00497] Updated weights for policy 0, policy_version 62048 (0.0025) [2024-03-29 18:55:42,021][00476] Signal inference workers to stop experience collection... (32000 times) [2024-03-29 18:55:42,059][00497] InferenceWorker_p0-w0: stopping experience collection (32000 times) [2024-03-29 18:55:42,240][00476] Signal inference workers to resume experience collection... (32000 times) [2024-03-29 18:55:42,240][00497] InferenceWorker_p0-w0: resuming experience collection (32000 times) [2024-03-29 18:55:43,839][00126] Fps is (10 sec: 44236.1, 60 sec: 41779.1, 300 sec: 41820.9). Total num frames: 1016709120. Throughput: 0: 41407.4. Samples: 898917300. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 18:55:43,840][00126] Avg episode reward: [(0, '0.680')] [2024-03-29 18:55:45,293][00497] Updated weights for policy 0, policy_version 62058 (0.0025) [2024-03-29 18:55:48,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 1016905728. Throughput: 0: 41473.4. Samples: 899055120. 
Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 18:55:48,840][00126] Avg episode reward: [(0, '0.603')] [2024-03-29 18:55:49,565][00497] Updated weights for policy 0, policy_version 62068 (0.0022) [2024-03-29 18:55:53,113][00497] Updated weights for policy 0, policy_version 62078 (0.0030) [2024-03-29 18:55:53,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 1017102336. Throughput: 0: 41960.5. Samples: 899307240. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 18:55:53,840][00126] Avg episode reward: [(0, '0.625')] [2024-03-29 18:55:57,230][00497] Updated weights for policy 0, policy_version 62088 (0.0024) [2024-03-29 18:55:58,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 1017331712. Throughput: 0: 41452.0. Samples: 899533360. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 18:55:58,842][00126] Avg episode reward: [(0, '0.647')] [2024-03-29 18:56:01,421][00497] Updated weights for policy 0, policy_version 62098 (0.0020) [2024-03-29 18:56:03,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41506.3, 300 sec: 41654.3). Total num frames: 1017511936. Throughput: 0: 41584.5. Samples: 899675140. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 18:56:03,840][00126] Avg episode reward: [(0, '0.630')] [2024-03-29 18:56:05,233][00497] Updated weights for policy 0, policy_version 62108 (0.0023) [2024-03-29 18:56:08,839][00126] Fps is (10 sec: 39322.3, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 1017724928. Throughput: 0: 41624.0. Samples: 899922960. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 18:56:08,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 18:56:09,068][00497] Updated weights for policy 0, policy_version 62118 (0.0036) [2024-03-29 18:56:12,736][00497] Updated weights for policy 0, policy_version 62128 (0.0028) [2024-03-29 18:56:13,839][00126] Fps is (10 sec: 44235.9, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 1017954304. Throughput: 0: 41501.6. Samples: 900167160. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 18:56:13,840][00126] Avg episode reward: [(0, '0.610')] [2024-03-29 18:56:16,888][00497] Updated weights for policy 0, policy_version 62138 (0.0022) [2024-03-29 18:56:17,349][00476] Signal inference workers to stop experience collection... (32050 times) [2024-03-29 18:56:17,418][00497] InferenceWorker_p0-w0: stopping experience collection (32050 times) [2024-03-29 18:56:17,423][00476] Signal inference workers to resume experience collection... (32050 times) [2024-03-29 18:56:17,445][00497] InferenceWorker_p0-w0: resuming experience collection (32050 times) [2024-03-29 18:56:18,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 1018134528. Throughput: 0: 41537.6. Samples: 900299440. Policy #0 lag: (min: 1.0, avg: 22.7, max: 43.0) [2024-03-29 18:56:18,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 18:56:20,855][00497] Updated weights for policy 0, policy_version 62148 (0.0033) [2024-03-29 18:56:23,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 1018347520. Throughput: 0: 41685.3. Samples: 900553160. 
Policy #0 lag: (min: 1.0, avg: 22.7, max: 43.0) [2024-03-29 18:56:23,840][00126] Avg episode reward: [(0, '0.596')] [2024-03-29 18:56:24,723][00497] Updated weights for policy 0, policy_version 62158 (0.0027) [2024-03-29 18:56:28,510][00497] Updated weights for policy 0, policy_version 62168 (0.0032) [2024-03-29 18:56:28,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 1018576896. Throughput: 0: 41619.2. Samples: 900790160. Policy #0 lag: (min: 1.0, avg: 22.7, max: 43.0) [2024-03-29 18:56:28,840][00126] Avg episode reward: [(0, '0.638')] [2024-03-29 18:56:32,791][00497] Updated weights for policy 0, policy_version 62178 (0.0025) [2024-03-29 18:56:33,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41654.3). Total num frames: 1018757120. Throughput: 0: 41172.0. Samples: 900907860. Policy #0 lag: (min: 1.0, avg: 22.7, max: 43.0) [2024-03-29 18:56:33,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 18:56:36,960][00497] Updated weights for policy 0, policy_version 62188 (0.0024) [2024-03-29 18:56:38,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40687.0, 300 sec: 41543.2). Total num frames: 1018953728. Throughput: 0: 41465.9. Samples: 901173200. Policy #0 lag: (min: 1.0, avg: 22.7, max: 43.0) [2024-03-29 18:56:38,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 18:56:40,601][00497] Updated weights for policy 0, policy_version 62198 (0.0026) [2024-03-29 18:56:43,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 1019183104. Throughput: 0: 41568.5. Samples: 901403940. Policy #0 lag: (min: 1.0, avg: 22.7, max: 43.0) [2024-03-29 18:56:43,840][00126] Avg episode reward: [(0, '0.560')] [2024-03-29 18:56:44,552][00497] Updated weights for policy 0, policy_version 62208 (0.0025) [2024-03-29 18:56:48,838][00497] Updated weights for policy 0, policy_version 62218 (0.0022) [2024-03-29 18:56:48,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 1019379712. Throughput: 0: 41011.9. Samples: 901520680. Policy #0 lag: (min: 1.0, avg: 22.7, max: 43.0) [2024-03-29 18:56:48,840][00126] Avg episode reward: [(0, '0.609')] [2024-03-29 18:56:50,772][00476] Signal inference workers to stop experience collection... (32100 times) [2024-03-29 18:56:50,853][00476] Signal inference workers to resume experience collection... (32100 times) [2024-03-29 18:56:50,855][00497] InferenceWorker_p0-w0: stopping experience collection (32100 times) [2024-03-29 18:56:50,884][00497] InferenceWorker_p0-w0: resuming experience collection (32100 times) [2024-03-29 18:56:52,862][00497] Updated weights for policy 0, policy_version 62228 (0.0033) [2024-03-29 18:56:53,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 1019592704. Throughput: 0: 41289.2. Samples: 901780980. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 18:56:53,840][00126] Avg episode reward: [(0, '0.513')] [2024-03-29 18:56:56,633][00497] Updated weights for policy 0, policy_version 62238 (0.0028) [2024-03-29 18:56:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 1019805696. Throughput: 0: 40980.0. Samples: 902011260. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 18:56:58,840][00126] Avg episode reward: [(0, '0.622')] [2024-03-29 18:57:00,550][00497] Updated weights for policy 0, policy_version 62248 (0.0024) [2024-03-29 18:57:03,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.0, 300 sec: 41710.0). 
Total num frames: 1020002304. Throughput: 0: 41107.5. Samples: 902149280. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 18:57:03,840][00126] Avg episode reward: [(0, '0.609')] [2024-03-29 18:57:03,865][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000062256_1020002304.pth... [2024-03-29 18:57:04,185][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000061649_1010057216.pth [2024-03-29 18:57:04,768][00497] Updated weights for policy 0, policy_version 62258 (0.0025) [2024-03-29 18:57:08,474][00497] Updated weights for policy 0, policy_version 62268 (0.0026) [2024-03-29 18:57:08,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 1020198912. Throughput: 0: 41260.1. Samples: 902409860. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 18:57:08,840][00126] Avg episode reward: [(0, '0.640')] [2024-03-29 18:57:12,764][00497] Updated weights for policy 0, policy_version 62278 (0.0023) [2024-03-29 18:57:13,839][00126] Fps is (10 sec: 42599.6, 60 sec: 41233.2, 300 sec: 41654.2). Total num frames: 1020428288. Throughput: 0: 41349.0. Samples: 902650860. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 18:57:13,840][00126] Avg episode reward: [(0, '0.517')] [2024-03-29 18:57:16,494][00497] Updated weights for policy 0, policy_version 62288 (0.0022) [2024-03-29 18:57:18,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 1020641280. Throughput: 0: 41233.2. Samples: 902763360. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 18:57:18,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 18:57:20,660][00497] Updated weights for policy 0, policy_version 62298 (0.0026) [2024-03-29 18:57:21,108][00476] Signal inference workers to stop experience collection... (32150 times) [2024-03-29 18:57:21,134][00497] InferenceWorker_p0-w0: stopping experience collection (32150 times) [2024-03-29 18:57:21,300][00476] Signal inference workers to resume experience collection... (32150 times) [2024-03-29 18:57:21,301][00497] InferenceWorker_p0-w0: resuming experience collection (32150 times) [2024-03-29 18:57:23,839][00126] Fps is (10 sec: 39320.8, 60 sec: 41233.0, 300 sec: 41543.1). Total num frames: 1020821504. Throughput: 0: 41032.8. Samples: 903019680. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 18:57:23,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 18:57:24,273][00497] Updated weights for policy 0, policy_version 62308 (0.0021) [2024-03-29 18:57:28,446][00497] Updated weights for policy 0, policy_version 62318 (0.0025) [2024-03-29 18:57:28,839][00126] Fps is (10 sec: 39322.1, 60 sec: 40960.1, 300 sec: 41543.2). Total num frames: 1021034496. Throughput: 0: 41640.1. Samples: 903277740. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 18:57:28,840][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 18:57:32,254][00497] Updated weights for policy 0, policy_version 62328 (0.0029) [2024-03-29 18:57:33,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 1021263872. Throughput: 0: 41406.3. Samples: 903383960. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 18:57:33,840][00126] Avg episode reward: [(0, '0.621')] [2024-03-29 18:57:36,150][00497] Updated weights for policy 0, policy_version 62338 (0.0024) [2024-03-29 18:57:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 1021444096. 
Throughput: 0: 41655.7. Samples: 903655480. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 18:57:38,840][00126] Avg episode reward: [(0, '0.629')] [2024-03-29 18:57:39,957][00497] Updated weights for policy 0, policy_version 62348 (0.0027) [2024-03-29 18:57:43,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40960.1, 300 sec: 41487.6). Total num frames: 1021640704. Throughput: 0: 41934.4. Samples: 903898300. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 18:57:43,840][00126] Avg episode reward: [(0, '0.635')] [2024-03-29 18:57:44,178][00497] Updated weights for policy 0, policy_version 62358 (0.0027) [2024-03-29 18:57:47,879][00497] Updated weights for policy 0, policy_version 62368 (0.0023) [2024-03-29 18:57:48,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 1021886464. Throughput: 0: 41233.4. Samples: 904004780. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 18:57:48,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 18:57:51,972][00497] Updated weights for policy 0, policy_version 62378 (0.0020) [2024-03-29 18:57:53,021][00476] Signal inference workers to stop experience collection... (32200 times) [2024-03-29 18:57:53,058][00497] InferenceWorker_p0-w0: stopping experience collection (32200 times) [2024-03-29 18:57:53,249][00476] Signal inference workers to resume experience collection... (32200 times) [2024-03-29 18:57:53,250][00497] InferenceWorker_p0-w0: resuming experience collection (32200 times) [2024-03-29 18:57:53,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41233.2, 300 sec: 41543.2). Total num frames: 1022066688. Throughput: 0: 41366.7. Samples: 904271360. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 18:57:53,840][00126] Avg episode reward: [(0, '0.532')] [2024-03-29 18:57:55,718][00497] Updated weights for policy 0, policy_version 62388 (0.0022) [2024-03-29 18:57:58,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 1022279680. Throughput: 0: 41720.8. Samples: 904528300. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 18:57:58,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 18:57:59,878][00497] Updated weights for policy 0, policy_version 62398 (0.0045) [2024-03-29 18:58:03,428][00497] Updated weights for policy 0, policy_version 62408 (0.0022) [2024-03-29 18:58:03,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.3, 300 sec: 41487.6). Total num frames: 1022492672. Throughput: 0: 41750.4. Samples: 904642120. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 18:58:03,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 18:58:07,706][00497] Updated weights for policy 0, policy_version 62418 (0.0022) [2024-03-29 18:58:08,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 1022689280. Throughput: 0: 41722.4. Samples: 904897180. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 18:58:08,840][00126] Avg episode reward: [(0, '0.559')] [2024-03-29 18:58:11,423][00497] Updated weights for policy 0, policy_version 62428 (0.0023) [2024-03-29 18:58:13,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41233.0, 300 sec: 41432.1). Total num frames: 1022902272. Throughput: 0: 41228.4. Samples: 905133020. 
Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:58:13,840][00126] Avg episode reward: [(0, '0.537')] [2024-03-29 18:58:15,743][00497] Updated weights for policy 0, policy_version 62438 (0.0020) [2024-03-29 18:58:18,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 1023115264. Throughput: 0: 41994.6. Samples: 905273720. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:58:18,841][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 18:58:19,234][00497] Updated weights for policy 0, policy_version 62448 (0.0027) [2024-03-29 18:58:23,735][00497] Updated weights for policy 0, policy_version 62458 (0.0021) [2024-03-29 18:58:23,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 1023311872. Throughput: 0: 40967.8. Samples: 905499040. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:58:23,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:58:25,230][00476] Signal inference workers to stop experience collection... (32250 times) [2024-03-29 18:58:25,311][00476] Signal inference workers to resume experience collection... (32250 times) [2024-03-29 18:58:25,318][00497] InferenceWorker_p0-w0: stopping experience collection (32250 times) [2024-03-29 18:58:25,340][00497] InferenceWorker_p0-w0: resuming experience collection (32250 times) [2024-03-29 18:58:27,169][00497] Updated weights for policy 0, policy_version 62468 (0.0024) [2024-03-29 18:58:28,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 41432.1). Total num frames: 1023541248. Throughput: 0: 41408.3. Samples: 905761680. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:58:28,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 18:58:31,422][00497] Updated weights for policy 0, policy_version 62478 (0.0021) [2024-03-29 18:58:33,839][00126] Fps is (10 sec: 44237.2, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 1023754240. Throughput: 0: 42096.5. Samples: 905899120. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:58:33,840][00126] Avg episode reward: [(0, '0.650')] [2024-03-29 18:58:34,880][00497] Updated weights for policy 0, policy_version 62488 (0.0019) [2024-03-29 18:58:38,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41506.1, 300 sec: 41432.1). Total num frames: 1023934464. Throughput: 0: 41152.5. Samples: 906123220. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:58:38,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 18:58:39,254][00497] Updated weights for policy 0, policy_version 62498 (0.0023) [2024-03-29 18:58:43,052][00497] Updated weights for policy 0, policy_version 62508 (0.0021) [2024-03-29 18:58:43,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.1, 300 sec: 41376.5). Total num frames: 1024147456. Throughput: 0: 41047.6. Samples: 906375440. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:58:43,840][00126] Avg episode reward: [(0, '0.622')] [2024-03-29 18:58:47,356][00497] Updated weights for policy 0, policy_version 62518 (0.0021) [2024-03-29 18:58:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.2, 300 sec: 41487.6). Total num frames: 1024360448. Throughput: 0: 41480.8. Samples: 906508760. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 18:58:48,840][00126] Avg episode reward: [(0, '0.626')] [2024-03-29 18:58:51,019][00497] Updated weights for policy 0, policy_version 62528 (0.0022) [2024-03-29 18:58:53,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41598.7). 
Total num frames: 1024573440. Throughput: 0: 41181.3. Samples: 906750340. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 18:58:53,840][00126] Avg episode reward: [(0, '0.592')] [2024-03-29 18:58:55,180][00497] Updated weights for policy 0, policy_version 62538 (0.0020) [2024-03-29 18:58:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 41487.6). Total num frames: 1024770048. Throughput: 0: 41609.4. Samples: 907005440. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 18:58:58,840][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 18:58:58,910][00497] Updated weights for policy 0, policy_version 62548 (0.0030) [2024-03-29 18:59:02,425][00476] Signal inference workers to stop experience collection... (32300 times) [2024-03-29 18:59:02,426][00476] Signal inference workers to resume experience collection... (32300 times) [2024-03-29 18:59:02,459][00497] InferenceWorker_p0-w0: stopping experience collection (32300 times) [2024-03-29 18:59:02,459][00497] InferenceWorker_p0-w0: resuming experience collection (32300 times) [2024-03-29 18:59:02,719][00497] Updated weights for policy 0, policy_version 62558 (0.0024) [2024-03-29 18:59:03,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.1, 300 sec: 41543.2). Total num frames: 1024999424. Throughput: 0: 41534.2. Samples: 907142760. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 18:59:03,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:59:04,070][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000062562_1025015808.pth... [2024-03-29 18:59:04,383][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000061952_1015021568.pth [2024-03-29 18:59:06,609][00497] Updated weights for policy 0, policy_version 62568 (0.0025) [2024-03-29 18:59:08,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.3, 300 sec: 41654.2). Total num frames: 1025228800. Throughput: 0: 41881.8. Samples: 907383720. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 18:59:08,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 18:59:11,038][00497] Updated weights for policy 0, policy_version 62578 (0.0023) [2024-03-29 18:59:13,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 1025392640. Throughput: 0: 41546.3. Samples: 907631260. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 18:59:13,840][00126] Avg episode reward: [(0, '0.620')] [2024-03-29 18:59:14,593][00497] Updated weights for policy 0, policy_version 62588 (0.0027) [2024-03-29 18:59:18,668][00497] Updated weights for policy 0, policy_version 62598 (0.0027) [2024-03-29 18:59:18,839][00126] Fps is (10 sec: 37683.8, 60 sec: 41506.2, 300 sec: 41487.6). Total num frames: 1025605632. Throughput: 0: 41341.9. Samples: 907759500. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 18:59:18,840][00126] Avg episode reward: [(0, '0.621')] [2024-03-29 18:59:22,250][00497] Updated weights for policy 0, policy_version 62608 (0.0021) [2024-03-29 18:59:23,839][00126] Fps is (10 sec: 44235.9, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 1025835008. Throughput: 0: 41895.3. Samples: 908008520. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 18:59:23,840][00126] Avg episode reward: [(0, '0.596')] [2024-03-29 18:59:26,571][00497] Updated weights for policy 0, policy_version 62618 (0.0022) [2024-03-29 18:59:28,839][00126] Fps is (10 sec: 42597.5, 60 sec: 41506.1, 300 sec: 41543.1). Total num frames: 1026031616. 
Throughput: 0: 41950.1. Samples: 908263200. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 18:59:28,840][00126] Avg episode reward: [(0, '0.628')] [2024-03-29 18:59:30,300][00497] Updated weights for policy 0, policy_version 62628 (0.0020) [2024-03-29 18:59:33,839][00126] Fps is (10 sec: 39322.6, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 1026228224. Throughput: 0: 41614.2. Samples: 908381400. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 18:59:33,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 18:59:34,283][00497] Updated weights for policy 0, policy_version 62638 (0.0017) [2024-03-29 18:59:37,904][00497] Updated weights for policy 0, policy_version 62648 (0.0021) [2024-03-29 18:59:38,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.2, 300 sec: 41543.2). Total num frames: 1026457600. Throughput: 0: 41885.3. Samples: 908635180. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 18:59:38,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 18:59:38,883][00476] Signal inference workers to stop experience collection... (32350 times) [2024-03-29 18:59:38,914][00497] InferenceWorker_p0-w0: stopping experience collection (32350 times) [2024-03-29 18:59:39,098][00476] Signal inference workers to resume experience collection... (32350 times) [2024-03-29 18:59:39,098][00497] InferenceWorker_p0-w0: resuming experience collection (32350 times) [2024-03-29 18:59:42,047][00497] Updated weights for policy 0, policy_version 62658 (0.0025) [2024-03-29 18:59:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 1026637824. Throughput: 0: 42247.1. Samples: 908906560. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 18:59:43,840][00126] Avg episode reward: [(0, '0.642')] [2024-03-29 18:59:46,014][00497] Updated weights for policy 0, policy_version 62668 (0.0021) [2024-03-29 18:59:48,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.0, 300 sec: 41432.1). Total num frames: 1026850816. Throughput: 0: 41403.5. Samples: 909005920. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 18:59:48,840][00126] Avg episode reward: [(0, '0.514')] [2024-03-29 18:59:50,033][00497] Updated weights for policy 0, policy_version 62678 (0.0019) [2024-03-29 18:59:53,560][00497] Updated weights for policy 0, policy_version 62688 (0.0024) [2024-03-29 18:59:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 1027080192. Throughput: 0: 41590.8. Samples: 909255300. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 18:59:53,840][00126] Avg episode reward: [(0, '0.653')] [2024-03-29 18:59:57,715][00497] Updated weights for policy 0, policy_version 62698 (0.0025) [2024-03-29 18:59:58,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 1027260416. Throughput: 0: 42084.5. Samples: 909525060. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 18:59:58,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 19:00:01,653][00497] Updated weights for policy 0, policy_version 62708 (0.0024) [2024-03-29 19:00:03,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 1027489792. Throughput: 0: 41910.5. Samples: 909645480. 
Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 19:00:03,840][00126] Avg episode reward: [(0, '0.615')] [2024-03-29 19:00:05,772][00497] Updated weights for policy 0, policy_version 62718 (0.0021) [2024-03-29 19:00:08,839][00126] Fps is (10 sec: 45874.9, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 1027719168. Throughput: 0: 41785.1. Samples: 909888840. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 19:00:08,840][00126] Avg episode reward: [(0, '0.601')] [2024-03-29 19:00:09,152][00497] Updated weights for policy 0, policy_version 62728 (0.0024) [2024-03-29 19:00:10,041][00476] Signal inference workers to stop experience collection... (32400 times) [2024-03-29 19:00:10,041][00476] Signal inference workers to resume experience collection... (32400 times) [2024-03-29 19:00:10,076][00497] InferenceWorker_p0-w0: stopping experience collection (32400 times) [2024-03-29 19:00:10,076][00497] InferenceWorker_p0-w0: resuming experience collection (32400 times) [2024-03-29 19:00:13,446][00497] Updated weights for policy 0, policy_version 62738 (0.0023) [2024-03-29 19:00:13,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 1027899392. Throughput: 0: 41974.7. Samples: 910152060. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 19:00:13,840][00126] Avg episode reward: [(0, '0.610')] [2024-03-29 19:00:17,311][00497] Updated weights for policy 0, policy_version 62748 (0.0029) [2024-03-29 19:00:18,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.1, 300 sec: 41543.1). Total num frames: 1028128768. Throughput: 0: 41951.0. Samples: 910269200. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 19:00:18,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 19:00:21,554][00497] Updated weights for policy 0, policy_version 62758 (0.0017) [2024-03-29 19:00:23,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.4, 300 sec: 41654.2). Total num frames: 1028341760. Throughput: 0: 41938.7. Samples: 910522420. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 19:00:23,840][00126] Avg episode reward: [(0, '0.620')] [2024-03-29 19:00:24,916][00497] Updated weights for policy 0, policy_version 62768 (0.0025) [2024-03-29 19:00:28,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 1028538368. Throughput: 0: 41128.9. Samples: 910757360. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 19:00:28,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 19:00:29,245][00497] Updated weights for policy 0, policy_version 62778 (0.0022) [2024-03-29 19:00:33,085][00497] Updated weights for policy 0, policy_version 62788 (0.0024) [2024-03-29 19:00:33,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41432.1). Total num frames: 1028734976. Throughput: 0: 42039.2. Samples: 910897680. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 19:00:33,840][00126] Avg episode reward: [(0, '0.656')] [2024-03-29 19:00:37,350][00497] Updated weights for policy 0, policy_version 62798 (0.0026) [2024-03-29 19:00:38,797][00476] Signal inference workers to stop experience collection... (32450 times) [2024-03-29 19:00:38,836][00497] InferenceWorker_p0-w0: stopping experience collection (32450 times) [2024-03-29 19:00:38,839][00126] Fps is (10 sec: 42597.6, 60 sec: 41779.1, 300 sec: 41543.2). Total num frames: 1028964352. Throughput: 0: 41979.4. Samples: 911144380. 
Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 19:00:38,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 19:00:39,018][00476] Signal inference workers to resume experience collection... (32450 times) [2024-03-29 19:00:39,018][00497] InferenceWorker_p0-w0: resuming experience collection (32450 times) [2024-03-29 19:00:40,685][00497] Updated weights for policy 0, policy_version 62808 (0.0032) [2024-03-29 19:00:43,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 41543.2). Total num frames: 1029160960. Throughput: 0: 41217.7. Samples: 911379860. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 19:00:43,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 19:00:45,028][00497] Updated weights for policy 0, policy_version 62818 (0.0025) [2024-03-29 19:00:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 1029357568. Throughput: 0: 41839.1. Samples: 911528240. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 19:00:48,840][00126] Avg episode reward: [(0, '0.666')] [2024-03-29 19:00:49,064][00497] Updated weights for policy 0, policy_version 62828 (0.0022) [2024-03-29 19:00:53,248][00497] Updated weights for policy 0, policy_version 62838 (0.0026) [2024-03-29 19:00:53,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.0, 300 sec: 41487.6). Total num frames: 1029570560. Throughput: 0: 41639.0. Samples: 911762600. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 19:00:53,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 19:00:56,660][00497] Updated weights for policy 0, policy_version 62848 (0.0029) [2024-03-29 19:00:58,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 1029783552. Throughput: 0: 41228.5. Samples: 912007340. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 19:00:58,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 19:01:01,006][00497] Updated weights for policy 0, policy_version 62858 (0.0023) [2024-03-29 19:01:03,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 1029963776. Throughput: 0: 41698.3. Samples: 912145620. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 19:01:03,841][00126] Avg episode reward: [(0, '0.676')] [2024-03-29 19:01:03,921][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000062865_1029980160.pth... [2024-03-29 19:01:04,263][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000062256_1020002304.pth [2024-03-29 19:01:05,097][00497] Updated weights for policy 0, policy_version 62868 (0.0032) [2024-03-29 19:01:08,839][00126] Fps is (10 sec: 39320.8, 60 sec: 40959.9, 300 sec: 41432.1). Total num frames: 1030176768. Throughput: 0: 41170.1. Samples: 912375080. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 19:01:08,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 19:01:09,319][00497] Updated weights for policy 0, policy_version 62878 (0.0030) [2024-03-29 19:01:09,919][00476] Signal inference workers to stop experience collection... (32500 times) [2024-03-29 19:01:09,959][00497] InferenceWorker_p0-w0: stopping experience collection (32500 times) [2024-03-29 19:01:10,144][00476] Signal inference workers to resume experience collection... 
(32500 times) [2024-03-29 19:01:10,144][00497] InferenceWorker_p0-w0: resuming experience collection (32500 times) [2024-03-29 19:01:12,768][00497] Updated weights for policy 0, policy_version 62888 (0.0026) [2024-03-29 19:01:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 1030389760. Throughput: 0: 41087.0. Samples: 912606280. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 19:01:13,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 19:01:17,378][00497] Updated weights for policy 0, policy_version 62898 (0.0019) [2024-03-29 19:01:18,839][00126] Fps is (10 sec: 40961.0, 60 sec: 40960.1, 300 sec: 41487.6). Total num frames: 1030586368. Throughput: 0: 40939.6. Samples: 912739960. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 19:01:18,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 19:01:21,120][00497] Updated weights for policy 0, policy_version 62908 (0.0021) [2024-03-29 19:01:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40959.9, 300 sec: 41432.1). Total num frames: 1030799360. Throughput: 0: 40939.1. Samples: 912986640. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 19:01:23,840][00126] Avg episode reward: [(0, '0.650')] [2024-03-29 19:01:25,263][00497] Updated weights for policy 0, policy_version 62918 (0.0022) [2024-03-29 19:01:28,794][00497] Updated weights for policy 0, policy_version 62928 (0.0021) [2024-03-29 19:01:28,839][00126] Fps is (10 sec: 42597.6, 60 sec: 41233.0, 300 sec: 41543.1). Total num frames: 1031012352. Throughput: 0: 41009.2. Samples: 913225280. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 19:01:28,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 19:01:33,216][00497] Updated weights for policy 0, policy_version 62938 (0.0025) [2024-03-29 19:01:33,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40959.9, 300 sec: 41487.6). Total num frames: 1031192576. Throughput: 0: 40539.6. Samples: 913352520. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 19:01:33,840][00126] Avg episode reward: [(0, '0.538')] [2024-03-29 19:01:37,013][00497] Updated weights for policy 0, policy_version 62948 (0.0021) [2024-03-29 19:01:38,839][00126] Fps is (10 sec: 40959.7, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 1031421952. Throughput: 0: 41052.0. Samples: 913609940. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 19:01:38,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 19:01:41,093][00497] Updated weights for policy 0, policy_version 62958 (0.0022) [2024-03-29 19:01:42,273][00476] Signal inference workers to stop experience collection... (32550 times) [2024-03-29 19:01:42,342][00497] InferenceWorker_p0-w0: stopping experience collection (32550 times) [2024-03-29 19:01:42,374][00476] Signal inference workers to resume experience collection... (32550 times) [2024-03-29 19:01:42,376][00497] InferenceWorker_p0-w0: resuming experience collection (32550 times) [2024-03-29 19:01:43,839][00126] Fps is (10 sec: 44237.2, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 1031634944. Throughput: 0: 40818.6. Samples: 913844180. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 19:01:43,840][00126] Avg episode reward: [(0, '0.601')] [2024-03-29 19:01:44,477][00497] Updated weights for policy 0, policy_version 62968 (0.0027) [2024-03-29 19:01:48,839][00126] Fps is (10 sec: 39322.3, 60 sec: 40960.1, 300 sec: 41432.1). Total num frames: 1031815168. Throughput: 0: 40696.0. Samples: 913976940. 
Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 19:01:48,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 19:01:49,220][00497] Updated weights for policy 0, policy_version 62978 (0.0019) [2024-03-29 19:01:52,820][00497] Updated weights for policy 0, policy_version 62988 (0.0029) [2024-03-29 19:01:53,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 1032044544. Throughput: 0: 41383.2. Samples: 914237320. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 19:01:53,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 19:01:56,655][00497] Updated weights for policy 0, policy_version 62998 (0.0018) [2024-03-29 19:01:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41233.0, 300 sec: 41543.2). Total num frames: 1032257536. Throughput: 0: 41620.1. Samples: 914479180. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 19:01:58,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 19:02:00,190][00497] Updated weights for policy 0, policy_version 63008 (0.0030) [2024-03-29 19:02:03,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.0, 300 sec: 41487.6). Total num frames: 1032437760. Throughput: 0: 41396.2. Samples: 914602800. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 19:02:03,840][00126] Avg episode reward: [(0, '0.626')] [2024-03-29 19:02:05,056][00497] Updated weights for policy 0, policy_version 63018 (0.0018) [2024-03-29 19:02:08,612][00497] Updated weights for policy 0, policy_version 63028 (0.0024) [2024-03-29 19:02:08,840][00126] Fps is (10 sec: 39316.8, 60 sec: 41232.3, 300 sec: 41431.9). Total num frames: 1032650752. Throughput: 0: 41486.5. Samples: 914853580. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 19:02:08,841][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 19:02:12,467][00497] Updated weights for policy 0, policy_version 63038 (0.0022) [2024-03-29 19:02:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 1032863744. Throughput: 0: 41759.6. Samples: 915104460. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 19:02:13,840][00126] Avg episode reward: [(0, '0.494')] [2024-03-29 19:02:14,366][00476] Signal inference workers to stop experience collection... (32600 times) [2024-03-29 19:02:14,430][00497] InferenceWorker_p0-w0: stopping experience collection (32600 times) [2024-03-29 19:02:14,439][00476] Signal inference workers to resume experience collection... (32600 times) [2024-03-29 19:02:14,459][00497] InferenceWorker_p0-w0: resuming experience collection (32600 times) [2024-03-29 19:02:16,262][00497] Updated weights for policy 0, policy_version 63048 (0.0024) [2024-03-29 19:02:18,839][00126] Fps is (10 sec: 44242.0, 60 sec: 41779.1, 300 sec: 41598.7). Total num frames: 1033093120. Throughput: 0: 41480.5. Samples: 915219140. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 19:02:18,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 19:02:20,878][00497] Updated weights for policy 0, policy_version 63058 (0.0031) [2024-03-29 19:02:23,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41233.0, 300 sec: 41487.6). Total num frames: 1033273344. Throughput: 0: 41673.8. Samples: 915485260. 
Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 19:02:23,840][00126] Avg episode reward: [(0, '0.659')] [2024-03-29 19:02:24,254][00497] Updated weights for policy 0, policy_version 63068 (0.0034) [2024-03-29 19:02:27,949][00497] Updated weights for policy 0, policy_version 63078 (0.0019) [2024-03-29 19:02:28,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 1033502720. Throughput: 0: 42035.9. Samples: 915735800. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 19:02:28,840][00126] Avg episode reward: [(0, '0.600')] [2024-03-29 19:02:31,438][00497] Updated weights for policy 0, policy_version 63088 (0.0026) [2024-03-29 19:02:33,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.2, 300 sec: 41598.7). Total num frames: 1033715712. Throughput: 0: 41587.0. Samples: 915848360. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 19:02:33,840][00126] Avg episode reward: [(0, '0.630')] [2024-03-29 19:02:36,176][00497] Updated weights for policy 0, policy_version 63098 (0.0019) [2024-03-29 19:02:38,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 1033912320. Throughput: 0: 41773.8. Samples: 916117140. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 19:02:38,840][00126] Avg episode reward: [(0, '0.612')] [2024-03-29 19:02:39,789][00497] Updated weights for policy 0, policy_version 63108 (0.0024) [2024-03-29 19:02:43,559][00497] Updated weights for policy 0, policy_version 63118 (0.0033) [2024-03-29 19:02:43,839][00126] Fps is (10 sec: 40960.8, 60 sec: 41506.2, 300 sec: 41487.6). Total num frames: 1034125312. Throughput: 0: 41799.6. Samples: 916360160. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 19:02:43,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 19:02:46,920][00497] Updated weights for policy 0, policy_version 63128 (0.0024) [2024-03-29 19:02:47,785][00476] Signal inference workers to stop experience collection... (32650 times) [2024-03-29 19:02:47,786][00476] Signal inference workers to resume experience collection... (32650 times) [2024-03-29 19:02:47,825][00497] InferenceWorker_p0-w0: stopping experience collection (32650 times) [2024-03-29 19:02:47,829][00497] InferenceWorker_p0-w0: resuming experience collection (32650 times) [2024-03-29 19:02:48,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.4, 300 sec: 41654.2). Total num frames: 1034354688. Throughput: 0: 41849.0. Samples: 916486000. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 19:02:48,840][00126] Avg episode reward: [(0, '0.569')] [2024-03-29 19:02:51,790][00497] Updated weights for policy 0, policy_version 63138 (0.0021) [2024-03-29 19:02:53,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 1034551296. Throughput: 0: 42177.9. Samples: 916751540. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 19:02:53,841][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 19:02:55,336][00497] Updated weights for policy 0, policy_version 63148 (0.0022) [2024-03-29 19:02:58,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41779.1, 300 sec: 41598.7). Total num frames: 1034764288. Throughput: 0: 41932.4. Samples: 916991420. 
Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 19:02:58,840][00126] Avg episode reward: [(0, '0.528')] [2024-03-29 19:02:58,984][00497] Updated weights for policy 0, policy_version 63158 (0.0025) [2024-03-29 19:03:02,772][00497] Updated weights for policy 0, policy_version 63168 (0.0024) [2024-03-29 19:03:03,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.4, 300 sec: 41654.2). Total num frames: 1034977280. Throughput: 0: 41976.9. Samples: 917108100. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 19:03:03,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 19:03:04,026][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000063171_1034993664.pth... [2024-03-29 19:03:04,367][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000062562_1025015808.pth [2024-03-29 19:03:07,583][00497] Updated weights for policy 0, policy_version 63178 (0.0033) [2024-03-29 19:03:08,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42053.0, 300 sec: 41598.7). Total num frames: 1035173888. Throughput: 0: 42253.8. Samples: 917386680. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 19:03:08,840][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 19:03:11,175][00497] Updated weights for policy 0, policy_version 63188 (0.0026) [2024-03-29 19:03:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 1035386880. Throughput: 0: 41888.1. Samples: 917620760. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 19:03:13,840][00126] Avg episode reward: [(0, '0.513')] [2024-03-29 19:03:14,581][00497] Updated weights for policy 0, policy_version 63198 (0.0021) [2024-03-29 19:03:18,420][00497] Updated weights for policy 0, policy_version 63208 (0.0018) [2024-03-29 19:03:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 1035599872. Throughput: 0: 42184.5. Samples: 917746660. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 19:03:18,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 19:03:23,224][00497] Updated weights for policy 0, policy_version 63218 (0.0019) [2024-03-29 19:03:23,253][00476] Signal inference workers to stop experience collection... (32700 times) [2024-03-29 19:03:23,273][00497] InferenceWorker_p0-w0: stopping experience collection (32700 times) [2024-03-29 19:03:23,474][00476] Signal inference workers to resume experience collection... (32700 times) [2024-03-29 19:03:23,475][00497] InferenceWorker_p0-w0: resuming experience collection (32700 times) [2024-03-29 19:03:23,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.3, 300 sec: 41543.2). Total num frames: 1035796480. Throughput: 0: 41963.9. Samples: 918005520. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:03:23,840][00126] Avg episode reward: [(0, '0.613')] [2024-03-29 19:03:26,702][00497] Updated weights for policy 0, policy_version 63228 (0.0030) [2024-03-29 19:03:28,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41779.3, 300 sec: 41543.2). Total num frames: 1036009472. Throughput: 0: 41864.5. Samples: 918244060. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:03:28,840][00126] Avg episode reward: [(0, '0.604')] [2024-03-29 19:03:30,170][00497] Updated weights for policy 0, policy_version 63238 (0.0024) [2024-03-29 19:03:33,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 1036238848. Throughput: 0: 41994.5. Samples: 918375760. 
Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:03:33,840][00126] Avg episode reward: [(0, '0.622')] [2024-03-29 19:03:34,208][00497] Updated weights for policy 0, policy_version 63248 (0.0026) [2024-03-29 19:03:38,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41506.1, 300 sec: 41543.1). Total num frames: 1036402688. Throughput: 0: 41679.6. Samples: 918627120. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:03:38,841][00126] Avg episode reward: [(0, '0.610')] [2024-03-29 19:03:38,923][00497] Updated weights for policy 0, policy_version 63258 (0.0026) [2024-03-29 19:03:42,408][00497] Updated weights for policy 0, policy_version 63268 (0.0028) [2024-03-29 19:03:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.1, 300 sec: 41654.2). Total num frames: 1036648448. Throughput: 0: 41798.6. Samples: 918872360. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:03:43,840][00126] Avg episode reward: [(0, '0.604')] [2024-03-29 19:03:45,979][00497] Updated weights for policy 0, policy_version 63278 (0.0029) [2024-03-29 19:03:48,839][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 1036861440. Throughput: 0: 41920.3. Samples: 918994520. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:03:48,842][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 19:03:50,146][00497] Updated weights for policy 0, policy_version 63288 (0.0028) [2024-03-29 19:03:53,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 1037041664. Throughput: 0: 41316.4. Samples: 919245920. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:03:53,840][00126] Avg episode reward: [(0, '0.569')] [2024-03-29 19:03:54,550][00497] Updated weights for policy 0, policy_version 63298 (0.0017) [2024-03-29 19:03:58,025][00497] Updated weights for policy 0, policy_version 63308 (0.0023) [2024-03-29 19:03:58,214][00476] Signal inference workers to stop experience collection... (32750 times) [2024-03-29 19:03:58,249][00497] InferenceWorker_p0-w0: stopping experience collection (32750 times) [2024-03-29 19:03:58,395][00476] Signal inference workers to resume experience collection... (32750 times) [2024-03-29 19:03:58,396][00497] InferenceWorker_p0-w0: resuming experience collection (32750 times) [2024-03-29 19:03:58,839][00126] Fps is (10 sec: 40961.0, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 1037271040. Throughput: 0: 41882.8. Samples: 919505480. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:03:58,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 19:04:01,583][00497] Updated weights for policy 0, policy_version 63318 (0.0026) [2024-03-29 19:04:03,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.1, 300 sec: 41543.1). Total num frames: 1037484032. Throughput: 0: 41561.3. Samples: 919616920. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 19:04:03,840][00126] Avg episode reward: [(0, '0.637')] [2024-03-29 19:04:05,852][00497] Updated weights for policy 0, policy_version 63328 (0.0036) [2024-03-29 19:04:08,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 1037680640. Throughput: 0: 41416.9. Samples: 919869280. 
Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 19:04:08,840][00126] Avg episode reward: [(0, '0.616')] [2024-03-29 19:04:10,310][00497] Updated weights for policy 0, policy_version 63338 (0.0021) [2024-03-29 19:04:13,670][00497] Updated weights for policy 0, policy_version 63348 (0.0022) [2024-03-29 19:04:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 1037893632. Throughput: 0: 42002.5. Samples: 920134180. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 19:04:13,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 19:04:17,141][00497] Updated weights for policy 0, policy_version 63358 (0.0027) [2024-03-29 19:04:18,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 1038090240. Throughput: 0: 41770.3. Samples: 920255420. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 19:04:18,840][00126] Avg episode reward: [(0, '0.613')] [2024-03-29 19:04:21,441][00497] Updated weights for policy 0, policy_version 63368 (0.0030) [2024-03-29 19:04:23,839][00126] Fps is (10 sec: 42599.2, 60 sec: 42052.4, 300 sec: 41654.3). Total num frames: 1038319616. Throughput: 0: 41558.8. Samples: 920497260. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 19:04:23,841][00126] Avg episode reward: [(0, '0.666')] [2024-03-29 19:04:26,196][00497] Updated weights for policy 0, policy_version 63378 (0.0021) [2024-03-29 19:04:28,609][00476] Signal inference workers to stop experience collection... (32800 times) [2024-03-29 19:04:28,622][00497] InferenceWorker_p0-w0: stopping experience collection (32800 times) [2024-03-29 19:04:28,817][00476] Signal inference workers to resume experience collection... (32800 times) [2024-03-29 19:04:28,818][00497] InferenceWorker_p0-w0: resuming experience collection (32800 times) [2024-03-29 19:04:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 1038516224. Throughput: 0: 42023.2. Samples: 920763400. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 19:04:28,840][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 19:04:29,591][00497] Updated weights for policy 0, policy_version 63388 (0.0023) [2024-03-29 19:04:32,904][00497] Updated weights for policy 0, policy_version 63398 (0.0029) [2024-03-29 19:04:33,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 1038729216. Throughput: 0: 41710.3. Samples: 920871480. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 19:04:33,840][00126] Avg episode reward: [(0, '0.603')] [2024-03-29 19:04:37,309][00497] Updated weights for policy 0, policy_version 63408 (0.0020) [2024-03-29 19:04:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 1038942208. Throughput: 0: 41659.7. Samples: 921120600. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 19:04:38,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 19:04:41,813][00497] Updated weights for policy 0, policy_version 63418 (0.0029) [2024-03-29 19:04:43,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 1039122432. Throughput: 0: 41716.2. Samples: 921382720. 
Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:04:43,840][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 19:04:45,285][00497] Updated weights for policy 0, policy_version 63428 (0.0018) [2024-03-29 19:04:48,380][00497] Updated weights for policy 0, policy_version 63438 (0.0020) [2024-03-29 19:04:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 1039368192. Throughput: 0: 41981.9. Samples: 921506100. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:04:48,840][00126] Avg episode reward: [(0, '0.675')] [2024-03-29 19:04:52,968][00497] Updated weights for policy 0, policy_version 63448 (0.0024) [2024-03-29 19:04:53,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 1039564800. Throughput: 0: 41834.6. Samples: 921751840. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:04:53,840][00126] Avg episode reward: [(0, '0.670')] [2024-03-29 19:04:56,963][00497] Updated weights for policy 0, policy_version 63458 (0.0027) [2024-03-29 19:04:58,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 1039761408. Throughput: 0: 41986.4. Samples: 922023560. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:04:58,840][00126] Avg episode reward: [(0, '0.619')] [2024-03-29 19:05:00,720][00497] Updated weights for policy 0, policy_version 63468 (0.0030) [2024-03-29 19:05:03,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.4, 300 sec: 41654.2). Total num frames: 1040007168. Throughput: 0: 42058.7. Samples: 922148060. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:05:03,840][00126] Avg episode reward: [(0, '0.623')] [2024-03-29 19:05:03,869][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000063478_1040023552.pth... [2024-03-29 19:05:03,881][00497] Updated weights for policy 0, policy_version 63478 (0.0025) [2024-03-29 19:05:04,178][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000062865_1029980160.pth [2024-03-29 19:05:04,684][00476] Signal inference workers to stop experience collection... (32850 times) [2024-03-29 19:05:04,745][00497] InferenceWorker_p0-w0: stopping experience collection (32850 times) [2024-03-29 19:05:04,763][00476] Signal inference workers to resume experience collection... (32850 times) [2024-03-29 19:05:04,777][00497] InferenceWorker_p0-w0: resuming experience collection (32850 times) [2024-03-29 19:05:08,207][00497] Updated weights for policy 0, policy_version 63488 (0.0019) [2024-03-29 19:05:08,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 1040203776. Throughput: 0: 42181.3. Samples: 922395420. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:05:08,840][00126] Avg episode reward: [(0, '0.680')] [2024-03-29 19:05:12,444][00497] Updated weights for policy 0, policy_version 63498 (0.0023) [2024-03-29 19:05:13,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 1040400384. Throughput: 0: 42222.2. Samples: 922663400. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:05:13,840][00126] Avg episode reward: [(0, '0.641')] [2024-03-29 19:05:16,109][00497] Updated weights for policy 0, policy_version 63508 (0.0023) [2024-03-29 19:05:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.4, 300 sec: 41654.2). Total num frames: 1040629760. Throughput: 0: 42548.1. Samples: 922786140. 
Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:05:18,840][00126] Avg episode reward: [(0, '0.621')] [2024-03-29 19:05:19,518][00497] Updated weights for policy 0, policy_version 63518 (0.0023) [2024-03-29 19:05:23,775][00497] Updated weights for policy 0, policy_version 63528 (0.0037) [2024-03-29 19:05:23,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.1, 300 sec: 41709.7). Total num frames: 1040842752. Throughput: 0: 42205.2. Samples: 923019840. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:05:23,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 19:05:28,227][00497] Updated weights for policy 0, policy_version 63538 (0.0020) [2024-03-29 19:05:28,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 1041022976. Throughput: 0: 42374.8. Samples: 923289580. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:05:28,840][00126] Avg episode reward: [(0, '0.666')] [2024-03-29 19:05:31,736][00497] Updated weights for policy 0, policy_version 63548 (0.0023) [2024-03-29 19:05:33,839][00126] Fps is (10 sec: 42599.3, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 1041268736. Throughput: 0: 42385.8. Samples: 923413460. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:05:33,840][00126] Avg episode reward: [(0, '0.506')] [2024-03-29 19:05:35,010][00497] Updated weights for policy 0, policy_version 63558 (0.0019) [2024-03-29 19:05:38,098][00476] Signal inference workers to stop experience collection... (32900 times) [2024-03-29 19:05:38,099][00476] Signal inference workers to resume experience collection... (32900 times) [2024-03-29 19:05:38,140][00497] InferenceWorker_p0-w0: stopping experience collection (32900 times) [2024-03-29 19:05:38,140][00497] InferenceWorker_p0-w0: resuming experience collection (32900 times) [2024-03-29 19:05:38,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 1041481728. Throughput: 0: 42338.7. Samples: 923657080. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:05:38,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 19:05:39,190][00497] Updated weights for policy 0, policy_version 63568 (0.0024) [2024-03-29 19:05:43,592][00497] Updated weights for policy 0, policy_version 63578 (0.0027) [2024-03-29 19:05:43,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 1041661952. Throughput: 0: 42154.2. Samples: 923920500. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:05:43,840][00126] Avg episode reward: [(0, '0.606')] [2024-03-29 19:05:47,443][00497] Updated weights for policy 0, policy_version 63588 (0.0029) [2024-03-29 19:05:48,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 1041874944. Throughput: 0: 42132.4. Samples: 924044020. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:05:48,840][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 19:05:50,911][00497] Updated weights for policy 0, policy_version 63598 (0.0027) [2024-03-29 19:05:53,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 1042104320. Throughput: 0: 41974.6. Samples: 924284280. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:05:53,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 19:05:55,093][00497] Updated weights for policy 0, policy_version 63608 (0.0021) [2024-03-29 19:05:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.2, 300 sec: 41820.8). 
Total num frames: 1042300928. Throughput: 0: 41733.8. Samples: 924541420. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:05:58,840][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 19:05:59,344][00497] Updated weights for policy 0, policy_version 63618 (0.0022) [2024-03-29 19:06:03,093][00497] Updated weights for policy 0, policy_version 63628 (0.0021) [2024-03-29 19:06:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.1, 300 sec: 41820.9). Total num frames: 1042513920. Throughput: 0: 42083.4. Samples: 924679900. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 19:06:03,840][00126] Avg episode reward: [(0, '0.614')] [2024-03-29 19:06:06,314][00497] Updated weights for policy 0, policy_version 63638 (0.0019) [2024-03-29 19:06:08,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 1042743296. Throughput: 0: 42231.4. Samples: 924920240. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 19:06:08,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 19:06:10,224][00497] Updated weights for policy 0, policy_version 63648 (0.0027) [2024-03-29 19:06:12,276][00476] Signal inference workers to stop experience collection... (32950 times) [2024-03-29 19:06:12,307][00497] InferenceWorker_p0-w0: stopping experience collection (32950 times) [2024-03-29 19:06:12,461][00476] Signal inference workers to resume experience collection... (32950 times) [2024-03-29 19:06:12,461][00497] InferenceWorker_p0-w0: resuming experience collection (32950 times) [2024-03-29 19:06:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 1042939904. Throughput: 0: 41910.2. Samples: 925175540. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 19:06:13,840][00126] Avg episode reward: [(0, '0.656')] [2024-03-29 19:06:14,859][00497] Updated weights for policy 0, policy_version 63658 (0.0019) [2024-03-29 19:06:18,570][00497] Updated weights for policy 0, policy_version 63668 (0.0025) [2024-03-29 19:06:18,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 1043152896. Throughput: 0: 42261.4. Samples: 925315220. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 19:06:18,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 19:06:21,880][00497] Updated weights for policy 0, policy_version 63678 (0.0024) [2024-03-29 19:06:23,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.5, 300 sec: 41932.0). Total num frames: 1043382272. Throughput: 0: 42148.1. Samples: 925553740. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 19:06:23,840][00126] Avg episode reward: [(0, '0.575')] [2024-03-29 19:06:25,514][00497] Updated weights for policy 0, policy_version 63688 (0.0022) [2024-03-29 19:06:28,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42871.5, 300 sec: 42043.0). Total num frames: 1043595264. Throughput: 0: 41957.7. Samples: 925808600. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 19:06:28,840][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 19:06:30,536][00497] Updated weights for policy 0, policy_version 63698 (0.0030) [2024-03-29 19:06:33,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 1043775488. Throughput: 0: 42173.7. Samples: 925941840. 
Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 19:06:33,842][00126] Avg episode reward: [(0, '0.621')] [2024-03-29 19:06:34,039][00497] Updated weights for policy 0, policy_version 63708 (0.0023) [2024-03-29 19:06:37,324][00497] Updated weights for policy 0, policy_version 63718 (0.0023) [2024-03-29 19:06:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 1044004864. Throughput: 0: 42397.3. Samples: 926192160. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 19:06:38,840][00126] Avg episode reward: [(0, '0.575')] [2024-03-29 19:06:40,984][00497] Updated weights for policy 0, policy_version 63728 (0.0018) [2024-03-29 19:06:43,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42871.5, 300 sec: 42098.6). Total num frames: 1044234240. Throughput: 0: 42204.6. Samples: 926440620. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 19:06:43,840][00126] Avg episode reward: [(0, '0.571')] [2024-03-29 19:06:45,889][00476] Signal inference workers to stop experience collection... (33000 times) [2024-03-29 19:06:45,952][00497] InferenceWorker_p0-w0: stopping experience collection (33000 times) [2024-03-29 19:06:45,960][00476] Signal inference workers to resume experience collection... (33000 times) [2024-03-29 19:06:45,978][00497] InferenceWorker_p0-w0: resuming experience collection (33000 times) [2024-03-29 19:06:45,985][00497] Updated weights for policy 0, policy_version 63738 (0.0023) [2024-03-29 19:06:48,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 1044414464. Throughput: 0: 42307.7. Samples: 926583740. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 19:06:48,840][00126] Avg episode reward: [(0, '0.601')] [2024-03-29 19:06:49,646][00497] Updated weights for policy 0, policy_version 63748 (0.0020) [2024-03-29 19:06:52,904][00497] Updated weights for policy 0, policy_version 63758 (0.0026) [2024-03-29 19:06:53,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 1044627456. Throughput: 0: 42099.1. Samples: 926814700. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 19:06:53,840][00126] Avg episode reward: [(0, '0.529')] [2024-03-29 19:06:56,769][00497] Updated weights for policy 0, policy_version 63768 (0.0021) [2024-03-29 19:06:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.5, 300 sec: 42098.6). Total num frames: 1044856832. Throughput: 0: 42273.9. Samples: 927077860. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 19:06:58,840][00126] Avg episode reward: [(0, '0.615')] [2024-03-29 19:07:01,449][00497] Updated weights for policy 0, policy_version 63778 (0.0026) [2024-03-29 19:07:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41987.6). Total num frames: 1045037056. Throughput: 0: 42001.7. Samples: 927205300. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 19:07:03,840][00126] Avg episode reward: [(0, '0.614')] [2024-03-29 19:07:04,378][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000063786_1045069824.pth... [2024-03-29 19:07:04,684][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000063171_1034993664.pth [2024-03-29 19:07:05,352][00497] Updated weights for policy 0, policy_version 63788 (0.0023) [2024-03-29 19:07:08,397][00497] Updated weights for policy 0, policy_version 63798 (0.0025) [2024-03-29 19:07:08,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 42043.0). 
Total num frames: 1045266432. Throughput: 0: 42260.0. Samples: 927455440. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 19:07:08,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 19:07:12,447][00497] Updated weights for policy 0, policy_version 63808 (0.0018) [2024-03-29 19:07:13,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 1045479424. Throughput: 0: 42036.5. Samples: 927700240. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 19:07:13,840][00126] Avg episode reward: [(0, '0.584')] [2024-03-29 19:07:16,936][00497] Updated weights for policy 0, policy_version 63818 (0.0022) [2024-03-29 19:07:17,571][00476] Signal inference workers to stop experience collection... (33050 times) [2024-03-29 19:07:17,644][00497] InferenceWorker_p0-w0: stopping experience collection (33050 times) [2024-03-29 19:07:17,644][00476] Signal inference workers to resume experience collection... (33050 times) [2024-03-29 19:07:17,670][00497] InferenceWorker_p0-w0: resuming experience collection (33050 times) [2024-03-29 19:07:18,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 1045659648. Throughput: 0: 41857.0. Samples: 927825400. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 19:07:18,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 19:07:20,904][00497] Updated weights for policy 0, policy_version 63828 (0.0021) [2024-03-29 19:07:23,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 1045889024. Throughput: 0: 41867.1. Samples: 928076180. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:07:23,840][00126] Avg episode reward: [(0, '0.602')] [2024-03-29 19:07:24,324][00497] Updated weights for policy 0, policy_version 63838 (0.0023) [2024-03-29 19:07:28,007][00497] Updated weights for policy 0, policy_version 63848 (0.0024) [2024-03-29 19:07:28,840][00126] Fps is (10 sec: 44233.5, 60 sec: 41778.7, 300 sec: 41987.4). Total num frames: 1046102016. Throughput: 0: 41869.5. Samples: 928324780. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:07:28,840][00126] Avg episode reward: [(0, '0.597')] [2024-03-29 19:07:32,818][00497] Updated weights for policy 0, policy_version 63858 (0.0024) [2024-03-29 19:07:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 1046298624. Throughput: 0: 41310.6. Samples: 928442720. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:07:33,840][00126] Avg episode reward: [(0, '0.583')] [2024-03-29 19:07:36,750][00497] Updated weights for policy 0, policy_version 63868 (0.0027) [2024-03-29 19:07:38,839][00126] Fps is (10 sec: 40962.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 1046511616. Throughput: 0: 42189.3. Samples: 928713220. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:07:38,840][00126] Avg episode reward: [(0, '0.613')] [2024-03-29 19:07:39,961][00497] Updated weights for policy 0, policy_version 63878 (0.0029) [2024-03-29 19:07:43,794][00497] Updated weights for policy 0, policy_version 63888 (0.0029) [2024-03-29 19:07:43,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41779.0, 300 sec: 41987.4). Total num frames: 1046740992. Throughput: 0: 41612.3. Samples: 928950420. 
Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:07:43,840][00126] Avg episode reward: [(0, '0.651')] [2024-03-29 19:07:48,531][00497] Updated weights for policy 0, policy_version 63898 (0.0029) [2024-03-29 19:07:48,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 1046921216. Throughput: 0: 41411.9. Samples: 929068840. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:07:48,840][00126] Avg episode reward: [(0, '0.692')] [2024-03-29 19:07:49,111][00476] Signal inference workers to stop experience collection... (33100 times) [2024-03-29 19:07:49,152][00497] InferenceWorker_p0-w0: stopping experience collection (33100 times) [2024-03-29 19:07:49,332][00476] Signal inference workers to resume experience collection... (33100 times) [2024-03-29 19:07:49,333][00497] InferenceWorker_p0-w0: resuming experience collection (33100 times) [2024-03-29 19:07:52,625][00497] Updated weights for policy 0, policy_version 63908 (0.0020) [2024-03-29 19:07:53,839][00126] Fps is (10 sec: 39322.3, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 1047134208. Throughput: 0: 41783.1. Samples: 929335680. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:07:53,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 19:07:55,630][00497] Updated weights for policy 0, policy_version 63918 (0.0028) [2024-03-29 19:07:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 1047363584. Throughput: 0: 41468.3. Samples: 929566320. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 19:07:58,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 19:07:59,845][00497] Updated weights for policy 0, policy_version 63928 (0.0021) [2024-03-29 19:08:03,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 1047543808. Throughput: 0: 41747.0. Samples: 929704020. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 19:08:03,841][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 19:08:04,182][00497] Updated weights for policy 0, policy_version 63938 (0.0029) [2024-03-29 19:08:08,471][00497] Updated weights for policy 0, policy_version 63948 (0.0028) [2024-03-29 19:08:08,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 1047740416. Throughput: 0: 41778.3. Samples: 929956200. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 19:08:08,840][00126] Avg episode reward: [(0, '0.613')] [2024-03-29 19:08:11,789][00497] Updated weights for policy 0, policy_version 63958 (0.0022) [2024-03-29 19:08:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 1047969792. Throughput: 0: 41657.5. Samples: 930199340. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 19:08:13,840][00126] Avg episode reward: [(0, '0.594')] [2024-03-29 19:08:15,744][00497] Updated weights for policy 0, policy_version 63968 (0.0033) [2024-03-29 19:08:18,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 1048182784. Throughput: 0: 41675.0. Samples: 930318100. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 19:08:18,841][00126] Avg episode reward: [(0, '0.525')] [2024-03-29 19:08:20,097][00497] Updated weights for policy 0, policy_version 63978 (0.0025) [2024-03-29 19:08:21,248][00476] Signal inference workers to stop experience collection... (33150 times) [2024-03-29 19:08:21,248][00476] Signal inference workers to resume experience collection... 
(33150 times) [2024-03-29 19:08:21,287][00497] InferenceWorker_p0-w0: stopping experience collection (33150 times) [2024-03-29 19:08:21,287][00497] InferenceWorker_p0-w0: resuming experience collection (33150 times) [2024-03-29 19:08:23,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 1048363008. Throughput: 0: 41291.6. Samples: 930571340. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 19:08:23,840][00126] Avg episode reward: [(0, '0.578')] [2024-03-29 19:08:24,302][00497] Updated weights for policy 0, policy_version 63988 (0.0023) [2024-03-29 19:08:27,521][00497] Updated weights for policy 0, policy_version 63998 (0.0026) [2024-03-29 19:08:28,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41506.7, 300 sec: 41876.4). Total num frames: 1048592384. Throughput: 0: 41487.3. Samples: 930817340. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 19:08:28,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 19:08:31,352][00497] Updated weights for policy 0, policy_version 64008 (0.0025) [2024-03-29 19:08:33,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 1048805376. Throughput: 0: 41482.4. Samples: 930935540. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 19:08:33,840][00126] Avg episode reward: [(0, '0.629')] [2024-03-29 19:08:36,016][00497] Updated weights for policy 0, policy_version 64018 (0.0034) [2024-03-29 19:08:38,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41233.0, 300 sec: 41820.9). Total num frames: 1048985600. Throughput: 0: 41227.9. Samples: 931190940. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 19:08:38,840][00126] Avg episode reward: [(0, '0.504')] [2024-03-29 19:08:40,236][00497] Updated weights for policy 0, policy_version 64028 (0.0019) [2024-03-29 19:08:43,620][00497] Updated weights for policy 0, policy_version 64038 (0.0028) [2024-03-29 19:08:43,839][00126] Fps is (10 sec: 39321.3, 60 sec: 40960.1, 300 sec: 41820.9). Total num frames: 1049198592. Throughput: 0: 41283.2. Samples: 931424060. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 19:08:43,840][00126] Avg episode reward: [(0, '0.574')] [2024-03-29 19:08:47,335][00497] Updated weights for policy 0, policy_version 64048 (0.0025) [2024-03-29 19:08:48,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41506.3, 300 sec: 41932.0). Total num frames: 1049411584. Throughput: 0: 40966.0. Samples: 931547480. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 19:08:48,840][00126] Avg episode reward: [(0, '0.610')] [2024-03-29 19:08:52,039][00497] Updated weights for policy 0, policy_version 64058 (0.0030) [2024-03-29 19:08:52,063][00476] Signal inference workers to stop experience collection... (33200 times) [2024-03-29 19:08:52,104][00497] InferenceWorker_p0-w0: stopping experience collection (33200 times) [2024-03-29 19:08:52,283][00476] Signal inference workers to resume experience collection... (33200 times) [2024-03-29 19:08:52,284][00497] InferenceWorker_p0-w0: resuming experience collection (33200 times) [2024-03-29 19:08:53,839][00126] Fps is (10 sec: 39321.4, 60 sec: 40959.9, 300 sec: 41765.3). Total num frames: 1049591808. Throughput: 0: 41178.6. Samples: 931809240. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 19:08:53,840][00126] Avg episode reward: [(0, '0.638')] [2024-03-29 19:08:56,080][00497] Updated weights for policy 0, policy_version 64068 (0.0018) [2024-03-29 19:08:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41233.2, 300 sec: 41876.4). 
Total num frames: 1049837568. Throughput: 0: 41061.4. Samples: 932047100. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 19:08:58,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 19:08:59,332][00497] Updated weights for policy 0, policy_version 64078 (0.0021) [2024-03-29 19:09:03,127][00497] Updated weights for policy 0, policy_version 64088 (0.0025) [2024-03-29 19:09:03,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41233.1, 300 sec: 41820.8). Total num frames: 1050017792. Throughput: 0: 41179.1. Samples: 932171160. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 19:09:03,840][00126] Avg episode reward: [(0, '0.640')] [2024-03-29 19:09:04,142][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000064090_1050050560.pth... [2024-03-29 19:09:04,468][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000063478_1040023552.pth [2024-03-29 19:09:07,713][00497] Updated weights for policy 0, policy_version 64098 (0.0026) [2024-03-29 19:09:08,841][00126] Fps is (10 sec: 39314.4, 60 sec: 41504.9, 300 sec: 41820.6). Total num frames: 1050230784. Throughput: 0: 41005.0. Samples: 932416640. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 19:09:08,841][00126] Avg episode reward: [(0, '0.567')] [2024-03-29 19:09:11,717][00497] Updated weights for policy 0, policy_version 64108 (0.0019) [2024-03-29 19:09:13,839][00126] Fps is (10 sec: 40960.7, 60 sec: 40960.1, 300 sec: 41820.9). Total num frames: 1050427392. Throughput: 0: 41172.9. Samples: 932670120. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 19:09:13,840][00126] Avg episode reward: [(0, '0.592')] [2024-03-29 19:09:15,371][00497] Updated weights for policy 0, policy_version 64118 (0.0032) [2024-03-29 19:09:18,839][00126] Fps is (10 sec: 42606.4, 60 sec: 41233.2, 300 sec: 41820.9). Total num frames: 1050656768. Throughput: 0: 41159.6. Samples: 932787720. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 19:09:18,841][00126] Avg episode reward: [(0, '0.623')] [2024-03-29 19:09:19,041][00497] Updated weights for policy 0, policy_version 64128 (0.0023) [2024-03-29 19:09:23,524][00497] Updated weights for policy 0, policy_version 64138 (0.0021) [2024-03-29 19:09:23,839][00126] Fps is (10 sec: 40959.2, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 1050836992. Throughput: 0: 40763.1. Samples: 933025280. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 19:09:23,840][00126] Avg episode reward: [(0, '0.549')] [2024-03-29 19:09:26,031][00476] Signal inference workers to stop experience collection... (33250 times) [2024-03-29 19:09:26,072][00497] InferenceWorker_p0-w0: stopping experience collection (33250 times) [2024-03-29 19:09:26,254][00476] Signal inference workers to resume experience collection... (33250 times) [2024-03-29 19:09:26,255][00497] InferenceWorker_p0-w0: resuming experience collection (33250 times) [2024-03-29 19:09:27,648][00497] Updated weights for policy 0, policy_version 64148 (0.0021) [2024-03-29 19:09:28,839][00126] Fps is (10 sec: 39321.1, 60 sec: 40959.9, 300 sec: 41765.3). Total num frames: 1051049984. Throughput: 0: 41691.5. Samples: 933300180. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 19:09:28,840][00126] Avg episode reward: [(0, '0.622')] [2024-03-29 19:09:30,990][00497] Updated weights for policy 0, policy_version 64158 (0.0023) [2024-03-29 19:09:33,841][00126] Fps is (10 sec: 42591.0, 60 sec: 40958.7, 300 sec: 41765.1). Total num frames: 1051262976. 
Throughput: 0: 41232.0. Samples: 933403000. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 19:09:33,842][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 19:09:34,961][00497] Updated weights for policy 0, policy_version 64168 (0.0020) [2024-03-29 19:09:38,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 1051475968. Throughput: 0: 40965.0. Samples: 933652660. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 19:09:38,840][00126] Avg episode reward: [(0, '0.608')] [2024-03-29 19:09:39,338][00497] Updated weights for policy 0, policy_version 64178 (0.0028) [2024-03-29 19:09:43,416][00497] Updated weights for policy 0, policy_version 64188 (0.0021) [2024-03-29 19:09:43,839][00126] Fps is (10 sec: 39328.8, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 1051656192. Throughput: 0: 41590.2. Samples: 933918660. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 19:09:43,840][00126] Avg episode reward: [(0, '0.593')] [2024-03-29 19:09:46,813][00497] Updated weights for policy 0, policy_version 64198 (0.0024) [2024-03-29 19:09:48,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.0, 300 sec: 41820.9). Total num frames: 1051901952. Throughput: 0: 41169.8. Samples: 934023800. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 19:09:48,840][00126] Avg episode reward: [(0, '0.644')] [2024-03-29 19:09:50,802][00497] Updated weights for policy 0, policy_version 64208 (0.0023) [2024-03-29 19:09:53,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 1052114944. Throughput: 0: 41552.6. Samples: 934286440. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 19:09:53,842][00126] Avg episode reward: [(0, '0.631')] [2024-03-29 19:09:55,178][00497] Updated weights for policy 0, policy_version 64218 (0.0028) [2024-03-29 19:09:58,839][00126] Fps is (10 sec: 39322.3, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 1052295168. Throughput: 0: 41698.2. Samples: 934546540. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 19:09:58,840][00126] Avg episode reward: [(0, '0.561')] [2024-03-29 19:09:58,989][00497] Updated weights for policy 0, policy_version 64228 (0.0017) [2024-03-29 19:09:59,811][00476] Signal inference workers to stop experience collection... (33300 times) [2024-03-29 19:09:59,828][00497] InferenceWorker_p0-w0: stopping experience collection (33300 times) [2024-03-29 19:10:00,024][00476] Signal inference workers to resume experience collection... (33300 times) [2024-03-29 19:10:00,025][00497] InferenceWorker_p0-w0: resuming experience collection (33300 times) [2024-03-29 19:10:02,211][00497] Updated weights for policy 0, policy_version 64238 (0.0024) [2024-03-29 19:10:03,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 1052524544. Throughput: 0: 41759.9. Samples: 934666920. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 19:10:03,840][00126] Avg episode reward: [(0, '0.562')] [2024-03-29 19:10:06,435][00497] Updated weights for policy 0, policy_version 64248 (0.0019) [2024-03-29 19:10:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41780.5, 300 sec: 41820.9). Total num frames: 1052737536. Throughput: 0: 42069.5. Samples: 934918400. 
Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 19:10:08,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 19:10:10,679][00497] Updated weights for policy 0, policy_version 64258 (0.0032) [2024-03-29 19:10:13,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 1052917760. Throughput: 0: 41693.5. Samples: 935176380. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 19:10:13,840][00126] Avg episode reward: [(0, '0.591')] [2024-03-29 19:10:14,644][00497] Updated weights for policy 0, policy_version 64268 (0.0021) [2024-03-29 19:10:17,841][00497] Updated weights for policy 0, policy_version 64278 (0.0020) [2024-03-29 19:10:18,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 1053147136. Throughput: 0: 42096.0. Samples: 935297240. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 19:10:18,840][00126] Avg episode reward: [(0, '0.596')] [2024-03-29 19:10:22,160][00497] Updated weights for policy 0, policy_version 64288 (0.0023) [2024-03-29 19:10:23,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 1053343744. Throughput: 0: 41779.1. Samples: 935532720. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 19:10:23,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 19:10:26,449][00497] Updated weights for policy 0, policy_version 64298 (0.0018) [2024-03-29 19:10:28,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 1053540352. Throughput: 0: 41573.8. Samples: 935789480. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 19:10:28,841][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 19:10:30,454][00497] Updated weights for policy 0, policy_version 64308 (0.0019) [2024-03-29 19:10:32,265][00476] Signal inference workers to stop experience collection... (33350 times) [2024-03-29 19:10:32,305][00497] InferenceWorker_p0-w0: stopping experience collection (33350 times) [2024-03-29 19:10:32,489][00476] Signal inference workers to resume experience collection... (33350 times) [2024-03-29 19:10:32,490][00497] InferenceWorker_p0-w0: resuming experience collection (33350 times) [2024-03-29 19:10:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41780.5, 300 sec: 41654.2). Total num frames: 1053769728. Throughput: 0: 42232.1. Samples: 935924240. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 19:10:33,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 19:10:33,857][00497] Updated weights for policy 0, policy_version 64318 (0.0023) [2024-03-29 19:10:38,048][00497] Updated weights for policy 0, policy_version 64328 (0.0019) [2024-03-29 19:10:38,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 1053966336. Throughput: 0: 41614.0. Samples: 936159060. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 19:10:38,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 19:10:42,472][00497] Updated weights for policy 0, policy_version 64338 (0.0022) [2024-03-29 19:10:43,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 1054162944. Throughput: 0: 41340.5. Samples: 936406860. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 19:10:43,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 19:10:46,367][00497] Updated weights for policy 0, policy_version 64348 (0.0020) [2024-03-29 19:10:48,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41233.1, 300 sec: 41598.7). 
Total num frames: 1054375936. Throughput: 0: 41565.8. Samples: 936537380. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 19:10:48,840][00126] Avg episode reward: [(0, '0.666')] [2024-03-29 19:10:49,831][00497] Updated weights for policy 0, policy_version 64358 (0.0021) [2024-03-29 19:10:53,839][00126] Fps is (10 sec: 42597.5, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 1054588928. Throughput: 0: 40940.7. Samples: 936760740. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 19:10:53,841][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 19:10:54,146][00497] Updated weights for policy 0, policy_version 64368 (0.0025) [2024-03-29 19:10:58,181][00497] Updated weights for policy 0, policy_version 64378 (0.0027) [2024-03-29 19:10:58,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 1054801920. Throughput: 0: 41211.4. Samples: 937030900. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 19:10:58,840][00126] Avg episode reward: [(0, '0.554')] [2024-03-29 19:11:01,950][00497] Updated weights for policy 0, policy_version 64388 (0.0026) [2024-03-29 19:11:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 1055014912. Throughput: 0: 41457.7. Samples: 937162840. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 19:11:03,840][00126] Avg episode reward: [(0, '0.569')] [2024-03-29 19:11:03,994][00476] Signal inference workers to stop experience collection... (33400 times) [2024-03-29 19:11:04,063][00497] InferenceWorker_p0-w0: stopping experience collection (33400 times) [2024-03-29 19:11:04,069][00476] Signal inference workers to resume experience collection... (33400 times) [2024-03-29 19:11:04,071][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000064394_1055031296.pth... [2024-03-29 19:11:04,087][00497] InferenceWorker_p0-w0: resuming experience collection (33400 times) [2024-03-29 19:11:04,378][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000063786_1045069824.pth [2024-03-29 19:11:05,719][00497] Updated weights for policy 0, policy_version 64398 (0.0022) [2024-03-29 19:11:08,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 1055211520. Throughput: 0: 41501.3. Samples: 937400280. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 19:11:08,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 19:11:09,850][00497] Updated weights for policy 0, policy_version 64408 (0.0028) [2024-03-29 19:11:13,829][00497] Updated weights for policy 0, policy_version 64418 (0.0019) [2024-03-29 19:11:13,839][00126] Fps is (10 sec: 40961.0, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 1055424512. Throughput: 0: 41162.8. Samples: 937641800. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 19:11:13,840][00126] Avg episode reward: [(0, '0.510')] [2024-03-29 19:11:17,925][00497] Updated weights for policy 0, policy_version 64428 (0.0018) [2024-03-29 19:11:18,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 1055621120. Throughput: 0: 41292.4. Samples: 937782400. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 19:11:18,840][00126] Avg episode reward: [(0, '0.606')] [2024-03-29 19:11:21,196][00497] Updated weights for policy 0, policy_version 64438 (0.0019) [2024-03-29 19:11:23,839][00126] Fps is (10 sec: 42597.3, 60 sec: 41779.1, 300 sec: 41543.1). Total num frames: 1055850496. 
Throughput: 0: 41786.5. Samples: 938039460. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 19:11:23,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 19:11:25,377][00497] Updated weights for policy 0, policy_version 64448 (0.0025) [2024-03-29 19:11:28,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 1056063488. Throughput: 0: 41432.7. Samples: 938271340. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 19:11:28,841][00126] Avg episode reward: [(0, '0.602')] [2024-03-29 19:11:29,610][00497] Updated weights for policy 0, policy_version 64458 (0.0022) [2024-03-29 19:11:33,542][00497] Updated weights for policy 0, policy_version 64468 (0.0023) [2024-03-29 19:11:33,839][00126] Fps is (10 sec: 39322.4, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 1056243712. Throughput: 0: 41515.6. Samples: 938405580. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 19:11:33,840][00126] Avg episode reward: [(0, '0.596')] [2024-03-29 19:11:34,944][00476] Signal inference workers to stop experience collection... (33450 times) [2024-03-29 19:11:34,999][00497] InferenceWorker_p0-w0: stopping experience collection (33450 times) [2024-03-29 19:11:35,035][00476] Signal inference workers to resume experience collection... (33450 times) [2024-03-29 19:11:35,042][00497] InferenceWorker_p0-w0: resuming experience collection (33450 times) [2024-03-29 19:11:37,072][00497] Updated weights for policy 0, policy_version 64478 (0.0026) [2024-03-29 19:11:38,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 1056473088. Throughput: 0: 42015.3. Samples: 938651420. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 19:11:38,840][00126] Avg episode reward: [(0, '0.552')] [2024-03-29 19:11:41,272][00497] Updated weights for policy 0, policy_version 64488 (0.0029) [2024-03-29 19:11:43,839][00126] Fps is (10 sec: 44236.1, 60 sec: 42052.1, 300 sec: 41598.7). Total num frames: 1056686080. Throughput: 0: 41645.3. Samples: 938904940. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 19:11:43,841][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 19:11:45,158][00497] Updated weights for policy 0, policy_version 64498 (0.0022) [2024-03-29 19:11:48,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 1056882688. Throughput: 0: 41563.7. Samples: 939033200. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 19:11:48,840][00126] Avg episode reward: [(0, '0.508')] [2024-03-29 19:11:49,484][00497] Updated weights for policy 0, policy_version 64508 (0.0023) [2024-03-29 19:11:52,960][00497] Updated weights for policy 0, policy_version 64518 (0.0024) [2024-03-29 19:11:53,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 41432.1). Total num frames: 1057079296. Throughput: 0: 41508.4. Samples: 939268160. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 19:11:53,840][00126] Avg episode reward: [(0, '0.667')] [2024-03-29 19:11:56,941][00497] Updated weights for policy 0, policy_version 64528 (0.0024) [2024-03-29 19:11:58,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 1057292288. Throughput: 0: 41907.5. Samples: 939527640. 
Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 19:11:58,840][00126] Avg episode reward: [(0, '0.546')] [2024-03-29 19:12:00,675][00497] Updated weights for policy 0, policy_version 64538 (0.0018) [2024-03-29 19:12:03,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41506.3, 300 sec: 41487.6). Total num frames: 1057505280. Throughput: 0: 41789.4. Samples: 939662920. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 19:12:03,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 19:12:05,058][00497] Updated weights for policy 0, policy_version 64548 (0.0027) [2024-03-29 19:12:08,202][00497] Updated weights for policy 0, policy_version 64558 (0.0023) [2024-03-29 19:12:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 1057718272. Throughput: 0: 41547.3. Samples: 939909080. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 19:12:08,840][00126] Avg episode reward: [(0, '0.666')] [2024-03-29 19:12:08,843][00476] Signal inference workers to stop experience collection... (33500 times) [2024-03-29 19:12:08,938][00497] InferenceWorker_p0-w0: stopping experience collection (33500 times) [2024-03-29 19:12:09,005][00476] Signal inference workers to resume experience collection... (33500 times) [2024-03-29 19:12:09,006][00497] InferenceWorker_p0-w0: resuming experience collection (33500 times) [2024-03-29 19:12:12,582][00497] Updated weights for policy 0, policy_version 64568 (0.0026) [2024-03-29 19:12:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 1057931264. Throughput: 0: 41984.1. Samples: 940160620. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 19:12:13,840][00126] Avg episode reward: [(0, '0.655')] [2024-03-29 19:12:16,147][00497] Updated weights for policy 0, policy_version 64578 (0.0018) [2024-03-29 19:12:18,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 1058127872. Throughput: 0: 41849.3. Samples: 940288800. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 19:12:18,840][00126] Avg episode reward: [(0, '0.540')] [2024-03-29 19:12:20,447][00497] Updated weights for policy 0, policy_version 64588 (0.0022) [2024-03-29 19:12:23,514][00497] Updated weights for policy 0, policy_version 64598 (0.0019) [2024-03-29 19:12:23,839][00126] Fps is (10 sec: 44236.0, 60 sec: 42052.3, 300 sec: 41598.8). Total num frames: 1058373632. Throughput: 0: 42234.0. Samples: 940551960. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 19:12:23,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 19:12:28,052][00497] Updated weights for policy 0, policy_version 64608 (0.0020) [2024-03-29 19:12:28,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 1058570240. Throughput: 0: 41960.1. Samples: 940793140. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 19:12:28,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 19:12:31,814][00497] Updated weights for policy 0, policy_version 64618 (0.0032) [2024-03-29 19:12:33,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41779.1, 300 sec: 41487.6). Total num frames: 1058750464. Throughput: 0: 41692.0. Samples: 940909340. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 19:12:33,840][00126] Avg episode reward: [(0, '0.656')] [2024-03-29 19:12:36,121][00497] Updated weights for policy 0, policy_version 64628 (0.0026) [2024-03-29 19:12:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41598.7). 
Total num frames: 1059012608. Throughput: 0: 42544.9. Samples: 941182680. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:12:38,840][00126] Avg episode reward: [(0, '0.614')] [2024-03-29 19:12:39,129][00497] Updated weights for policy 0, policy_version 64638 (0.0024) [2024-03-29 19:12:41,038][00476] Signal inference workers to stop experience collection... (33550 times) [2024-03-29 19:12:41,105][00497] InferenceWorker_p0-w0: stopping experience collection (33550 times) [2024-03-29 19:12:41,110][00476] Signal inference workers to resume experience collection... (33550 times) [2024-03-29 19:12:41,130][00497] InferenceWorker_p0-w0: resuming experience collection (33550 times) [2024-03-29 19:12:43,555][00497] Updated weights for policy 0, policy_version 64648 (0.0027) [2024-03-29 19:12:43,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 1059209216. Throughput: 0: 42156.8. Samples: 941424700. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:12:43,840][00126] Avg episode reward: [(0, '0.657')] [2024-03-29 19:12:47,391][00497] Updated weights for policy 0, policy_version 64658 (0.0024) [2024-03-29 19:12:48,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 1059389440. Throughput: 0: 41798.6. Samples: 941543860. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:12:48,840][00126] Avg episode reward: [(0, '0.579')] [2024-03-29 19:12:52,021][00497] Updated weights for policy 0, policy_version 64668 (0.0022) [2024-03-29 19:12:53,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.3, 300 sec: 41543.2). Total num frames: 1059618816. Throughput: 0: 42269.7. Samples: 941811220. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:12:53,840][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 19:12:54,862][00497] Updated weights for policy 0, policy_version 64678 (0.0024) [2024-03-29 19:12:58,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 41654.3). Total num frames: 1059831808. Throughput: 0: 41853.8. Samples: 942044040. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:12:58,840][00126] Avg episode reward: [(0, '0.612')] [2024-03-29 19:12:59,379][00497] Updated weights for policy 0, policy_version 64688 (0.0024) [2024-03-29 19:13:03,112][00497] Updated weights for policy 0, policy_version 64698 (0.0023) [2024-03-29 19:13:03,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 1060012032. Throughput: 0: 41928.1. Samples: 942175560. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:13:03,840][00126] Avg episode reward: [(0, '0.620')] [2024-03-29 19:13:03,959][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000064699_1060028416.pth... [2024-03-29 19:13:04,299][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000064090_1050050560.pth [2024-03-29 19:13:07,396][00497] Updated weights for policy 0, policy_version 64708 (0.0035) [2024-03-29 19:13:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 1060241408. Throughput: 0: 42040.6. Samples: 942443780. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:13:08,840][00126] Avg episode reward: [(0, '0.590')] [2024-03-29 19:13:10,304][00497] Updated weights for policy 0, policy_version 64718 (0.0026) [2024-03-29 19:13:13,036][00476] Signal inference workers to stop experience collection... 
(33600 times) [2024-03-29 19:13:13,067][00497] InferenceWorker_p0-w0: stopping experience collection (33600 times) [2024-03-29 19:13:13,231][00476] Signal inference workers to resume experience collection... (33600 times) [2024-03-29 19:13:13,232][00497] InferenceWorker_p0-w0: resuming experience collection (33600 times) [2024-03-29 19:13:13,839][00126] Fps is (10 sec: 44236.1, 60 sec: 42052.2, 300 sec: 41598.7). Total num frames: 1060454400. Throughput: 0: 41841.2. Samples: 942676000. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 19:13:13,840][00126] Avg episode reward: [(0, '0.641')] [2024-03-29 19:13:15,269][00497] Updated weights for policy 0, policy_version 64728 (0.0019) [2024-03-29 19:13:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 1060651008. Throughput: 0: 41980.1. Samples: 942798440. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 19:13:18,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 19:13:19,003][00497] Updated weights for policy 0, policy_version 64738 (0.0023) [2024-03-29 19:13:23,181][00497] Updated weights for policy 0, policy_version 64748 (0.0031) [2024-03-29 19:13:23,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 1060864000. Throughput: 0: 41982.2. Samples: 943071880. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 19:13:23,840][00126] Avg episode reward: [(0, '0.621')] [2024-03-29 19:13:26,125][00497] Updated weights for policy 0, policy_version 64758 (0.0020) [2024-03-29 19:13:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 1061076992. Throughput: 0: 41807.2. Samples: 943306020. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 19:13:28,840][00126] Avg episode reward: [(0, '0.635')] [2024-03-29 19:13:30,494][00497] Updated weights for policy 0, policy_version 64768 (0.0022) [2024-03-29 19:13:33,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 1061289984. Throughput: 0: 42101.4. Samples: 943438420. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 19:13:33,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 19:13:34,364][00497] Updated weights for policy 0, policy_version 64778 (0.0017) [2024-03-29 19:13:38,644][00497] Updated weights for policy 0, policy_version 64788 (0.0022) [2024-03-29 19:13:38,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 1061486592. Throughput: 0: 42024.1. Samples: 943702300. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 19:13:38,840][00126] Avg episode reward: [(0, '0.547')] [2024-03-29 19:13:41,940][00497] Updated weights for policy 0, policy_version 64798 (0.0023) [2024-03-29 19:13:43,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 1061715968. Throughput: 0: 42104.8. Samples: 943938760. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 19:13:43,842][00126] Avg episode reward: [(0, '0.582')] [2024-03-29 19:13:46,246][00497] Updated weights for policy 0, policy_version 64808 (0.0020) [2024-03-29 19:13:47,237][00476] Signal inference workers to stop experience collection... (33650 times) [2024-03-29 19:13:47,269][00497] InferenceWorker_p0-w0: stopping experience collection (33650 times) [2024-03-29 19:13:47,458][00476] Signal inference workers to resume experience collection... 
(33650 times) [2024-03-29 19:13:47,459][00497] InferenceWorker_p0-w0: resuming experience collection (33650 times) [2024-03-29 19:13:48,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 1061912576. Throughput: 0: 41899.5. Samples: 944061040. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 19:13:48,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 19:13:50,224][00497] Updated weights for policy 0, policy_version 64818 (0.0025) [2024-03-29 19:13:53,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 1062109184. Throughput: 0: 41789.6. Samples: 944324320. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 19:13:53,840][00126] Avg episode reward: [(0, '0.587')] [2024-03-29 19:13:54,489][00497] Updated weights for policy 0, policy_version 64828 (0.0025) [2024-03-29 19:13:57,554][00497] Updated weights for policy 0, policy_version 64838 (0.0028) [2024-03-29 19:13:58,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 1062322176. Throughput: 0: 41695.7. Samples: 944552300. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 19:13:58,840][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 19:14:01,940][00497] Updated weights for policy 0, policy_version 64848 (0.0019) [2024-03-29 19:14:03,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42052.3, 300 sec: 41710.0). Total num frames: 1062535168. Throughput: 0: 41893.8. Samples: 944683660. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 19:14:03,840][00126] Avg episode reward: [(0, '0.607')] [2024-03-29 19:14:06,023][00497] Updated weights for policy 0, policy_version 64858 (0.0025) [2024-03-29 19:14:08,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 1062715392. Throughput: 0: 41575.1. Samples: 944942760. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 19:14:08,840][00126] Avg episode reward: [(0, '0.646')] [2024-03-29 19:14:10,270][00497] Updated weights for policy 0, policy_version 64868 (0.0024) [2024-03-29 19:14:13,587][00497] Updated weights for policy 0, policy_version 64878 (0.0028) [2024-03-29 19:14:13,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 1062961152. Throughput: 0: 41173.8. Samples: 945158840. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 19:14:13,840][00126] Avg episode reward: [(0, '0.668')] [2024-03-29 19:14:17,997][00497] Updated weights for policy 0, policy_version 64888 (0.0022) [2024-03-29 19:14:18,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 1063157760. Throughput: 0: 41326.9. Samples: 945298140. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 19:14:18,840][00126] Avg episode reward: [(0, '0.598')] [2024-03-29 19:14:22,123][00497] Updated weights for policy 0, policy_version 64898 (0.0018) [2024-03-29 19:14:23,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 1063337984. Throughput: 0: 41191.0. Samples: 945555900. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 19:14:23,840][00126] Avg episode reward: [(0, '0.631')] [2024-03-29 19:14:25,348][00476] Signal inference workers to stop experience collection... (33700 times) [2024-03-29 19:14:25,373][00497] InferenceWorker_p0-w0: stopping experience collection (33700 times) [2024-03-29 19:14:25,535][00476] Signal inference workers to resume experience collection... 
(33700 times) [2024-03-29 19:14:25,536][00497] InferenceWorker_p0-w0: resuming experience collection (33700 times) [2024-03-29 19:14:26,186][00497] Updated weights for policy 0, policy_version 64908 (0.0020) [2024-03-29 19:14:28,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 41821.1). Total num frames: 1063600128. Throughput: 0: 41275.5. Samples: 945796160. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 19:14:28,841][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 19:14:29,207][00497] Updated weights for policy 0, policy_version 64918 (0.0023) [2024-03-29 19:14:33,651][00497] Updated weights for policy 0, policy_version 64928 (0.0024) [2024-03-29 19:14:33,839][00126] Fps is (10 sec: 44237.3, 60 sec: 41506.0, 300 sec: 41709.8). Total num frames: 1063780352. Throughput: 0: 41544.5. Samples: 945930540. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 19:14:33,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 19:14:37,785][00497] Updated weights for policy 0, policy_version 64938 (0.0022) [2024-03-29 19:14:38,839][00126] Fps is (10 sec: 36045.4, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 1063960576. Throughput: 0: 41363.2. Samples: 946185660. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 19:14:38,840][00126] Avg episode reward: [(0, '0.581')] [2024-03-29 19:14:41,909][00497] Updated weights for policy 0, policy_version 64948 (0.0021) [2024-03-29 19:14:43,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 1064206336. Throughput: 0: 41652.5. Samples: 946426660. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 19:14:43,840][00126] Avg episode reward: [(0, '0.588')] [2024-03-29 19:14:45,009][00497] Updated weights for policy 0, policy_version 64958 (0.0028) [2024-03-29 19:14:48,839][00126] Fps is (10 sec: 45874.7, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 1064419328. Throughput: 0: 41530.1. Samples: 946552520. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 19:14:48,840][00126] Avg episode reward: [(0, '0.490')] [2024-03-29 19:14:49,053][00497] Updated weights for policy 0, policy_version 64968 (0.0026) [2024-03-29 19:14:53,475][00497] Updated weights for policy 0, policy_version 64978 (0.0018) [2024-03-29 19:14:53,839][00126] Fps is (10 sec: 39320.8, 60 sec: 41506.1, 300 sec: 41709.7). Total num frames: 1064599552. Throughput: 0: 41355.5. Samples: 946803760. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 19:14:53,840][00126] Avg episode reward: [(0, '0.556')] [2024-03-29 19:14:57,481][00497] Updated weights for policy 0, policy_version 64988 (0.0034) [2024-03-29 19:14:57,757][00476] Signal inference workers to stop experience collection... (33750 times) [2024-03-29 19:14:57,792][00497] InferenceWorker_p0-w0: stopping experience collection (33750 times) [2024-03-29 19:14:57,978][00476] Signal inference workers to resume experience collection... (33750 times) [2024-03-29 19:14:57,978][00497] InferenceWorker_p0-w0: resuming experience collection (33750 times) [2024-03-29 19:14:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 1064828928. Throughput: 0: 42261.3. Samples: 947060600. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 19:14:58,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 19:15:00,446][00497] Updated weights for policy 0, policy_version 64998 (0.0029) [2024-03-29 19:15:03,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.1, 300 sec: 41765.3). 
Total num frames: 1065058304. Throughput: 0: 41817.7. Samples: 947179940. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 19:15:03,841][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 19:15:04,037][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000065007_1065074688.pth... [2024-03-29 19:15:04,335][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000064394_1055031296.pth [2024-03-29 19:15:04,661][00497] Updated weights for policy 0, policy_version 65008 (0.0026) [2024-03-29 19:15:08,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 1065238528. Throughput: 0: 41664.6. Samples: 947430800. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 19:15:08,840][00126] Avg episode reward: [(0, '0.619')] [2024-03-29 19:15:09,042][00497] Updated weights for policy 0, policy_version 65018 (0.0022) [2024-03-29 19:15:13,043][00497] Updated weights for policy 0, policy_version 65028 (0.0022) [2024-03-29 19:15:13,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 1065451520. Throughput: 0: 42334.7. Samples: 947701220. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 19:15:13,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 19:15:16,047][00497] Updated weights for policy 0, policy_version 65038 (0.0023) [2024-03-29 19:15:18,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 1065664512. Throughput: 0: 41577.8. Samples: 947801540. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 19:15:18,840][00126] Avg episode reward: [(0, '0.589')] [2024-03-29 19:15:20,312][00497] Updated weights for policy 0, policy_version 65048 (0.0024) [2024-03-29 19:15:23,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 1065877504. Throughput: 0: 41663.6. Samples: 948060520. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 19:15:23,840][00126] Avg episode reward: [(0, '0.543')] [2024-03-29 19:15:24,967][00497] Updated weights for policy 0, policy_version 65058 (0.0018) [2024-03-29 19:15:28,839][00126] Fps is (10 sec: 39321.8, 60 sec: 40960.2, 300 sec: 41654.3). Total num frames: 1066057728. Throughput: 0: 42142.3. Samples: 948323060. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 19:15:28,841][00126] Avg episode reward: [(0, '0.602')] [2024-03-29 19:15:28,973][00497] Updated weights for policy 0, policy_version 65068 (0.0023) [2024-03-29 19:15:30,399][00476] Signal inference workers to stop experience collection... (33800 times) [2024-03-29 19:15:30,479][00476] Signal inference workers to resume experience collection... (33800 times) [2024-03-29 19:15:30,481][00497] InferenceWorker_p0-w0: stopping experience collection (33800 times) [2024-03-29 19:15:30,512][00497] InferenceWorker_p0-w0: resuming experience collection (33800 times) [2024-03-29 19:15:31,972][00497] Updated weights for policy 0, policy_version 65078 (0.0030) [2024-03-29 19:15:33,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 1066287104. Throughput: 0: 41715.6. Samples: 948429720. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 19:15:33,840][00126] Avg episode reward: [(0, '0.566')] [2024-03-29 19:15:36,219][00497] Updated weights for policy 0, policy_version 65088 (0.0023) [2024-03-29 19:15:38,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 1066516480. 
Throughput: 0: 41643.7. Samples: 948677720. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 19:15:38,840][00126] Avg episode reward: [(0, '0.701')] [2024-03-29 19:15:40,924][00497] Updated weights for policy 0, policy_version 65098 (0.0027) [2024-03-29 19:15:43,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 1066680320. Throughput: 0: 42188.1. Samples: 948959060. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 19:15:43,840][00126] Avg episode reward: [(0, '0.630')] [2024-03-29 19:15:44,744][00497] Updated weights for policy 0, policy_version 65108 (0.0027) [2024-03-29 19:15:47,955][00497] Updated weights for policy 0, policy_version 65118 (0.0022) [2024-03-29 19:15:48,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 1066909696. Throughput: 0: 41686.3. Samples: 949055820. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 19:15:48,841][00126] Avg episode reward: [(0, '0.586')] [2024-03-29 19:15:51,822][00497] Updated weights for policy 0, policy_version 65128 (0.0017) [2024-03-29 19:15:53,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42325.5, 300 sec: 41820.9). Total num frames: 1067139072. Throughput: 0: 41690.7. Samples: 949306880. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 19:15:53,840][00126] Avg episode reward: [(0, '0.542')] [2024-03-29 19:15:56,542][00497] Updated weights for policy 0, policy_version 65138 (0.0020) [2024-03-29 19:15:58,839][00126] Fps is (10 sec: 37683.5, 60 sec: 40960.0, 300 sec: 41598.7). Total num frames: 1067286528. Throughput: 0: 41656.9. Samples: 949575780. Policy #0 lag: (min: 0.0, avg: 18.2, max: 41.0) [2024-03-29 19:15:58,840][00126] Avg episode reward: [(0, '0.605')] [2024-03-29 19:16:00,395][00497] Updated weights for policy 0, policy_version 65148 (0.0024) [2024-03-29 19:16:03,726][00497] Updated weights for policy 0, policy_version 65158 (0.0021) [2024-03-29 19:16:03,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.2, 300 sec: 41820.8). Total num frames: 1067548672. Throughput: 0: 41849.7. Samples: 949684780. Policy #0 lag: (min: 0.0, avg: 18.2, max: 41.0) [2024-03-29 19:16:03,840][00126] Avg episode reward: [(0, '0.530')] [2024-03-29 19:16:05,303][00476] Signal inference workers to stop experience collection... (33850 times) [2024-03-29 19:16:05,358][00497] InferenceWorker_p0-w0: stopping experience collection (33850 times) [2024-03-29 19:16:05,389][00476] Signal inference workers to resume experience collection... (33850 times) [2024-03-29 19:16:05,392][00497] InferenceWorker_p0-w0: resuming experience collection (33850 times) [2024-03-29 19:16:07,916][00497] Updated weights for policy 0, policy_version 65168 (0.0031) [2024-03-29 19:16:08,839][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 1067745280. Throughput: 0: 41185.3. Samples: 949913860. Policy #0 lag: (min: 0.0, avg: 18.2, max: 41.0) [2024-03-29 19:16:08,840][00126] Avg episode reward: [(0, '0.629')] [2024-03-29 19:16:12,460][00497] Updated weights for policy 0, policy_version 65178 (0.0018) [2024-03-29 19:16:13,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 1067925504. Throughput: 0: 41330.1. Samples: 950182920. 
Policy #0 lag: (min: 0.0, avg: 18.2, max: 41.0) [2024-03-29 19:16:13,840][00126] Avg episode reward: [(0, '0.576')] [2024-03-29 19:16:16,420][00497] Updated weights for policy 0, policy_version 65188 (0.0027) [2024-03-29 19:16:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 1068171264. Throughput: 0: 41856.1. Samples: 950313240. Policy #0 lag: (min: 0.0, avg: 18.2, max: 41.0) [2024-03-29 19:16:18,840][00126] Avg episode reward: [(0, '0.658')] [2024-03-29 19:16:19,543][00497] Updated weights for policy 0, policy_version 65198 (0.0022) [2024-03-29 19:16:23,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41233.1, 300 sec: 41654.3). Total num frames: 1068351488. Throughput: 0: 41302.3. Samples: 950536320. Policy #0 lag: (min: 0.0, avg: 18.2, max: 41.0) [2024-03-29 19:16:23,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 19:16:23,857][00497] Updated weights for policy 0, policy_version 65208 (0.0020) [2024-03-29 19:16:28,331][00497] Updated weights for policy 0, policy_version 65218 (0.0032) [2024-03-29 19:16:28,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41506.0, 300 sec: 41709.8). Total num frames: 1068548096. Throughput: 0: 41043.0. Samples: 950806000. Policy #0 lag: (min: 0.0, avg: 18.2, max: 41.0) [2024-03-29 19:16:28,840][00126] Avg episode reward: [(0, '0.523')] [2024-03-29 19:16:32,098][00497] Updated weights for policy 0, policy_version 65228 (0.0022) [2024-03-29 19:16:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 1068777472. Throughput: 0: 41857.5. Samples: 950939400. Policy #0 lag: (min: 0.0, avg: 18.2, max: 41.0) [2024-03-29 19:16:33,841][00126] Avg episode reward: [(0, '0.602')] [2024-03-29 19:16:35,456][00497] Updated weights for policy 0, policy_version 65238 (0.0026) [2024-03-29 19:16:38,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 1068990464. Throughput: 0: 41412.9. Samples: 951170460. Policy #0 lag: (min: 0.0, avg: 18.2, max: 41.0) [2024-03-29 19:16:38,840][00126] Avg episode reward: [(0, '0.550')] [2024-03-29 19:16:38,986][00476] Signal inference workers to stop experience collection... (33900 times) [2024-03-29 19:16:39,056][00497] InferenceWorker_p0-w0: stopping experience collection (33900 times) [2024-03-29 19:16:39,062][00476] Signal inference workers to resume experience collection... (33900 times) [2024-03-29 19:16:39,083][00497] InferenceWorker_p0-w0: resuming experience collection (33900 times) [2024-03-29 19:16:39,364][00497] Updated weights for policy 0, policy_version 65248 (0.0027) [2024-03-29 19:16:43,839][00126] Fps is (10 sec: 39320.9, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 1069170688. Throughput: 0: 41172.3. Samples: 951428540. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 19:16:43,841][00126] Avg episode reward: [(0, '0.626')] [2024-03-29 19:16:44,031][00497] Updated weights for policy 0, policy_version 65258 (0.0020) [2024-03-29 19:16:48,079][00497] Updated weights for policy 0, policy_version 65268 (0.0027) [2024-03-29 19:16:48,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 1069383680. Throughput: 0: 41601.3. Samples: 951556840. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 19:16:48,840][00126] Avg episode reward: [(0, '0.670')] [2024-03-29 19:16:51,429][00497] Updated weights for policy 0, policy_version 65278 (0.0026) [2024-03-29 19:16:53,839][00126] Fps is (10 sec: 45875.6, 60 sec: 41506.1, 300 sec: 41820.8). 
Total num frames: 1069629440. Throughput: 0: 41552.9. Samples: 951783740. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 19:16:53,840][00126] Avg episode reward: [(0, '0.482')] [2024-03-29 19:16:55,455][00497] Updated weights for policy 0, policy_version 65288 (0.0019) [2024-03-29 19:16:58,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 1069809664. Throughput: 0: 41336.5. Samples: 952043060. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 19:16:58,840][00126] Avg episode reward: [(0, '0.573')] [2024-03-29 19:16:59,576][00497] Updated weights for policy 0, policy_version 65298 (0.0019) [2024-03-29 19:17:03,518][00497] Updated weights for policy 0, policy_version 65308 (0.0027) [2024-03-29 19:17:03,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 1070022656. Throughput: 0: 41631.1. Samples: 952186640. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 19:17:03,840][00126] Avg episode reward: [(0, '0.557')] [2024-03-29 19:17:04,317][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000065311_1070055424.pth... [2024-03-29 19:17:04,622][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000064699_1060028416.pth [2024-03-29 19:17:06,977][00497] Updated weights for policy 0, policy_version 65318 (0.0030) [2024-03-29 19:17:08,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 1070252032. Throughput: 0: 41788.7. Samples: 952416820. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 19:17:08,840][00126] Avg episode reward: [(0, '0.617')] [2024-03-29 19:17:10,784][00497] Updated weights for policy 0, policy_version 65328 (0.0025) [2024-03-29 19:17:13,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 1070448640. Throughput: 0: 41420.4. Samples: 952669920. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 19:17:13,840][00126] Avg episode reward: [(0, '0.624')] [2024-03-29 19:17:15,256][00497] Updated weights for policy 0, policy_version 65338 (0.0027) [2024-03-29 19:17:16,373][00476] Signal inference workers to stop experience collection... (33950 times) [2024-03-29 19:17:16,399][00497] InferenceWorker_p0-w0: stopping experience collection (33950 times) [2024-03-29 19:17:16,554][00476] Signal inference workers to resume experience collection... (33950 times) [2024-03-29 19:17:16,555][00497] InferenceWorker_p0-w0: resuming experience collection (33950 times) [2024-03-29 19:17:18,839][00126] Fps is (10 sec: 37683.6, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 1070628864. Throughput: 0: 41437.3. Samples: 952804080. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 19:17:18,840][00126] Avg episode reward: [(0, '0.674')] [2024-03-29 19:17:19,341][00497] Updated weights for policy 0, policy_version 65348 (0.0022) [2024-03-29 19:17:22,518][00497] Updated weights for policy 0, policy_version 65358 (0.0020) [2024-03-29 19:17:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 1070858240. Throughput: 0: 41523.4. Samples: 953039020. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:17:23,840][00126] Avg episode reward: [(0, '0.628')] [2024-03-29 19:17:26,652][00497] Updated weights for policy 0, policy_version 65368 (0.0027) [2024-03-29 19:17:28,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 1071071232. 
Throughput: 0: 41381.4. Samples: 953290700. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:17:28,840][00126] Avg episode reward: [(0, '0.541')] [2024-03-29 19:17:31,155][00497] Updated weights for policy 0, policy_version 65378 (0.0025) [2024-03-29 19:17:33,840][00126] Fps is (10 sec: 39321.3, 60 sec: 41232.9, 300 sec: 41487.6). Total num frames: 1071251456. Throughput: 0: 41555.4. Samples: 953426840. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:17:33,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 19:17:35,256][00497] Updated weights for policy 0, policy_version 65388 (0.0024) [2024-03-29 19:17:38,332][00497] Updated weights for policy 0, policy_version 65398 (0.0031) [2024-03-29 19:17:38,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 1071497216. Throughput: 0: 41966.2. Samples: 953672220. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:17:38,840][00126] Avg episode reward: [(0, '0.649')] [2024-03-29 19:17:42,477][00497] Updated weights for policy 0, policy_version 65408 (0.0024) [2024-03-29 19:17:43,839][00126] Fps is (10 sec: 44238.0, 60 sec: 42052.4, 300 sec: 41709.8). Total num frames: 1071693824. Throughput: 0: 41404.5. Samples: 953906260. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:17:43,840][00126] Avg episode reward: [(0, '0.595')] [2024-03-29 19:17:46,953][00497] Updated weights for policy 0, policy_version 65418 (0.0017) [2024-03-29 19:17:48,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 1071874048. Throughput: 0: 41409.8. Samples: 954050080. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:17:48,840][00126] Avg episode reward: [(0, '0.558')] [2024-03-29 19:17:50,959][00497] Updated weights for policy 0, policy_version 65428 (0.0023) [2024-03-29 19:17:50,981][00476] Signal inference workers to stop experience collection... (34000 times) [2024-03-29 19:17:50,982][00476] Signal inference workers to resume experience collection... (34000 times) [2024-03-29 19:17:51,027][00497] InferenceWorker_p0-w0: stopping experience collection (34000 times) [2024-03-29 19:17:51,027][00497] InferenceWorker_p0-w0: resuming experience collection (34000 times) [2024-03-29 19:17:53,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 1072119808. Throughput: 0: 41840.9. Samples: 954299660. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:17:53,840][00126] Avg episode reward: [(0, '0.608')] [2024-03-29 19:17:54,092][00497] Updated weights for policy 0, policy_version 65438 (0.0032) [2024-03-29 19:17:57,920][00497] Updated weights for policy 0, policy_version 65448 (0.0025) [2024-03-29 19:17:58,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 1072332800. Throughput: 0: 41649.8. Samples: 954544160. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 19:17:58,840][00126] Avg episode reward: [(0, '0.544')] [2024-03-29 19:18:02,385][00497] Updated weights for policy 0, policy_version 65458 (0.0020) [2024-03-29 19:18:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 1072529408. Throughput: 0: 41762.2. Samples: 954683380. 
Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0) [2024-03-29 19:18:03,840][00126] Avg episode reward: [(0, '0.650')] [2024-03-29 19:18:06,286][00497] Updated weights for policy 0, policy_version 65468 (0.0023) [2024-03-29 19:18:08,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 1072758784. Throughput: 0: 42345.1. Samples: 954944540. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0) [2024-03-29 19:18:08,840][00126] Avg episode reward: [(0, '0.548')] [2024-03-29 19:18:09,586][00497] Updated weights for policy 0, policy_version 65478 (0.0024) [2024-03-29 19:18:13,459][00497] Updated weights for policy 0, policy_version 65488 (0.0023) [2024-03-29 19:18:13,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 1072955392. Throughput: 0: 41523.6. Samples: 955159260. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0) [2024-03-29 19:18:13,840][00126] Avg episode reward: [(0, '0.611')] [2024-03-29 19:18:18,134][00497] Updated weights for policy 0, policy_version 65498 (0.0021) [2024-03-29 19:18:18,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 1073135616. Throughput: 0: 41511.4. Samples: 955294840. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0) [2024-03-29 19:18:18,840][00126] Avg episode reward: [(0, '0.585')] [2024-03-29 19:18:20,678][00476] Signal inference workers to stop experience collection... (34050 times) [2024-03-29 19:18:20,713][00497] InferenceWorker_p0-w0: stopping experience collection (34050 times) [2024-03-29 19:18:20,895][00476] Signal inference workers to resume experience collection... (34050 times) [2024-03-29 19:18:20,895][00497] InferenceWorker_p0-w0: resuming experience collection (34050 times) [2024-03-29 19:18:22,290][00497] Updated weights for policy 0, policy_version 65508 (0.0022) [2024-03-29 19:18:23,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 1073348608. Throughput: 0: 41929.3. Samples: 955559040. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0) [2024-03-29 19:18:23,840][00126] Avg episode reward: [(0, '0.535')] [2024-03-29 19:18:25,391][00497] Updated weights for policy 0, policy_version 65518 (0.0029) [2024-03-29 19:18:28,839][00126] Fps is (10 sec: 44236.1, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 1073577984. Throughput: 0: 41620.8. Samples: 955779200. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0) [2024-03-29 19:18:28,840][00126] Avg episode reward: [(0, '0.563')] [2024-03-29 19:18:29,465][00497] Updated weights for policy 0, policy_version 65528 (0.0025)
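The records above (and throughout this log) follow a fixed Sample Factory-style layout: an "Fps is (10 sec / 60 sec / 300 sec)" line carrying the running frame count, throughput and policy lag, an "Avg episode reward" line, and periodic "Updated weights", "Signal inference workers ..." and checkpoint messages in between. If the throughput and reward curves need to be recovered from a log like this, a small parser is usually enough. The sketch below is illustrative only: the regular expressions match the record format shown here, but parse_training_log and the output field names are hypothetical helpers rather than part of Sample Factory, and the log path is a placeholder.

    import re
    from typing import Dict, Iterator

    # Matches records such as:
    # [2024-03-29 19:13:43,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.2, 300 sec: 41709.8).
    #   Total num frames: 1061715968. Throughput: 0: 42104.8. ...
    FPS_RE = re.compile(
        r"\[(?P<ts>[\d\- :,]+)\]\[\d+\] Fps is \(10 sec: (?P<fps10>[\d.]+), "
        r"60 sec: (?P<fps60>[\d.]+), "
        r"300 sec: (?P<fps300>[\d.]+)\)\. "
        r"Total num frames: (?P<frames>\d+)"
    )
    REWARD_RE = re.compile(
        r"\[(?P<ts>[\d\- :,]+)\]\[\d+\] Avg episode reward: \[\(0, '(?P<reward>-?[\d.]+)'\)\]"
    )

    def parse_training_log(path: str) -> Iterator[Dict]:
        """Yield one dict per 'Fps is' or 'Avg episode reward' record found in the log."""
        with open(path) as f:
            # Records can be wrapped across physical lines, so normalise whitespace first.
            text = re.sub(r"\s+", " ", f.read())
        for m in FPS_RE.finditer(text):
            yield {"kind": "fps", "ts": m["ts"], "fps_10s": float(m["fps10"]),
                   "fps_60s": float(m["fps60"]), "fps_300s": float(m["fps300"]),
                   "total_frames": int(m["frames"])}
        for m in REWARD_RE.finditer(text):
            yield {"kind": "reward", "ts": m["ts"], "reward": float(m["reward"])}

    if __name__ == "__main__":
        for record in parse_training_log("train_dir/b.a20.20x20_40x40.norm/training_log.txt"):
            print(record)

In this sketch the records come out grouped by kind rather than interleaved by time; sort on the timestamp field if chronological order matters.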
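The three figures in each "Fps is" record are frame-throughput averages over trailing windows of roughly the last 10, 60 and 300 seconds. A minimal sketch of that arithmetic, computed from (wall-clock time, total frames) samples, is below; it illustrates what the numbers mean rather than how Sample Factory produces them internally, and the class name WindowedFps is made up for this example.

    import time
    from collections import deque

    class WindowedFps:
        """Report frames-per-second over several trailing time windows."""

        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.samples = deque()  # (wall_clock_seconds, total_frames)

        def record(self, total_frames, now=None):
            now = time.time() if now is None else now
            self.samples.append((now, total_frames))
            # Drop samples that even the largest window can no longer use.
            horizon = now - max(self.windows)
            while len(self.samples) > 2 and self.samples[1][0] < horizon:
                self.samples.popleft()

        def fps(self):
            if len(self.samples) < 2:
                return {w: 0.0 for w in self.windows}
            now, frames_now = self.samples[-1]
            result = {}
            for w in self.windows:
                # Baseline: the oldest retained sample that still falls inside the window.
                past_t, past_frames = next(
                    ((t, f) for t, f in self.samples if t >= now - w),
                    self.samples[0],
                )
                dt = now - past_t
                result[w] = (frames_now - past_frames) / dt if dt > 0 else 0.0
            return result

Feeding it the "Total num frames" values and timestamps extracted by the parser above gives averages of the same kind as the 10/60/300 second figures reported in this log.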
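The "Saving .../checkpoint_p0/checkpoint_000065311_1070055424.pth... / Removing ..." pairs show the learner rotating checkpoints: the filename appears to encode the policy version and the total environment frame count at save time (here roughly policy_version 65311 at about 1.07e9 frames, matching the surrounding records), and an older checkpoint is deleted after each save. Such a file is an ordinary torch-serialized object, so it can be inspected offline. The snippet below assumes only that torch.load returns something dict-like; the exact keys depend on the Sample Factory version, which is why it simply prints whatever is stored.

    import torch

    ckpt_path = (
        "/workspace/metta/train_dir/b.a20.20x20_40x40.norm/"
        "checkpoint_p0/checkpoint_000065311_1070055424.pth"
    )

    # Recent torch versions may require weights_only=False here, because the file
    # holds optimizer state and training counters rather than just tensors; only
    # do that for checkpoints from a trusted run.
    ckpt = torch.load(ckpt_path, map_location="cpu")

    print(type(ckpt))
    if isinstance(ckpt, dict):
        for key, value in ckpt.items():
            # e.g. model / optimizer state dicts and scalar counters, depending on version
            print(key, type(value))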