[2024-03-29 12:00:56,219][00126] Saving configuration to /workspace/metta/train_dir/b.a20.20x20_40x40.norm/config.json... [2024-03-29 12:00:56,351][00126] Rollout worker 0 uses device cpu [2024-03-29 12:00:56,352][00126] Rollout worker 1 uses device cpu [2024-03-29 12:00:56,352][00126] Rollout worker 2 uses device cpu [2024-03-29 12:00:56,352][00126] Rollout worker 3 uses device cpu [2024-03-29 12:00:56,352][00126] Rollout worker 4 uses device cpu [2024-03-29 12:00:56,352][00126] Rollout worker 5 uses device cpu [2024-03-29 12:00:56,353][00126] Rollout worker 6 uses device cpu [2024-03-29 12:00:56,353][00126] Rollout worker 7 uses device cpu [2024-03-29 12:00:56,353][00126] Rollout worker 8 uses device cpu [2024-03-29 12:00:56,353][00126] Rollout worker 9 uses device cpu [2024-03-29 12:00:56,354][00126] Rollout worker 10 uses device cpu [2024-03-29 12:00:56,354][00126] Rollout worker 11 uses device cpu [2024-03-29 12:00:56,354][00126] Rollout worker 12 uses device cpu [2024-03-29 12:00:56,354][00126] Rollout worker 13 uses device cpu [2024-03-29 12:00:56,354][00126] Rollout worker 14 uses device cpu [2024-03-29 12:00:56,355][00126] Rollout worker 15 uses device cpu [2024-03-29 12:00:56,355][00126] Rollout worker 16 uses device cpu [2024-03-29 12:00:56,355][00126] Rollout worker 17 uses device cpu [2024-03-29 12:00:56,355][00126] Rollout worker 18 uses device cpu [2024-03-29 12:00:56,355][00126] Rollout worker 19 uses device cpu [2024-03-29 12:00:56,355][00126] Rollout worker 20 uses device cpu [2024-03-29 12:00:56,356][00126] Rollout worker 21 uses device cpu [2024-03-29 12:00:56,356][00126] Rollout worker 22 uses device cpu [2024-03-29 12:00:56,356][00126] Rollout worker 23 uses device cpu [2024-03-29 12:00:56,356][00126] Rollout worker 24 uses device cpu [2024-03-29 12:00:56,356][00126] Rollout worker 25 uses device cpu [2024-03-29 12:00:56,357][00126] Rollout worker 26 uses device cpu [2024-03-29 12:00:56,357][00126] Rollout worker 27 uses device cpu [2024-03-29 12:00:56,357][00126] Rollout worker 28 uses device cpu [2024-03-29 12:00:56,357][00126] Rollout worker 29 uses device cpu [2024-03-29 12:00:56,357][00126] Rollout worker 30 uses device cpu [2024-03-29 12:00:56,357][00126] Rollout worker 31 uses device cpu [2024-03-29 12:00:56,358][00126] Rollout worker 32 uses device cpu [2024-03-29 12:00:56,358][00126] Rollout worker 33 uses device cpu [2024-03-29 12:00:56,358][00126] Rollout worker 34 uses device cpu [2024-03-29 12:00:56,358][00126] Rollout worker 35 uses device cpu [2024-03-29 12:00:56,358][00126] Rollout worker 36 uses device cpu [2024-03-29 12:00:56,358][00126] Rollout worker 37 uses device cpu [2024-03-29 12:00:56,359][00126] Rollout worker 38 uses device cpu [2024-03-29 12:00:56,359][00126] Rollout worker 39 uses device cpu [2024-03-29 12:00:56,359][00126] Rollout worker 40 uses device cpu [2024-03-29 12:00:56,359][00126] Rollout worker 41 uses device cpu [2024-03-29 12:00:56,359][00126] Rollout worker 42 uses device cpu [2024-03-29 12:00:56,360][00126] Rollout worker 43 uses device cpu [2024-03-29 12:00:56,360][00126] Rollout worker 44 uses device cpu [2024-03-29 12:00:56,360][00126] Rollout worker 45 uses device cpu [2024-03-29 12:00:56,360][00126] Rollout worker 46 uses device cpu [2024-03-29 12:00:56,360][00126] Rollout worker 47 uses device cpu [2024-03-29 12:00:56,360][00126] Rollout worker 48 uses device cpu [2024-03-29 12:00:56,361][00126] Rollout worker 49 uses device cpu [2024-03-29 12:00:56,361][00126] Rollout worker 50 uses device cpu [2024-03-29 
12:00:56,361][00126] Rollout worker 51 uses device cpu [2024-03-29 12:00:56,361][00126] Rollout worker 52 uses device cpu [2024-03-29 12:00:56,361][00126] Rollout worker 53 uses device cpu [2024-03-29 12:00:56,361][00126] Rollout worker 54 uses device cpu [2024-03-29 12:00:56,362][00126] Rollout worker 55 uses device cpu [2024-03-29 12:00:56,362][00126] Rollout worker 56 uses device cpu [2024-03-29 12:00:56,362][00126] Rollout worker 57 uses device cpu [2024-03-29 12:00:56,362][00126] Rollout worker 58 uses device cpu [2024-03-29 12:00:56,362][00126] Rollout worker 59 uses device cpu [2024-03-29 12:00:56,363][00126] Rollout worker 60 uses device cpu [2024-03-29 12:00:56,363][00126] Rollout worker 61 uses device cpu [2024-03-29 12:00:56,363][00126] Rollout worker 62 uses device cpu [2024-03-29 12:00:56,363][00126] Rollout worker 63 uses device cpu [2024-03-29 12:00:58,093][00126] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:00:58,094][00126] InferenceWorker_p0-w0: min num requests: 21 [2024-03-29 12:00:58,197][00126] Starting all processes... [2024-03-29 12:00:58,197][00126] Starting process learner_proc0 [2024-03-29 12:00:58,403][00126] Starting all processes... [2024-03-29 12:00:58,410][00126] Starting process inference_proc0-0 [2024-03-29 12:00:58,410][00126] Starting process rollout_proc1 [2024-03-29 12:00:58,410][00126] Starting process rollout_proc3 [2024-03-29 12:00:58,411][00126] Starting process rollout_proc5 [2024-03-29 12:00:58,413][00126] Starting process rollout_proc7 [2024-03-29 12:00:58,413][00126] Starting process rollout_proc9 [2024-03-29 12:00:58,419][00126] Starting process rollout_proc11 [2024-03-29 12:00:58,421][00126] Starting process rollout_proc0 [2024-03-29 12:00:58,429][00126] Starting process rollout_proc2 [2024-03-29 12:00:58,429][00126] Starting process rollout_proc4 [2024-03-29 12:00:58,429][00126] Starting process rollout_proc13 [2024-03-29 12:00:58,430][00126] Starting process rollout_proc15 [2024-03-29 12:00:58,430][00126] Starting process rollout_proc17 [2024-03-29 12:00:58,430][00126] Starting process rollout_proc19 [2024-03-29 12:00:58,432][00126] Starting process rollout_proc21 [2024-03-29 12:00:58,438][00126] Starting process rollout_proc23 [2024-03-29 12:00:58,451][00126] Starting process rollout_proc6 [2024-03-29 12:00:58,452][00126] Starting process rollout_proc10 [2024-03-29 12:00:58,454][00126] Starting process rollout_proc25 [2024-03-29 12:00:58,454][00126] Starting process rollout_proc27 [2024-03-29 12:00:58,454][00126] Starting process rollout_proc8 [2024-03-29 12:00:58,454][00126] Starting process rollout_proc29 [2024-03-29 12:00:58,498][00126] Starting process rollout_proc31 [2024-03-29 12:00:58,535][00126] Starting process rollout_proc33 [2024-03-29 12:00:58,549][00126] Starting process rollout_proc35 [2024-03-29 12:00:58,549][00126] Starting process rollout_proc12 [2024-03-29 12:00:58,550][00126] Starting process rollout_proc14 [2024-03-29 12:00:58,556][00126] Starting process rollout_proc16 [2024-03-29 12:00:58,564][00126] Starting process rollout_proc18 [2024-03-29 12:00:58,577][00126] Starting process rollout_proc20 [2024-03-29 12:00:58,584][00126] Starting process rollout_proc22 [2024-03-29 12:00:58,591][00126] Starting process rollout_proc24 [2024-03-29 12:00:58,599][00126] Starting process rollout_proc37 [2024-03-29 12:00:58,617][00126] Starting process rollout_proc39 [2024-03-29 12:00:58,617][00126] Starting process rollout_proc26 [2024-03-29 12:00:58,672][00126] Starting process rollout_proc41 [2024-03-29 
12:00:58,693][00126] Starting process rollout_proc28 [2024-03-29 12:00:58,693][00126] Starting process rollout_proc30 [2024-03-29 12:00:58,704][00126] Starting process rollout_proc32 [2024-03-29 12:00:58,735][00126] Starting process rollout_proc34 [2024-03-29 12:00:58,730][00126] Starting process rollout_proc43 [2024-03-29 12:00:58,730][00126] Starting process rollout_proc36 [2024-03-29 12:00:58,807][00126] Starting process rollout_proc45 [2024-03-29 12:00:58,807][00126] Starting process rollout_proc47 [2024-03-29 12:00:58,809][00126] Starting process rollout_proc49 [2024-03-29 12:00:58,873][00126] Starting process rollout_proc46 [2024-03-29 12:00:58,885][00126] Starting process rollout_proc51 [2024-03-29 12:00:58,897][00126] Starting process rollout_proc38 [2024-03-29 12:00:58,900][00126] Starting process rollout_proc40 [2024-03-29 12:00:58,906][00126] Starting process rollout_proc53 [2024-03-29 12:00:59,072][00126] Starting process rollout_proc55 [2024-03-29 12:00:59,072][00126] Starting process rollout_proc57 [2024-03-29 12:00:59,072][00126] Starting process rollout_proc48 [2024-03-29 12:00:59,072][00126] Starting process rollout_proc54 [2024-03-29 12:00:59,072][00126] Starting process rollout_proc42 [2024-03-29 12:00:59,097][00126] Starting process rollout_proc59 [2024-03-29 12:00:59,214][00126] Starting process rollout_proc52 [2024-03-29 12:00:59,214][00126] Starting process rollout_proc50 [2024-03-29 12:00:59,214][00126] Starting process rollout_proc61 [2024-03-29 12:00:59,214][00126] Starting process rollout_proc63 [2024-03-29 12:00:59,257][00126] Starting process rollout_proc58 [2024-03-29 12:00:59,257][00126] Starting process rollout_proc56 [2024-03-29 12:00:59,262][00126] Starting process rollout_proc44 [2024-03-29 12:00:59,344][00126] Starting process rollout_proc60 [2024-03-29 12:00:59,379][00126] Starting process rollout_proc62 [2024-03-29 12:01:03,344][00503] Worker 3 uses CPU cores [3] [2024-03-29 12:01:03,364][00481] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:01:03,364][00481] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-03-29 12:01:03,389][00481] Num visible devices: 1 [2024-03-29 12:01:03,441][00502] Worker 1 uses CPU cores [1] [2024-03-29 12:01:03,448][00481] Starting seed is not provided [2024-03-29 12:01:03,452][00481] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:01:03,452][00481] Initializing actor-critic model on device cuda:0 [2024-03-29 12:01:03,453][00481] RunningMeanStd input shape: (20,) [2024-03-29 12:01:03,459][00481] RunningMeanStd input shape: (23, 11, 11) [2024-03-29 12:01:03,459][00481] RunningMeanStd input shape: (1, 11, 11) [2024-03-29 12:01:03,459][00481] RunningMeanStd input shape: (2,) [2024-03-29 12:01:03,460][00481] RunningMeanStd input shape: (1,) [2024-03-29 12:01:03,460][00481] RunningMeanStd input shape: (1,) [2024-03-29 12:01:03,460][00501] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:01:03,461][00501] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-03-29 12:01:03,461][00505] Worker 9 uses CPU cores [9] [2024-03-29 12:01:03,478][00501] Num visible devices: 1 [2024-03-29 12:01:03,497][00888] Worker 4 uses CPU cores [4] [2024-03-29 12:01:03,511][00889] Worker 13 uses CPU cores [13] [2024-03-29 12:01:03,531][00891] Worker 17 uses CPU cores [17] [2024-03-29 12:01:03,531][01024] Worker 27 uses CPU cores [27] [2024-03-29 12:01:03,541][00887] Worker 2 uses CPU cores [2] 
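The "Worker N uses CPU cores [N]" and "Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0])" lines record each rollout process being pinned to a single CPU core while the learner and inference processes are restricted to GPU 0. A minimal sketch of that kind of setup, not the framework's actual code; `pin_worker` and `restrict_gpus` are hypothetical helpers, and `os.sched_setaffinity` is Linux-only:

```python
import os

def pin_worker(worker_idx: int) -> None:
    """Pin the calling process to a single CPU core (Linux-only sched_setaffinity)."""
    core = worker_idx % os.cpu_count()
    os.sched_setaffinity(0, {core})  # pid 0 = current process
    print(f"Worker {worker_idx} uses CPU cores [{core}]")

def restrict_gpus(gpu_indices: list[int]) -> None:
    """Expose only the chosen GPUs to this process, before any CUDA context is created."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_indices)
    print(f"Set environment var CUDA_VISIBLE_DEVICES to "
          f"'{os.environ['CUDA_VISIBLE_DEVICES']}' (GPU indices {gpu_indices})")

if __name__ == "__main__":
    pin_worker(3)       # e.g. rollout worker 3 -> CPU core 3
    restrict_gpus([0])  # learner / inference processes see only GPU 0
```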
[2024-03-29 12:01:03,585][00569] Worker 7 uses CPU cores [7] [2024-03-29 12:01:03,609][00589] Worker 11 uses CPU cores [11] [2024-03-29 12:01:03,609][01471] Worker 12 uses CPU cores [12] [2024-03-29 12:01:03,609][00890] Worker 15 uses CPU cores [15] [2024-03-29 12:01:03,624][00997] Worker 23 uses CPU cores [23] [2024-03-29 12:01:03,639][01004] Worker 25 uses CPU cores [25] [2024-03-29 12:01:03,641][01025] Worker 8 uses CPU cores [8] [2024-03-29 12:01:03,669][01023] Worker 10 uses CPU cores [10] [2024-03-29 12:01:03,677][00892] Worker 19 uses CPU cores [19] [2024-03-29 12:01:03,677][00996] Worker 21 uses CPU cores [21] [2024-03-29 12:01:03,681][01856] Worker 39 uses CPU cores [39] [2024-03-29 12:01:03,681][01003] Worker 6 uses CPU cores [6] [2024-03-29 12:01:03,693][01279] Worker 31 uses CPU cores [31] [2024-03-29 12:01:03,697][02370] Worker 51 uses CPU cores [51] [2024-03-29 12:01:03,711][01790] Worker 45 uses CPU cores [45] [2024-03-29 12:01:03,711][01662] Worker 16 uses CPU cores [16] [2024-03-29 12:01:03,713][01855] Worker 37 uses CPU cores [37] [2024-03-29 12:01:03,713][02358] Worker 28 uses CPU cores [28] [2024-03-29 12:01:03,766][01535] Worker 14 uses CPU cores [14] [2024-03-29 12:01:03,769][02356] Worker 49 uses CPU cores [49] [2024-03-29 12:01:03,776][03302] Worker 59 uses CPU cores [59] [2024-03-29 12:01:03,785][02239] Worker 46 uses CPU cores [46] [2024-03-29 12:01:03,786][00504] Worker 5 uses CPU cores [5] [2024-03-29 12:01:03,789][02053] Worker 41 uses CPU cores [41] [2024-03-29 12:01:03,796][02371] Worker 47 uses CPU cores [47] [2024-03-29 12:01:03,802][03373] Worker 52 uses CPU cores [52] [2024-03-29 12:01:03,803][02500] Worker 55 uses CPU cores [55] [2024-03-29 12:01:03,813][03572] Worker 61 uses CPU cores [61] [2024-03-29 12:01:03,813][01854] Worker 18 uses CPU cores [18] [2024-03-29 12:01:03,821][02357] Worker 22 uses CPU cores [22] [2024-03-29 12:01:03,830][01789] Worker 20 uses CPU cores [20] [2024-03-29 12:01:03,830][02111] Worker 53 uses CPU cores [53] [2024-03-29 12:01:03,830][03070] Worker 54 uses CPU cores [54] [2024-03-29 12:01:03,836][01920] Worker 26 uses CPU cores [26] [2024-03-29 12:01:03,837][02369] Worker 36 uses CPU cores [36] [2024-03-29 12:01:03,851][01294] Worker 35 uses CPU cores [35] [2024-03-29 12:01:03,851][02754] Worker 40 uses CPU cores [40] [2024-03-29 12:01:03,859][02884] Worker 43 uses CPU cores [43] [2024-03-29 12:01:03,861][03578] Worker 63 uses CPU cores [63] [2024-03-29 12:01:03,865][02436] Worker 38 uses CPU cores [38] [2024-03-29 12:01:03,865][02372] Worker 30 uses CPU cores [30] [2024-03-29 12:01:03,879][02883] Worker 48 uses CPU cores [48] [2024-03-29 12:01:03,902][02819] Worker 57 uses CPU cores [57] [2024-03-29 12:01:03,905][01259] Worker 29 uses CPU cores [29] [2024-03-29 12:01:03,926][00570] Worker 0 uses CPU cores [0] [2024-03-29 12:01:03,948][03258] Worker 42 uses CPU cores [42] [2024-03-29 12:01:03,968][02818] Worker 32 uses CPU cores [32] [2024-03-29 12:01:03,968][03713] Worker 44 uses CPU cores [44] [2024-03-29 12:01:03,993][03585] Worker 58 uses CPU cores [58] [2024-03-29 12:01:03,993][03445] Worker 50 uses CPU cores [50] [2024-03-29 12:01:04,001][02188] Worker 24 uses CPU cores [24] [2024-03-29 12:01:04,037][01280] Worker 33 uses CPU cores [33] [2024-03-29 12:01:04,068][02690] Worker 34 uses CPU cores [34] [2024-03-29 12:01:04,068][03725] Worker 60 uses CPU cores [60] [2024-03-29 12:01:04,091][03841] Worker 62 uses CPU cores [62] [2024-03-29 12:01:04,110][03586] Worker 56 uses CPU cores [56] [2024-03-29 12:01:04,204][00481] 
Created Actor Critic model with architecture:
[2024-03-29 12:01:04,204][00481] PredictingActorCritic(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (global_vars): RunningMeanStdInPlace()
        (griddly_obs): RunningMeanStdInPlace()
        (kinship): RunningMeanStdInPlace()
        (last_action): RunningMeanStdInPlace()
        (last_reward): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): ObjectEmeddingAgentEncoder(
    (object_embedding): Sequential(
      (0): Linear(in_features=52, out_features=64, bias=True)
      (1): ELU(alpha=1.0)
      (2): Sequential(
        (0): Linear(in_features=64, out_features=64, bias=True)
        (1): ELU(alpha=1.0)
      )
      (3): Sequential(
        (0): Linear(in_features=64, out_features=64, bias=True)
        (1): ELU(alpha=1.0)
      )
      (4): Sequential(
        (0): Linear(in_features=64, out_features=64, bias=True)
        (1): ELU(alpha=1.0)
      )
    )
    (encoder_head): Sequential(
      (0): Linear(in_features=7767, out_features=512, bias=True)
      (1): ELU(alpha=1.0)
      (2): Sequential(
        (0): Linear(in_features=512, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
      )
      (3): Sequential(
        (0): Linear(in_features=512, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
      )
      (4): Sequential(
        (0): Linear(in_features=512, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): ObjectEmeddingAgentDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=17, bias=True)
  )
)
[2024-03-29 12:01:05,002][00481] Using optimizer
[2024-03-29 12:01:05,527][00481] No checkpoints found
[2024-03-29 12:01:05,527][00481] Did not load from checkpoint, starting from scratch!
[2024-03-29 12:01:05,527][00481] Initialized policy 0 weights for model version 0
[2024-03-29 12:01:05,529][00481] LearnerWorker_p0 finished initialization!
[2024-03-29 12:01:05,530][00481] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-03-29 12:01:05,716][00501] RunningMeanStd input shape: (20,)
[2024-03-29 12:01:05,717][00501] RunningMeanStd input shape: (23, 11, 11)
[2024-03-29 12:01:05,717][00501] RunningMeanStd input shape: (1, 11, 11)
[2024-03-29 12:01:05,717][00501] RunningMeanStd input shape: (2,)
[2024-03-29 12:01:05,717][00501] RunningMeanStd input shape: (1,)
[2024-03-29 12:01:05,717][00501] RunningMeanStd input shape: (1,)
[2024-03-29 12:01:06,339][00126] Inference worker 0-0 is ready!
[2024-03-29 12:01:06,340][00126] All inference workers are ready! Signal rollout workers to start!
[2024-03-29 12:01:06,686][00126] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-03-29 12:01:07,274][01535] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,275][00891] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,279][02436] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,285][00589] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,285][02754] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,294][00505] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,298][00888] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,299][02372] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,301][02370] Decorrelating experience for 0 frames...
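In the PredictingActorCritic printout above, the encoder head's in_features=7767 is consistent with 121 grid cells (11 x 11) of 64-dim object embeddings plus the flat inputs whose RunningMeanStd shapes are logged next to it: 121*64 + 20 (global_vars) + 2 (last_action) + 1 (last_reward) = 7767. Below is a rough, shape-compatible PyTorch sketch reconstructed from the printed sizes only; how the raw (23, 11, 11) griddly_obs and (1, 11, 11) kinship planes become 52 features per cell is not shown in the log, so the sketch simply assumes a [batch, 121, 52] input, and all names other than the layer sizes are illustrative:

```python
import torch
from torch import nn

def mlp(sizes):
    """Stack of Linear + ELU layers matching the printed Sequential blocks."""
    layers = []
    for i in range(len(sizes) - 1):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ELU()]
    return nn.Sequential(*layers)

class SketchActorCritic(nn.Module):
    """Shape-compatible stand-in for the logged PredictingActorCritic (normalizers omitted)."""
    def __init__(self, cells=11 * 11, obj_feats=52, emb=64, hidden=512, num_actions=17):
        super().__init__()
        self.object_embedding = mlp([obj_feats, emb, emb, emb, emb])        # 52 -> 64 per cell
        self.encoder_head = mlp([cells * emb + 20 + 2 + 1, hidden, hidden, hidden, hidden])
        self.core = nn.GRU(hidden, hidden)                                  # ModelCoreRNN
        self.critic_linear = nn.Linear(hidden, 1)
        self.action_logits = nn.Linear(hidden, num_actions)

    def forward(self, obj_obs, global_vars, last_action, last_reward, rnn_state):
        # obj_obs: [batch, 121, 52]; the remaining inputs are flat per-sample vectors.
        emb = self.object_embedding(obj_obs).flatten(1)                     # [batch, 121*64]
        x = torch.cat([emb, global_vars, last_action, last_reward], dim=1)  # [batch, 7767]
        x = self.encoder_head(x)
        core_out, new_state = self.core(x.unsqueeze(0), rnn_state)          # sequence length 1
        core_out = core_out.squeeze(0)
        return self.action_logits(core_out), self.critic_linear(core_out), new_state

model = SketchActorCritic()
logits, value, state = model(torch.randn(4, 121, 52), torch.randn(4, 20),
                             torch.randn(4, 2), torch.randn(4, 1),
                             torch.zeros(1, 4, 512))
print(logits.shape, value.shape)  # torch.Size([4, 17]) torch.Size([4, 1])
```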
[2024-03-29 12:01:07,303][01280] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,311][03585] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,314][02884] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,314][02690] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,314][03578] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,327][00502] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,329][00997] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,330][03445] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,331][00892] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,332][01279] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,332][00889] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,333][02369] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,335][02371] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,340][00504] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,350][00503] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,351][03586] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,353][01023] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,357][02111] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,359][02883] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,362][01855] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,365][00890] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,368][02239] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,369][00996] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,370][01471] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,371][00569] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,373][02500] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,376][03070] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,378][02819] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,379][02357] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,394][02356] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,394][01920] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,394][00570] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,395][01790] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,396][01789] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,400][02818] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,402][01294] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,402][01004] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,402][03258] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,407][03713] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,408][03302] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,419][01856] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,420][03373] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,421][03725] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,422][01025] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,423][01024] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,423][02053] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,426][03841] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,428][01003] Decorrelating experience for 0 frames... 
[2024-03-29 12:01:07,431][00887] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,434][03572] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,436][02188] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,437][02358] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,447][01662] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,448][01854] Decorrelating experience for 0 frames... [2024-03-29 12:01:07,449][01259] Decorrelating experience for 0 frames... [2024-03-29 12:01:08,203][01535] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,209][03445] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,211][00888] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,216][01280] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,222][00892] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,228][02754] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,231][02690] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,231][01279] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,233][00891] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,234][00997] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,235][02370] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,238][00502] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,238][02371] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,242][02369] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,243][02372] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,248][00503] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,255][00589] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,256][01023] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,262][00505] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,263][00889] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,266][00504] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,267][02436] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,267][00890] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,274][00569] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,276][02884] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,278][03585] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,279][02500] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,281][02357] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,290][03586] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,290][03578] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,296][01003] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,302][02819] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,306][02356] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,314][01920] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,316][02883] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,323][02053] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,325][02239] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,332][03070] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,336][01855] Decorrelating experience for 256 frames... 
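The two waves of "Decorrelating experience for 0 frames..." and "... 256 frames..." suggest that rollout workers warm their environments up with different numbers of throw-away frames so trajectories do not advance in lockstep. A hedged sketch of that idea; the offset rule (split index times a 256-frame rollout) is an assumption inferred from the two logged values, not the framework's exact logic, and DummyEnv is only there so the sketch runs standalone:

```python
import random

ROLLOUT = 256      # frames per rollout segment (assumed from the 256-frame log lines)
NUM_SPLITS = 2     # number of offset groups to spread environments across (assumption)

def decorrelation_frames(split_idx: int) -> int:
    """Warm-up frames for a split: 0, 256, 512, ..."""
    return split_idx * ROLLOUT

def warm_up(env, split_idx: int) -> None:
    frames = decorrelation_frames(split_idx)
    print(f"Decorrelating experience for {frames} frames...")
    env.reset()
    for _ in range(frames):
        env.step(env.action_space.sample())  # throw-away random steps

class DummyEnv:
    """Stand-in environment so the sketch runs without any RL dependencies."""
    class action_space:
        @staticmethod
        def sample():
            return random.randint(0, 16)
    def reset(self):
        return 0
    def step(self, action):
        return 0, 0.0, False, {}

for split in range(NUM_SPLITS):
    warm_up(DummyEnv(), split)
```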
[2024-03-29 12:01:08,337][01856] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,338][01790] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,340][03258] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,344][03373] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,344][00996] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,353][02111] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,354][00570] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,359][03725] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,359][02818] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,365][03841] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,369][01789] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,371][01854] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,372][01294] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,377][01025] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,384][01004] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,385][00887] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,385][01259] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,387][02188] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,387][03572] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,390][01024] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,391][01662] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,398][03713] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,400][01471] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,400][03302] Decorrelating experience for 256 frames... [2024-03-29 12:01:08,447][02358] Decorrelating experience for 256 frames... [2024-03-29 12:01:11,685][00126] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:01:16,685][00126] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 34048.4. Samples: 340480. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:01:18,090][00126] Heartbeat connected on Batcher_0 [2024-03-29 12:01:18,091][00126] Heartbeat connected on LearnerWorker_p0 [2024-03-29 12:01:18,098][00126] Heartbeat connected on RolloutWorker_w1 [2024-03-29 12:01:18,099][00126] Heartbeat connected on RolloutWorker_w2 [2024-03-29 12:01:18,100][00126] Heartbeat connected on RolloutWorker_w0 [2024-03-29 12:01:18,101][00126] Heartbeat connected on RolloutWorker_w3 [2024-03-29 12:01:18,102][00126] Heartbeat connected on RolloutWorker_w4 [2024-03-29 12:01:18,110][00126] Heartbeat connected on RolloutWorker_w9 [2024-03-29 12:01:18,111][00126] Heartbeat connected on RolloutWorker_w5 [2024-03-29 12:01:18,112][00126] Heartbeat connected on RolloutWorker_w10 [2024-03-29 12:01:18,112][00126] Heartbeat connected on RolloutWorker_w7 [2024-03-29 12:01:18,113][00126] Heartbeat connected on RolloutWorker_w6 [2024-03-29 12:01:18,113][00126] Heartbeat connected on RolloutWorker_w11 [2024-03-29 12:01:18,115][00126] Heartbeat connected on RolloutWorker_w12 [2024-03-29 12:01:18,118][00126] Heartbeat connected on RolloutWorker_w13 [2024-03-29 12:01:18,118][00126] Heartbeat connected on RolloutWorker_w14 [2024-03-29 12:01:18,120][00126] Heartbeat connected on RolloutWorker_w15 [2024-03-29 12:01:18,121][00126] Heartbeat connected on RolloutWorker_w8 [2024-03-29 12:01:18,121][00126] Heartbeat connected on RolloutWorker_w16 [2024-03-29 12:01:18,123][00126] Heartbeat connected on RolloutWorker_w17 [2024-03-29 12:01:18,124][00126] Heartbeat connected on RolloutWorker_w18 [2024-03-29 12:01:18,125][00126] Heartbeat connected on InferenceWorker_p0-w0 [2024-03-29 12:01:18,127][00126] Heartbeat connected on RolloutWorker_w20 [2024-03-29 12:01:18,130][00126] Heartbeat connected on RolloutWorker_w22 [2024-03-29 12:01:18,132][00126] Heartbeat connected on RolloutWorker_w23 [2024-03-29 12:01:18,132][00126] Heartbeat connected on RolloutWorker_w19 [2024-03-29 12:01:18,133][00126] Heartbeat connected on RolloutWorker_w24 [2024-03-29 12:01:18,135][00126] Heartbeat connected on RolloutWorker_w25 [2024-03-29 12:01:18,136][00126] Heartbeat connected on RolloutWorker_w26 [2024-03-29 12:01:18,137][00126] Heartbeat connected on RolloutWorker_w21 [2024-03-29 12:01:18,138][00126] Heartbeat connected on RolloutWorker_w27 [2024-03-29 12:01:18,140][00126] Heartbeat connected on RolloutWorker_w28 [2024-03-29 12:01:18,142][00126] Heartbeat connected on RolloutWorker_w29 [2024-03-29 12:01:18,143][00126] Heartbeat connected on RolloutWorker_w30 [2024-03-29 12:01:18,145][00126] Heartbeat connected on RolloutWorker_w31 [2024-03-29 12:01:18,146][00126] Heartbeat connected on RolloutWorker_w32 [2024-03-29 12:01:18,153][00126] Heartbeat connected on RolloutWorker_w36 [2024-03-29 12:01:18,153][00126] Heartbeat connected on RolloutWorker_w35 [2024-03-29 12:01:18,155][00126] Heartbeat connected on RolloutWorker_w37 [2024-03-29 12:01:18,155][00126] Heartbeat connected on RolloutWorker_w34 [2024-03-29 12:01:18,155][00126] Heartbeat connected on RolloutWorker_w38 [2024-03-29 12:01:18,156][00126] Heartbeat connected on RolloutWorker_w33 [2024-03-29 12:01:18,157][00126] Heartbeat connected on RolloutWorker_w39 [2024-03-29 12:01:18,160][00126] Heartbeat connected on RolloutWorker_w41 [2024-03-29 12:01:18,162][00126] Heartbeat connected on RolloutWorker_w42 [2024-03-29 12:01:18,162][00126] Heartbeat connected on RolloutWorker_w40 [2024-03-29 12:01:18,163][00126] Heartbeat connected on RolloutWorker_w43 [2024-03-29 12:01:18,167][00126] Heartbeat 
connected on RolloutWorker_w45 [2024-03-29 12:01:18,172][00126] Heartbeat connected on RolloutWorker_w48 [2024-03-29 12:01:18,172][00126] Heartbeat connected on RolloutWorker_w46 [2024-03-29 12:01:18,173][00126] Heartbeat connected on RolloutWorker_w47 [2024-03-29 12:01:18,173][00126] Heartbeat connected on RolloutWorker_w49 [2024-03-29 12:01:18,173][00126] Heartbeat connected on RolloutWorker_w44 [2024-03-29 12:01:18,174][00126] Heartbeat connected on RolloutWorker_w50 [2024-03-29 12:01:18,176][00126] Heartbeat connected on RolloutWorker_w51 [2024-03-29 12:01:18,179][00126] Heartbeat connected on RolloutWorker_w53 [2024-03-29 12:01:18,179][00126] Heartbeat connected on RolloutWorker_w52 [2024-03-29 12:01:18,180][00126] Heartbeat connected on RolloutWorker_w54 [2024-03-29 12:01:18,182][00126] Heartbeat connected on RolloutWorker_w55 [2024-03-29 12:01:18,183][00126] Heartbeat connected on RolloutWorker_w56 [2024-03-29 12:01:18,185][00126] Heartbeat connected on RolloutWorker_w57 [2024-03-29 12:01:18,186][00126] Heartbeat connected on RolloutWorker_w58 [2024-03-29 12:01:18,194][00126] Heartbeat connected on RolloutWorker_w60 [2024-03-29 12:01:18,195][00126] Heartbeat connected on RolloutWorker_w59 [2024-03-29 12:01:18,195][00126] Heartbeat connected on RolloutWorker_w62 [2024-03-29 12:01:18,195][00126] Heartbeat connected on RolloutWorker_w61 [2024-03-29 12:01:18,196][00126] Heartbeat connected on RolloutWorker_w63 [2024-03-29 12:01:20,329][03258] Worker 42, sleep for 98.438 sec to decorrelate experience collection [2024-03-29 12:01:20,330][02883] Worker 48, sleep for 112.500 sec to decorrelate experience collection [2024-03-29 12:01:20,330][00504] Worker 5, sleep for 11.719 sec to decorrelate experience collection [2024-03-29 12:01:20,353][03302] Worker 59, sleep for 138.281 sec to decorrelate experience collection [2024-03-29 12:01:20,353][00997] Worker 23, sleep for 53.906 sec to decorrelate experience collection [2024-03-29 12:01:20,354][00888] Worker 4, sleep for 9.375 sec to decorrelate experience collection [2024-03-29 12:01:20,355][02370] Worker 51, sleep for 119.531 sec to decorrelate experience collection [2024-03-29 12:01:20,355][03445] Worker 50, sleep for 117.188 sec to decorrelate experience collection [2024-03-29 12:01:20,356][01535] Worker 14, sleep for 32.812 sec to decorrelate experience collection [2024-03-29 12:01:20,376][01024] Worker 27, sleep for 63.281 sec to decorrelate experience collection [2024-03-29 12:01:20,377][02369] Worker 36, sleep for 84.375 sec to decorrelate experience collection [2024-03-29 12:01:20,378][03586] Worker 56, sleep for 131.250 sec to decorrelate experience collection [2024-03-29 12:01:20,379][02371] Worker 47, sleep for 110.156 sec to decorrelate experience collection [2024-03-29 12:01:20,380][02884] Worker 43, sleep for 100.781 sec to decorrelate experience collection [2024-03-29 12:01:20,381][02818] Worker 32, sleep for 75.000 sec to decorrelate experience collection [2024-03-29 12:01:20,386][01023] Worker 10, sleep for 23.438 sec to decorrelate experience collection [2024-03-29 12:01:20,386][01279] Worker 31, sleep for 72.656 sec to decorrelate experience collection [2024-03-29 12:01:20,392][01855] Worker 37, sleep for 86.719 sec to decorrelate experience collection [2024-03-29 12:01:20,395][00505] Worker 9, sleep for 21.094 sec to decorrelate experience collection [2024-03-29 12:01:20,395][01920] Worker 26, sleep for 60.938 sec to decorrelate experience collection [2024-03-29 12:01:20,395][02053] Worker 41, sleep for 96.094 sec to 
decorrelate experience collection [2024-03-29 12:01:20,396][02690] Worker 34, sleep for 79.688 sec to decorrelate experience collection [2024-03-29 12:01:20,397][02357] Worker 22, sleep for 51.562 sec to decorrelate experience collection [2024-03-29 12:01:20,410][03725] Worker 60, sleep for 140.625 sec to decorrelate experience collection [2024-03-29 12:01:20,423][00503] Worker 3, sleep for 7.031 sec to decorrelate experience collection [2024-03-29 12:01:20,428][01790] Worker 45, sleep for 105.469 sec to decorrelate experience collection [2024-03-29 12:01:20,433][02372] Worker 30, sleep for 70.312 sec to decorrelate experience collection [2024-03-29 12:01:20,467][00996] Worker 21, sleep for 49.219 sec to decorrelate experience collection [2024-03-29 12:01:20,467][03373] Worker 52, sleep for 121.875 sec to decorrelate experience collection [2024-03-29 12:01:20,467][00891] Worker 17, sleep for 39.844 sec to decorrelate experience collection [2024-03-29 12:01:20,469][01025] Worker 8, sleep for 18.750 sec to decorrelate experience collection [2024-03-29 12:01:20,475][00890] Worker 15, sleep for 35.156 sec to decorrelate experience collection [2024-03-29 12:01:20,476][02356] Worker 49, sleep for 114.844 sec to decorrelate experience collection [2024-03-29 12:01:20,478][00569] Worker 7, sleep for 16.406 sec to decorrelate experience collection [2024-03-29 12:01:20,479][01003] Worker 6, sleep for 14.062 sec to decorrelate experience collection [2024-03-29 12:01:20,495][01294] Worker 35, sleep for 82.031 sec to decorrelate experience collection [2024-03-29 12:01:20,497][03070] Worker 54, sleep for 126.562 sec to decorrelate experience collection [2024-03-29 12:01:20,497][01662] Worker 16, sleep for 37.500 sec to decorrelate experience collection [2024-03-29 12:01:20,510][02500] Worker 55, sleep for 128.906 sec to decorrelate experience collection [2024-03-29 12:01:20,511][00889] Worker 13, sleep for 30.469 sec to decorrelate experience collection [2024-03-29 12:01:20,511][02239] Worker 46, sleep for 107.812 sec to decorrelate experience collection [2024-03-29 12:01:20,512][03572] Worker 61, sleep for 142.969 sec to decorrelate experience collection [2024-03-29 12:01:20,512][02754] Worker 40, sleep for 93.750 sec to decorrelate experience collection [2024-03-29 12:01:20,513][03585] Worker 58, sleep for 135.938 sec to decorrelate experience collection [2024-03-29 12:01:20,515][01004] Worker 25, sleep for 58.594 sec to decorrelate experience collection [2024-03-29 12:01:20,519][01856] Worker 39, sleep for 91.406 sec to decorrelate experience collection [2024-03-29 12:01:20,520][00887] Worker 2, sleep for 4.688 sec to decorrelate experience collection [2024-03-29 12:01:20,521][00481] Signal inference workers to stop experience collection... 
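The sleep durations above grow in steps of roughly 2.344 s per worker index (worker 1: 2.344 s, worker 2: 4.688 s, ..., worker 63: 147.656 s), i.e. worker_idx * 150 s / 64 workers, which spreads the start of experience collection evenly across about 150 seconds. A small sketch of that arithmetic; the 150-second ceiling is inferred from the logged values and the constant names are illustrative:

```python
NUM_WORKERS = 64
MAX_DECORRELATION_SECONDS = 150.0  # inferred: worker 63 sleeps 147.656 s ~= 63 * 150 / 64

def decorrelation_sleep(worker_idx: int,
                        num_workers: int = NUM_WORKERS,
                        max_seconds: float = MAX_DECORRELATION_SECONDS) -> float:
    """Stagger workers evenly over [0, max_seconds) so they don't all start collecting at once."""
    return worker_idx * max_seconds / num_workers

for w in (1, 2, 5, 42, 63):
    print(f"Worker {w}, sleep for {decorrelation_sleep(w):.3f} sec to decorrelate experience collection")
# -> 2.344, 4.688, 11.719, 98.438, 147.656  (matches the logged values)
```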
[2024-03-29 12:01:20,530][02188] Worker 24, sleep for 56.250 sec to decorrelate experience collection [2024-03-29 12:01:20,531][02819] Worker 57, sleep for 133.594 sec to decorrelate experience collection [2024-03-29 12:01:20,532][00589] Worker 11, sleep for 25.781 sec to decorrelate experience collection [2024-03-29 12:01:20,536][02436] Worker 38, sleep for 89.062 sec to decorrelate experience collection [2024-03-29 12:01:20,536][01789] Worker 20, sleep for 46.875 sec to decorrelate experience collection [2024-03-29 12:01:20,538][00501] InferenceWorker_p0-w0: stopping experience collection [2024-03-29 12:01:20,543][00892] Worker 19, sleep for 44.531 sec to decorrelate experience collection [2024-03-29 12:01:20,544][02111] Worker 53, sleep for 124.219 sec to decorrelate experience collection [2024-03-29 12:01:20,565][03841] Worker 62, sleep for 145.312 sec to decorrelate experience collection [2024-03-29 12:01:20,569][01471] Worker 12, sleep for 28.125 sec to decorrelate experience collection [2024-03-29 12:01:20,570][02358] Worker 28, sleep for 65.625 sec to decorrelate experience collection [2024-03-29 12:01:21,685][00126] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 43623.2. Samples: 654340. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:01:22,273][00481] Signal inference workers to resume experience collection... [2024-03-29 12:01:22,273][00501] InferenceWorker_p0-w0: resuming experience collection [2024-03-29 12:01:22,346][03578] Worker 63, sleep for 147.656 sec to decorrelate experience collection [2024-03-29 12:01:22,348][03713] Worker 44, sleep for 103.125 sec to decorrelate experience collection [2024-03-29 12:01:22,381][01259] Worker 29, sleep for 67.969 sec to decorrelate experience collection [2024-03-29 12:01:22,854][01280] Worker 33, sleep for 77.344 sec to decorrelate experience collection [2024-03-29 12:01:22,886][01854] Worker 18, sleep for 42.188 sec to decorrelate experience collection [2024-03-29 12:01:22,886][00502] Worker 1, sleep for 2.344 sec to decorrelate experience collection [2024-03-29 12:01:24,647][00501] Updated weights for policy 0, policy_version 10 (0.0013) [2024-03-29 12:01:25,232][00887] Worker 2 awakens! [2024-03-29 12:01:25,232][00502] Worker 1 awakens! [2024-03-29 12:01:26,685][00126] Fps is (10 sec: 31130.0, 60 sec: 15565.0, 300 sec: 15565.0). Total num frames: 311296. Throughput: 0: 32818.4. Samples: 656360. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2024-03-29 12:01:26,915][00501] Updated weights for policy 0, policy_version 20 (0.0013) [2024-03-29 12:01:27,490][00503] Worker 3 awakens! [2024-03-29 12:01:29,776][00888] Worker 4 awakens! [2024-03-29 12:01:31,685][00126] Fps is (10 sec: 34406.2, 60 sec: 13762.6, 300 sec: 13762.6). Total num frames: 344064. Throughput: 0: 27087.4. Samples: 677180. Policy #0 lag: (min: 16.0, avg: 19.0, max: 19.0) [2024-03-29 12:01:32,067][00504] Worker 5 awakens! [2024-03-29 12:01:34,571][01003] Worker 6 awakens! [2024-03-29 12:01:36,685][00126] Fps is (10 sec: 8192.0, 60 sec: 13107.3, 300 sec: 13107.3). Total num frames: 393216. Throughput: 0: 24046.8. Samples: 721400. Policy #0 lag: (min: 0.0, avg: 18.7, max: 21.0) [2024-03-29 12:01:36,896][00569] Worker 7 awakens! [2024-03-29 12:01:39,313][01025] Worker 8 awakens! [2024-03-29 12:01:41,552][00505] Worker 9 awakens! [2024-03-29 12:01:41,685][00126] Fps is (10 sec: 8192.0, 60 sec: 12171.0, 300 sec: 12171.0). Total num frames: 425984. Throughput: 0: 21410.4. Samples: 749360. 
Policy #0 lag: (min: 0.0, avg: 9.5, max: 25.0) [2024-03-29 12:01:41,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:01:41,687][00481] Saving new best policy, reward=0.000! [2024-03-29 12:01:43,924][01023] Worker 10 awakens! [2024-03-29 12:01:44,897][00501] Updated weights for policy 0, policy_version 30 (0.0012) [2024-03-29 12:01:46,418][00589] Worker 11 awakens! [2024-03-29 12:01:46,685][00126] Fps is (10 sec: 11468.6, 60 sec: 12697.6, 300 sec: 12697.6). Total num frames: 507904. Throughput: 0: 20674.5. Samples: 826980. Policy #0 lag: (min: 0.0, avg: 3.7, max: 6.0) [2024-03-29 12:01:46,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:01:48,794][01471] Worker 12 awakens! [2024-03-29 12:01:51,080][00889] Worker 13 awakens! [2024-03-29 12:01:51,685][00126] Fps is (10 sec: 19660.8, 60 sec: 13835.4, 300 sec: 13835.4). Total num frames: 622592. Throughput: 0: 21130.3. Samples: 950860. Policy #0 lag: (min: 0.0, avg: 14.7, max: 37.0) [2024-03-29 12:01:51,687][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:01:52,866][00501] Updated weights for policy 0, policy_version 40 (0.0015) [2024-03-29 12:01:53,269][01535] Worker 14 awakens! [2024-03-29 12:01:55,731][00890] Worker 15 awakens! [2024-03-29 12:01:56,685][00126] Fps is (10 sec: 22937.7, 60 sec: 14745.6, 300 sec: 14745.6). Total num frames: 737280. Throughput: 0: 22794.6. Samples: 1025760. Policy #0 lag: (min: 0.0, avg: 17.1, max: 44.0) [2024-03-29 12:01:56,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:01:58,025][01662] Worker 16 awakens! [2024-03-29 12:01:59,304][00501] Updated weights for policy 0, policy_version 50 (0.0017) [2024-03-29 12:02:00,412][00891] Worker 17 awakens! [2024-03-29 12:02:01,685][00126] Fps is (10 sec: 24576.2, 60 sec: 15788.3, 300 sec: 15788.3). Total num frames: 868352. Throughput: 0: 18732.0. Samples: 1183420. Policy #0 lag: (min: 0.0, avg: 19.8, max: 52.0) [2024-03-29 12:02:01,688][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:05,173][01854] Worker 18 awakens! [2024-03-29 12:02:05,175][00892] Worker 19 awakens! [2024-03-29 12:02:05,244][00501] Updated weights for policy 0, policy_version 60 (0.0018) [2024-03-29 12:02:06,685][00126] Fps is (10 sec: 29491.6, 60 sec: 17203.3, 300 sec: 17203.3). Total num frames: 1032192. Throughput: 0: 15262.2. Samples: 1341140. Policy #0 lag: (min: 2.0, avg: 26.3, max: 60.0) [2024-03-29 12:02:06,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:07,448][01789] Worker 20 awakens! [2024-03-29 12:02:09,698][00996] Worker 21 awakens! [2024-03-29 12:02:11,065][00501] Updated weights for policy 0, policy_version 70 (0.0019) [2024-03-29 12:02:11,685][00126] Fps is (10 sec: 29491.3, 60 sec: 19387.7, 300 sec: 17896.4). Total num frames: 1163264. Throughput: 0: 17453.3. Samples: 1441760. Policy #0 lag: (min: 1.0, avg: 5.2, max: 13.0) [2024-03-29 12:02:11,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:02:11,703][00481] Saving new best policy, reward=0.001! [2024-03-29 12:02:11,962][02357] Worker 22 awakens! [2024-03-29 12:02:14,298][00997] Worker 23 awakens! [2024-03-29 12:02:14,983][00501] Updated weights for policy 0, policy_version 80 (0.0015) [2024-03-29 12:02:16,685][00126] Fps is (10 sec: 31129.3, 60 sec: 22391.5, 300 sec: 19192.7). Total num frames: 1343488. Throughput: 0: 21643.5. Samples: 1651140. Policy #0 lag: (min: 1.0, avg: 9.0, max: 15.0) [2024-03-29 12:02:16,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:02:16,880][02188] Worker 24 awakens! 
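The recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report frame throughput averaged over trailing 10-, 60- and 300-second windows, which is why the very first report shows nan (only one sample exists yet). A minimal sketch of that bookkeeping, assuming the reporter keeps (timestamp, total_frames) pairs; the class and method names are illustrative:

```python
import time
from collections import deque

class FpsReporter:
    """Track total environment frames and report average FPS over trailing windows."""
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.history = deque()  # (timestamp, total_frames) samples

    def record(self, total_frames: int) -> None:
        now = time.time()
        self.history.append((now, total_frames))
        # keep only as much history as the largest window needs
        while self.history and now - self.history[0][0] > max(self.windows):
            self.history.popleft()

    def fps(self, window: float) -> float:
        now, latest = self.history[-1]
        # oldest recorded sample that still falls inside the window
        past = next(((t, f) for t, f in self.history if now - t <= window), self.history[-1])
        dt = now - past[0]
        return (latest - past[1]) / dt if dt > 0 else float("nan")

    def report(self, total_frames: int) -> str:
        self.record(total_frames)
        parts = ", ".join(f"{w} sec: {self.fps(w):.1f}" for w in self.windows)
        return f"Fps is ({parts}). Total num frames: {total_frames}."
```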
[2024-03-29 12:02:19,056][00501] Updated weights for policy 0, policy_version 90 (0.0019) [2024-03-29 12:02:19,209][01004] Worker 25 awakens! [2024-03-29 12:02:21,433][01920] Worker 26 awakens! [2024-03-29 12:02:21,685][00126] Fps is (10 sec: 40959.2, 60 sec: 26214.3, 300 sec: 20971.5). Total num frames: 1572864. Throughput: 0: 25924.3. Samples: 1888000. Policy #0 lag: (min: 0.0, avg: 35.6, max: 92.0) [2024-03-29 12:02:21,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:02:23,618][00501] Updated weights for policy 0, policy_version 100 (0.0015) [2024-03-29 12:02:23,761][01024] Worker 27 awakens! [2024-03-29 12:02:26,160][00481] Signal inference workers to stop experience collection... (50 times) [2024-03-29 12:02:26,206][00501] InferenceWorker_p0-w0: stopping experience collection (50 times) [2024-03-29 12:02:26,239][00481] Signal inference workers to resume experience collection... (50 times) [2024-03-29 12:02:26,239][00501] InferenceWorker_p0-w0: resuming experience collection (50 times) [2024-03-29 12:02:26,297][02358] Worker 28 awakens! [2024-03-29 12:02:26,685][00126] Fps is (10 sec: 44237.3, 60 sec: 24576.0, 300 sec: 22323.3). Total num frames: 1785856. Throughput: 0: 28437.0. Samples: 2029020. Policy #0 lag: (min: 1.0, avg: 8.4, max: 18.0) [2024-03-29 12:02:26,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:26,848][00501] Updated weights for policy 0, policy_version 110 (0.0020) [2024-03-29 12:02:30,398][01259] Worker 29 awakens! [2024-03-29 12:02:30,798][02372] Worker 30 awakens! [2024-03-29 12:02:31,685][00126] Fps is (10 sec: 37683.4, 60 sec: 26760.5, 300 sec: 22937.6). Total num frames: 1949696. Throughput: 0: 31918.2. Samples: 2263300. Policy #0 lag: (min: 0.0, avg: 7.2, max: 18.0) [2024-03-29 12:02:31,688][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:31,801][00501] Updated weights for policy 0, policy_version 120 (0.0031) [2024-03-29 12:02:33,145][01279] Worker 31 awakens! [2024-03-29 12:02:35,480][02818] Worker 32 awakens! [2024-03-29 12:02:35,482][00501] Updated weights for policy 0, policy_version 130 (0.0025) [2024-03-29 12:02:36,685][00126] Fps is (10 sec: 37682.7, 60 sec: 29491.1, 300 sec: 24029.9). Total num frames: 2162688. Throughput: 0: 33888.9. Samples: 2475860. Policy #0 lag: (min: 0.0, avg: 12.5, max: 21.0) [2024-03-29 12:02:36,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:02:40,094][02690] Worker 34 awakens! [2024-03-29 12:02:40,235][01280] Worker 33 awakens! [2024-03-29 12:02:40,243][00501] Updated weights for policy 0, policy_version 140 (0.0021) [2024-03-29 12:02:40,635][00481] self.policy_id=0 batch has 62.50% of invalid samples [2024-03-29 12:02:41,685][00126] Fps is (10 sec: 39322.0, 60 sec: 31948.8, 300 sec: 24662.3). Total num frames: 2342912. Throughput: 0: 35361.4. Samples: 2617020. Policy #0 lag: (min: 0.0, avg: 91.4, max: 140.0) [2024-03-29 12:02:41,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:42,626][01294] Worker 35 awakens! [2024-03-29 12:02:43,555][00501] Updated weights for policy 0, policy_version 150 (0.0020) [2024-03-29 12:02:44,852][02369] Worker 36 awakens! [2024-03-29 12:02:46,685][00126] Fps is (10 sec: 37683.0, 60 sec: 33860.3, 300 sec: 25395.2). Total num frames: 2539520. Throughput: 0: 36487.5. Samples: 2825360. Policy #0 lag: (min: 0.0, avg: 13.9, max: 24.0) [2024-03-29 12:02:46,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:02:47,147][01855] Worker 37 awakens! 
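The "Policy #0 lag: (min/avg/max)" numbers measure how stale a training batch is: every transition is tagged with the policy_version that collected it (incremented by each "Updated weights for policy 0, policy_version N" line), and lag is the learner's current version minus that tag, with -1.0 reported before any samples arrive. A sketch of that computation over a batch of per-sample version tags; the names and example numbers are illustrative:

```python
def policy_lag_stats(current_version: int, sample_versions: list[int]) -> dict[str, float]:
    """min/avg/max number of weight updates between collection and training for one batch."""
    lags = [current_version - v for v in sample_versions]
    return {
        "min": float(min(lags)),
        "avg": sum(lags) / len(lags),
        "max": float(max(lags)),
    }

# e.g. the learner is at version 110 and the batch mixes fresh and stale rollouts
print(policy_lag_stats(110, [109, 102, 101, 92]))
# {'min': 1.0, 'avg': 9.0, 'max': 18.0}
```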
[2024-03-29 12:02:49,074][00501] Updated weights for policy 0, policy_version 160 (0.0020) [2024-03-29 12:02:49,606][02436] Worker 38 awakens! [2024-03-29 12:02:51,686][00126] Fps is (10 sec: 40959.2, 60 sec: 35498.6, 300 sec: 26214.4). Total num frames: 2752512. Throughput: 0: 38502.9. Samples: 3073780. Policy #0 lag: (min: 0.0, avg: 20.8, max: 164.0) [2024-03-29 12:02:51,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:51,885][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000000169_2768896.pth... [2024-03-29 12:02:52,013][01856] Worker 39 awakens! [2024-03-29 12:02:52,556][00501] Updated weights for policy 0, policy_version 170 (0.0019) [2024-03-29 12:02:54,280][02754] Worker 40 awakens! [2024-03-29 12:02:56,513][02053] Worker 41 awakens! [2024-03-29 12:02:56,686][00126] Fps is (10 sec: 39321.5, 60 sec: 36590.9, 300 sec: 26661.2). Total num frames: 2932736. Throughput: 0: 38587.4. Samples: 3178200. Policy #0 lag: (min: 0.0, avg: 15.6, max: 26.0) [2024-03-29 12:02:56,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:02:56,829][00501] Updated weights for policy 0, policy_version 180 (0.0018) [2024-03-29 12:02:58,867][03258] Worker 42 awakens! [2024-03-29 12:03:01,094][00501] Updated weights for policy 0, policy_version 190 (0.0020) [2024-03-29 12:03:01,261][02884] Worker 43 awakens! [2024-03-29 12:03:01,685][00126] Fps is (10 sec: 39322.6, 60 sec: 37956.3, 300 sec: 27354.2). Total num frames: 3145728. Throughput: 0: 39771.6. Samples: 3440860. Policy #0 lag: (min: 0.0, avg: 13.4, max: 28.0) [2024-03-29 12:03:01,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:03:02,764][00481] Signal inference workers to stop experience collection... (100 times) [2024-03-29 12:03:02,781][00501] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-03-29 12:03:02,972][00481] Signal inference workers to resume experience collection... (100 times) [2024-03-29 12:03:02,972][00501] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-03-29 12:03:03,793][00501] Updated weights for policy 0, policy_version 200 (0.0022) [2024-03-29 12:03:05,477][03713] Worker 44 awakens! [2024-03-29 12:03:06,001][01790] Worker 45 awakens! [2024-03-29 12:03:06,685][00126] Fps is (10 sec: 44237.2, 60 sec: 39048.4, 300 sec: 28125.9). Total num frames: 3375104. Throughput: 0: 39729.4. Samples: 3675820. Policy #0 lag: (min: 1.0, avg: 28.5, max: 204.0) [2024-03-29 12:03:06,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:03:08,354][02239] Worker 46 awakens! [2024-03-29 12:03:09,617][00501] Updated weights for policy 0, policy_version 210 (0.0018) [2024-03-29 12:03:10,572][02371] Worker 47 awakens! [2024-03-29 12:03:11,685][00126] Fps is (10 sec: 40960.0, 60 sec: 39867.7, 300 sec: 28442.7). Total num frames: 3555328. Throughput: 0: 40158.7. Samples: 3836160. Policy #0 lag: (min: 0.0, avg: 13.6, max: 30.0) [2024-03-29 12:03:11,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:12,851][00501] Updated weights for policy 0, policy_version 220 (0.0020) [2024-03-29 12:03:12,930][02883] Worker 48 awakens! [2024-03-29 12:03:15,348][02356] Worker 49 awakens! [2024-03-29 12:03:16,357][00501] Updated weights for policy 0, policy_version 230 (0.0020) [2024-03-29 12:03:16,686][00126] Fps is (10 sec: 40959.6, 60 sec: 40686.8, 300 sec: 29113.1). Total num frames: 3784704. Throughput: 0: 39475.5. Samples: 4039700. 
Policy #0 lag: (min: 0.0, avg: 82.6, max: 229.0) [2024-03-29 12:03:16,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:03:17,548][03445] Worker 50 awakens! [2024-03-29 12:03:19,933][02370] Worker 51 awakens! [2024-03-29 12:03:21,438][00501] Updated weights for policy 0, policy_version 240 (0.0018) [2024-03-29 12:03:21,685][00126] Fps is (10 sec: 37682.9, 60 sec: 39321.7, 300 sec: 29127.1). Total num frames: 3932160. Throughput: 0: 41199.6. Samples: 4329840. Policy #0 lag: (min: 0.0, avg: 58.2, max: 237.0) [2024-03-29 12:03:21,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:22,443][03373] Worker 52 awakens! [2024-03-29 12:03:24,681][00501] Updated weights for policy 0, policy_version 250 (0.0018) [2024-03-29 12:03:24,780][02111] Worker 53 awakens! [2024-03-29 12:03:26,685][00126] Fps is (10 sec: 40960.8, 60 sec: 40140.8, 300 sec: 29959.4). Total num frames: 4194304. Throughput: 0: 40756.0. Samples: 4451040. Policy #0 lag: (min: 2.0, avg: 15.7, max: 33.0) [2024-03-29 12:03:26,688][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:27,091][03070] Worker 54 awakens! [2024-03-29 12:03:27,497][00501] Updated weights for policy 0, policy_version 260 (0.0017) [2024-03-29 12:03:29,521][02500] Worker 55 awakens! [2024-03-29 12:03:31,506][00501] Updated weights for policy 0, policy_version 270 (0.0020) [2024-03-29 12:03:31,633][03586] Worker 56 awakens! [2024-03-29 12:03:31,685][00126] Fps is (10 sec: 49151.7, 60 sec: 41233.1, 300 sec: 30508.2). Total num frames: 4423680. Throughput: 0: 41119.6. Samples: 4675740. Policy #0 lag: (min: 1.0, avg: 19.3, max: 35.0) [2024-03-29 12:03:31,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:34,225][02819] Worker 57 awakens! [2024-03-29 12:03:35,069][00481] Signal inference workers to stop experience collection... (150 times) [2024-03-29 12:03:35,101][00501] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-03-29 12:03:35,275][00481] Signal inference workers to resume experience collection... (150 times) [2024-03-29 12:03:35,276][00501] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-03-29 12:03:36,469][00501] Updated weights for policy 0, policy_version 280 (0.0021) [2024-03-29 12:03:36,553][03585] Worker 58 awakens! [2024-03-29 12:03:36,685][00126] Fps is (10 sec: 39321.5, 60 sec: 40413.9, 300 sec: 30583.5). Total num frames: 4587520. Throughput: 0: 41955.3. Samples: 4961760. Policy #0 lag: (min: 2.0, avg: 15.5, max: 36.0) [2024-03-29 12:03:36,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:38,734][03302] Worker 59 awakens! [2024-03-29 12:03:39,459][00501] Updated weights for policy 0, policy_version 290 (0.0021) [2024-03-29 12:03:41,047][03725] Worker 60 awakens! [2024-03-29 12:03:41,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 31393.9). Total num frames: 4866048. Throughput: 0: 42003.1. Samples: 5068340. Policy #0 lag: (min: 1.0, avg: 21.1, max: 38.0) [2024-03-29 12:03:41,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:43,143][00501] Updated weights for policy 0, policy_version 300 (0.0027) [2024-03-29 12:03:43,497][03572] Worker 61 awakens! [2024-03-29 12:03:45,920][03841] Worker 62 awakens! [2024-03-29 12:03:46,685][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.2, 300 sec: 31539.2). Total num frames: 5046272. Throughput: 0: 41877.7. Samples: 5325360. 
Policy #0 lag: (min: 0.0, avg: 22.5, max: 38.0) [2024-03-29 12:03:46,687][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:03:48,353][00501] Updated weights for policy 0, policy_version 310 (0.0021) [2024-03-29 12:03:50,041][03578] Worker 63 awakens! [2024-03-29 12:03:51,531][00501] Updated weights for policy 0, policy_version 320 (0.0024) [2024-03-29 12:03:51,685][00126] Fps is (10 sec: 37683.5, 60 sec: 41506.2, 300 sec: 31775.1). Total num frames: 5242880. Throughput: 0: 42967.1. Samples: 5609340. Policy #0 lag: (min: 1.0, avg: 17.8, max: 40.0) [2024-03-29 12:03:51,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:03:54,680][00501] Updated weights for policy 0, policy_version 330 (0.0025) [2024-03-29 12:03:56,685][00126] Fps is (10 sec: 45875.0, 60 sec: 42871.5, 300 sec: 32382.5). Total num frames: 5505024. Throughput: 0: 41939.0. Samples: 5723420. Policy #0 lag: (min: 2.0, avg: 22.7, max: 41.0) [2024-03-29 12:03:56,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:03:58,506][00501] Updated weights for policy 0, policy_version 340 (0.0016) [2024-03-29 12:04:00,939][00481] Signal inference workers to stop experience collection... (200 times) [2024-03-29 12:04:01,016][00501] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-03-29 12:04:01,021][00481] Signal inference workers to resume experience collection... (200 times) [2024-03-29 12:04:01,044][00501] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-03-29 12:04:01,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.3, 300 sec: 32580.8). Total num frames: 5701632. Throughput: 0: 42674.8. Samples: 5960060. Policy #0 lag: (min: 0.0, avg: 23.6, max: 41.0) [2024-03-29 12:04:01,688][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:03,782][00501] Updated weights for policy 0, policy_version 350 (0.0018) [2024-03-29 12:04:06,685][00126] Fps is (10 sec: 36045.1, 60 sec: 41506.2, 300 sec: 32586.0). Total num frames: 5865472. Throughput: 0: 42509.8. Samples: 6242780. Policy #0 lag: (min: 1.0, avg: 18.7, max: 44.0) [2024-03-29 12:04:06,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:07,170][00501] Updated weights for policy 0, policy_version 360 (0.0022) [2024-03-29 12:04:10,268][00501] Updated weights for policy 0, policy_version 370 (0.0023) [2024-03-29 12:04:11,685][00126] Fps is (10 sec: 44236.7, 60 sec: 43144.4, 300 sec: 33210.8). Total num frames: 6144000. Throughput: 0: 42363.9. Samples: 6357420. Policy #0 lag: (min: 2.0, avg: 23.6, max: 42.0) [2024-03-29 12:04:11,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:04:14,169][00501] Updated weights for policy 0, policy_version 380 (0.0025) [2024-03-29 12:04:16,685][00126] Fps is (10 sec: 47513.4, 60 sec: 42598.5, 300 sec: 33371.6). Total num frames: 6340608. Throughput: 0: 42478.3. Samples: 6587260. Policy #0 lag: (min: 0.0, avg: 23.8, max: 43.0) [2024-03-29 12:04:16,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:04:19,190][00501] Updated weights for policy 0, policy_version 390 (0.0017) [2024-03-29 12:04:21,685][00126] Fps is (10 sec: 34406.7, 60 sec: 42598.4, 300 sec: 33272.2). Total num frames: 6488064. Throughput: 0: 42702.7. Samples: 6883380. 
Policy #0 lag: (min: 0.0, avg: 17.7, max: 41.0) [2024-03-29 12:04:21,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:22,681][00501] Updated weights for policy 0, policy_version 400 (0.0021) [2024-03-29 12:04:25,881][00501] Updated weights for policy 0, policy_version 410 (0.0022) [2024-03-29 12:04:26,685][00126] Fps is (10 sec: 42599.0, 60 sec: 42871.5, 300 sec: 33833.0). Total num frames: 6766592. Throughput: 0: 42785.1. Samples: 6993660. Policy #0 lag: (min: 1.0, avg: 19.3, max: 42.0) [2024-03-29 12:04:26,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:04:29,620][00501] Updated weights for policy 0, policy_version 420 (0.0022) [2024-03-29 12:04:30,687][00481] Signal inference workers to stop experience collection... (250 times) [2024-03-29 12:04:30,767][00481] Signal inference workers to resume experience collection... (250 times) [2024-03-29 12:04:30,770][00501] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-03-29 12:04:30,792][00501] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-03-29 12:04:31,685][00126] Fps is (10 sec: 49151.6, 60 sec: 42598.4, 300 sec: 34046.8). Total num frames: 6979584. Throughput: 0: 42120.0. Samples: 7220760. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 12:04:31,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:04:34,822][00501] Updated weights for policy 0, policy_version 430 (0.0029) [2024-03-29 12:04:36,685][00126] Fps is (10 sec: 34406.3, 60 sec: 42052.3, 300 sec: 33860.3). Total num frames: 7110656. Throughput: 0: 42457.0. Samples: 7519900. Policy #0 lag: (min: 0.0, avg: 17.9, max: 41.0) [2024-03-29 12:04:36,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:38,126][00501] Updated weights for policy 0, policy_version 440 (0.0029) [2024-03-29 12:04:41,458][00501] Updated weights for policy 0, policy_version 450 (0.0033) [2024-03-29 12:04:41,685][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.3, 300 sec: 34292.1). Total num frames: 7372800. Throughput: 0: 42334.7. Samples: 7628480. Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 12:04:41,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:04:45,099][00501] Updated weights for policy 0, policy_version 460 (0.0018) [2024-03-29 12:04:46,685][00126] Fps is (10 sec: 50790.2, 60 sec: 42871.5, 300 sec: 34629.8). Total num frames: 7618560. Throughput: 0: 42247.2. Samples: 7861180. Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 12:04:46,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:50,301][00501] Updated weights for policy 0, policy_version 470 (0.0023) [2024-03-29 12:04:51,685][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.2, 300 sec: 34442.8). Total num frames: 7749632. Throughput: 0: 42456.0. Samples: 8153300. Policy #0 lag: (min: 0.0, avg: 17.8, max: 41.0) [2024-03-29 12:04:51,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:51,705][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000000473_7749632.pth... [2024-03-29 12:04:53,815][00501] Updated weights for policy 0, policy_version 480 (0.0020) [2024-03-29 12:04:56,685][00126] Fps is (10 sec: 37683.0, 60 sec: 41506.2, 300 sec: 34762.6). Total num frames: 7995392. Throughput: 0: 42196.0. Samples: 8256240. 
Policy #0 lag: (min: 1.0, avg: 18.5, max: 41.0) [2024-03-29 12:04:56,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:04:57,134][00501] Updated weights for policy 0, policy_version 490 (0.0024) [2024-03-29 12:05:00,776][00501] Updated weights for policy 0, policy_version 500 (0.0023) [2024-03-29 12:05:01,686][00126] Fps is (10 sec: 49151.1, 60 sec: 42325.2, 300 sec: 35068.7). Total num frames: 8241152. Throughput: 0: 42280.7. Samples: 8489900. Policy #0 lag: (min: 1.0, avg: 22.6, max: 41.0) [2024-03-29 12:05:01,687][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:05:05,834][00501] Updated weights for policy 0, policy_version 510 (0.0018) [2024-03-29 12:05:06,685][00126] Fps is (10 sec: 37683.0, 60 sec: 41779.1, 300 sec: 34884.3). Total num frames: 8372224. Throughput: 0: 41964.3. Samples: 8771780. Policy #0 lag: (min: 0.0, avg: 17.5, max: 41.0) [2024-03-29 12:05:06,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:05:06,898][00481] Signal inference workers to stop experience collection... (300 times) [2024-03-29 12:05:06,898][00481] Signal inference workers to resume experience collection... (300 times) [2024-03-29 12:05:06,921][00501] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-03-29 12:05:06,922][00501] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-03-29 12:05:09,395][00501] Updated weights for policy 0, policy_version 520 (0.0025) [2024-03-29 12:05:11,685][00126] Fps is (10 sec: 37684.1, 60 sec: 41233.1, 300 sec: 35175.5). Total num frames: 8617984. Throughput: 0: 42168.8. Samples: 8891260. Policy #0 lag: (min: 1.0, avg: 18.8, max: 45.0) [2024-03-29 12:05:11,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:12,702][00501] Updated weights for policy 0, policy_version 530 (0.0020) [2024-03-29 12:05:16,174][00501] Updated weights for policy 0, policy_version 540 (0.0024) [2024-03-29 12:05:16,685][00126] Fps is (10 sec: 49152.6, 60 sec: 42052.3, 300 sec: 35455.0). Total num frames: 8863744. Throughput: 0: 42168.1. Samples: 9118320. Policy #0 lag: (min: 0.0, avg: 24.3, max: 44.0) [2024-03-29 12:05:16,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:05:17,025][00481] Saving new best policy, reward=0.002! [2024-03-29 12:05:21,192][00501] Updated weights for policy 0, policy_version 550 (0.0025) [2024-03-29 12:05:21,685][00126] Fps is (10 sec: 39321.3, 60 sec: 42052.2, 300 sec: 35338.1). Total num frames: 9011200. Throughput: 0: 41825.7. Samples: 9402060. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 12:05:21,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:24,983][00501] Updated weights for policy 0, policy_version 560 (0.0029) [2024-03-29 12:05:26,685][00126] Fps is (10 sec: 37683.1, 60 sec: 41233.0, 300 sec: 35540.7). Total num frames: 9240576. Throughput: 0: 42178.7. Samples: 9526520. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 12:05:26,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:28,200][00501] Updated weights for policy 0, policy_version 570 (0.0019) [2024-03-29 12:05:31,518][00501] Updated weights for policy 0, policy_version 580 (0.0017) [2024-03-29 12:05:31,685][00126] Fps is (10 sec: 49151.8, 60 sec: 42052.2, 300 sec: 35859.3). Total num frames: 9502720. Throughput: 0: 42003.5. Samples: 9751340. 
Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 12:05:31,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:36,602][00501] Updated weights for policy 0, policy_version 590 (0.0021) [2024-03-29 12:05:36,685][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 35802.1). Total num frames: 9666560. Throughput: 0: 42000.1. Samples: 10043300. Policy #0 lag: (min: 0.0, avg: 24.2, max: 42.0) [2024-03-29 12:05:36,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:38,779][00481] Signal inference workers to stop experience collection... (350 times) [2024-03-29 12:05:38,824][00501] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-03-29 12:05:38,997][00481] Signal inference workers to resume experience collection... (350 times) [2024-03-29 12:05:38,998][00501] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-03-29 12:05:40,342][00501] Updated weights for policy 0, policy_version 600 (0.0037) [2024-03-29 12:05:41,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.2, 300 sec: 35925.7). Total num frames: 9879552. Throughput: 0: 42539.6. Samples: 10170520. Policy #0 lag: (min: 1.0, avg: 19.1, max: 45.0) [2024-03-29 12:05:41,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:43,518][00501] Updated weights for policy 0, policy_version 610 (0.0027) [2024-03-29 12:05:46,685][00126] Fps is (10 sec: 47513.3, 60 sec: 42052.3, 300 sec: 36220.4). Total num frames: 10141696. Throughput: 0: 42202.0. Samples: 10388980. Policy #0 lag: (min: 1.0, avg: 23.8, max: 43.0) [2024-03-29 12:05:46,687][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:46,880][00501] Updated weights for policy 0, policy_version 620 (0.0017) [2024-03-29 12:05:51,685][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 36159.8). Total num frames: 10305536. Throughput: 0: 42158.2. Samples: 10668900. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 12:05:51,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:52,112][00501] Updated weights for policy 0, policy_version 630 (0.0026) [2024-03-29 12:05:55,898][00501] Updated weights for policy 0, policy_version 640 (0.0026) [2024-03-29 12:05:56,685][00126] Fps is (10 sec: 36044.5, 60 sec: 41779.2, 300 sec: 36214.3). Total num frames: 10502144. Throughput: 0: 42739.5. Samples: 10814540. Policy #0 lag: (min: 0.0, avg: 17.0, max: 42.0) [2024-03-29 12:05:56,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:05:59,181][00501] Updated weights for policy 0, policy_version 650 (0.0021) [2024-03-29 12:06:01,685][00126] Fps is (10 sec: 45875.4, 60 sec: 42052.4, 300 sec: 36489.1). Total num frames: 10764288. Throughput: 0: 42245.2. Samples: 11019360. Policy #0 lag: (min: 0.0, avg: 22.8, max: 44.0) [2024-03-29 12:06:01,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:02,449][00501] Updated weights for policy 0, policy_version 660 (0.0023) [2024-03-29 12:06:06,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42871.5, 300 sec: 37100.0). Total num frames: 10944512. Throughput: 0: 41879.1. Samples: 11286620. Policy #0 lag: (min: 0.0, avg: 24.0, max: 41.0) [2024-03-29 12:06:06,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:06:07,521][00501] Updated weights for policy 0, policy_version 670 (0.0018) [2024-03-29 12:06:08,748][00481] Signal inference workers to stop experience collection... 
(400 times) [2024-03-29 12:06:08,787][00501] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-03-29 12:06:08,964][00481] Signal inference workers to resume experience collection... (400 times) [2024-03-29 12:06:08,964][00501] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-03-29 12:06:11,685][00126] Fps is (10 sec: 36045.0, 60 sec: 41779.2, 300 sec: 37711.0). Total num frames: 11124736. Throughput: 0: 42420.4. Samples: 11435440. Policy #0 lag: (min: 0.0, avg: 16.5, max: 41.0) [2024-03-29 12:06:11,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:11,719][00501] Updated weights for policy 0, policy_version 680 (0.0029) [2024-03-29 12:06:14,567][00501] Updated weights for policy 0, policy_version 690 (0.0023) [2024-03-29 12:06:16,685][00126] Fps is (10 sec: 45875.3, 60 sec: 42325.3, 300 sec: 38655.1). Total num frames: 11403264. Throughput: 0: 42454.3. Samples: 11661780. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 12:06:16,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:06:17,837][00501] Updated weights for policy 0, policy_version 700 (0.0020) [2024-03-29 12:06:21,685][00126] Fps is (10 sec: 47513.5, 60 sec: 43144.5, 300 sec: 38266.3). Total num frames: 11599872. Throughput: 0: 41607.9. Samples: 11915660. Policy #0 lag: (min: 0.0, avg: 24.2, max: 42.0) [2024-03-29 12:06:21,687][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:22,754][00501] Updated weights for policy 0, policy_version 710 (0.0027) [2024-03-29 12:06:26,686][00126] Fps is (10 sec: 36044.5, 60 sec: 42052.2, 300 sec: 38710.7). Total num frames: 11763712. Throughput: 0: 42335.4. Samples: 12075620. Policy #0 lag: (min: 0.0, avg: 16.7, max: 43.0) [2024-03-29 12:06:26,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:27,087][00501] Updated weights for policy 0, policy_version 720 (0.0023) [2024-03-29 12:06:30,005][00501] Updated weights for policy 0, policy_version 730 (0.0030) [2024-03-29 12:06:31,685][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.4, 300 sec: 39488.2). Total num frames: 12042240. Throughput: 0: 42464.0. Samples: 12299860. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 12:06:31,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:32,694][00481] Signal inference workers to stop experience collection... (450 times) [2024-03-29 12:06:32,731][00501] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-03-29 12:06:32,918][00481] Signal inference workers to resume experience collection... (450 times) [2024-03-29 12:06:32,918][00501] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-03-29 12:06:33,217][00501] Updated weights for policy 0, policy_version 740 (0.0030) [2024-03-29 12:06:36,685][00126] Fps is (10 sec: 49152.2, 60 sec: 43144.4, 300 sec: 40099.1). Total num frames: 12255232. Throughput: 0: 41403.6. Samples: 12532060. Policy #0 lag: (min: 1.0, avg: 24.9, max: 42.0) [2024-03-29 12:06:36,687][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:38,221][00501] Updated weights for policy 0, policy_version 750 (0.0020) [2024-03-29 12:06:41,685][00126] Fps is (10 sec: 34406.3, 60 sec: 41779.2, 300 sec: 40265.8). Total num frames: 12386304. Throughput: 0: 41894.3. Samples: 12699780. 
Policy #0 lag: (min: 0.0, avg: 16.8, max: 40.0) [2024-03-29 12:06:41,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:42,660][00501] Updated weights for policy 0, policy_version 760 (0.0026) [2024-03-29 12:06:45,709][00501] Updated weights for policy 0, policy_version 770 (0.0018) [2024-03-29 12:06:46,685][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.2, 300 sec: 40765.6). Total num frames: 12648448. Throughput: 0: 42450.2. Samples: 12929620. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 12:06:46,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:06:48,808][00501] Updated weights for policy 0, policy_version 780 (0.0022) [2024-03-29 12:06:51,686][00126] Fps is (10 sec: 50789.5, 60 sec: 43144.5, 300 sec: 41209.9). Total num frames: 12894208. Throughput: 0: 41664.3. Samples: 13161520. Policy #0 lag: (min: 0.0, avg: 23.0, max: 42.0) [2024-03-29 12:06:51,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:06:51,875][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000000788_12910592.pth... [2024-03-29 12:06:52,279][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000000169_2768896.pth [2024-03-29 12:06:53,875][00501] Updated weights for policy 0, policy_version 790 (0.0024) [2024-03-29 12:06:56,685][00126] Fps is (10 sec: 34406.6, 60 sec: 41506.2, 300 sec: 41098.8). Total num frames: 12992512. Throughput: 0: 41709.8. Samples: 13312380. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 12:06:56,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:06:58,688][00501] Updated weights for policy 0, policy_version 800 (0.0022) [2024-03-29 12:07:01,597][00501] Updated weights for policy 0, policy_version 810 (0.0028) [2024-03-29 12:07:01,685][00126] Fps is (10 sec: 37683.8, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 13271040. Throughput: 0: 42359.1. Samples: 13567940. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 12:07:01,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:07:02,469][00481] Signal inference workers to stop experience collection... (500 times) [2024-03-29 12:07:02,557][00501] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-03-29 12:07:02,635][00481] Signal inference workers to resume experience collection... (500 times) [2024-03-29 12:07:02,635][00501] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-03-29 12:07:04,563][00501] Updated weights for policy 0, policy_version 820 (0.0035) [2024-03-29 12:07:06,685][00126] Fps is (10 sec: 50790.4, 60 sec: 42598.4, 300 sec: 41820.8). Total num frames: 13500416. Throughput: 0: 41679.1. Samples: 13791220. Policy #0 lag: (min: 0.0, avg: 24.7, max: 41.0) [2024-03-29 12:07:06,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:07:09,570][00501] Updated weights for policy 0, policy_version 830 (0.0028) [2024-03-29 12:07:11,685][00126] Fps is (10 sec: 37683.1, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 13647872. Throughput: 0: 41277.8. Samples: 13933120. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 12:07:11,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:07:14,320][00501] Updated weights for policy 0, policy_version 840 (0.0022) [2024-03-29 12:07:16,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 13877248. Throughput: 0: 42169.4. Samples: 14197480. 
Policy #0 lag: (min: 2.0, avg: 19.4, max: 43.0) [2024-03-29 12:07:16,688][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:07:17,307][00501] Updated weights for policy 0, policy_version 850 (0.0018) [2024-03-29 12:07:20,300][00501] Updated weights for policy 0, policy_version 860 (0.0023) [2024-03-29 12:07:21,685][00126] Fps is (10 sec: 49152.2, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 14139392. Throughput: 0: 41894.7. Samples: 14417320. Policy #0 lag: (min: 0.0, avg: 24.5, max: 41.0) [2024-03-29 12:07:21,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:07:25,045][00501] Updated weights for policy 0, policy_version 870 (0.0027) [2024-03-29 12:07:26,685][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 14270464. Throughput: 0: 41276.5. Samples: 14557220. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 12:07:26,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:07:29,932][00501] Updated weights for policy 0, policy_version 880 (0.0022) [2024-03-29 12:07:31,686][00126] Fps is (10 sec: 36044.4, 60 sec: 40959.9, 300 sec: 41820.8). Total num frames: 14499840. Throughput: 0: 42091.0. Samples: 14823720. Policy #0 lag: (min: 0.0, avg: 18.6, max: 43.0) [2024-03-29 12:07:31,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:07:33,111][00501] Updated weights for policy 0, policy_version 890 (0.0019) [2024-03-29 12:07:34,221][00481] Signal inference workers to stop experience collection... (550 times) [2024-03-29 12:07:34,281][00501] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-03-29 12:07:34,313][00481] Signal inference workers to resume experience collection... (550 times) [2024-03-29 12:07:34,318][00501] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-03-29 12:07:35,924][00501] Updated weights for policy 0, policy_version 900 (0.0027) [2024-03-29 12:07:36,685][00126] Fps is (10 sec: 49151.6, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 14761984. Throughput: 0: 41731.2. Samples: 15039420. Policy #0 lag: (min: 2.0, avg: 21.9, max: 42.0) [2024-03-29 12:07:36,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:07:40,809][00501] Updated weights for policy 0, policy_version 910 (0.0027) [2024-03-29 12:07:41,685][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 14909440. Throughput: 0: 41303.0. Samples: 15171020. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 12:07:41,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:07:45,507][00501] Updated weights for policy 0, policy_version 920 (0.0018) [2024-03-29 12:07:46,685][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.0, 300 sec: 41931.9). Total num frames: 15122432. Throughput: 0: 41982.6. Samples: 15457160. Policy #0 lag: (min: 0.0, avg: 18.5, max: 43.0) [2024-03-29 12:07:46,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:07:48,597][00501] Updated weights for policy 0, policy_version 930 (0.0022) [2024-03-29 12:07:51,465][00501] Updated weights for policy 0, policy_version 940 (0.0021) [2024-03-29 12:07:51,685][00126] Fps is (10 sec: 49152.5, 60 sec: 41779.4, 300 sec: 42265.2). Total num frames: 15400960. Throughput: 0: 41796.0. Samples: 15672040. 
Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 12:07:51,686][00126] Avg episode reward: [(0, '0.000')] [2024-03-29 12:07:56,417][00501] Updated weights for policy 0, policy_version 950 (0.0028) [2024-03-29 12:07:56,686][00126] Fps is (10 sec: 44236.6, 60 sec: 42871.4, 300 sec: 42098.5). Total num frames: 15564800. Throughput: 0: 41438.6. Samples: 15797860. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 12:07:56,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:08:01,224][00501] Updated weights for policy 0, policy_version 960 (0.0019) [2024-03-29 12:08:01,685][00126] Fps is (10 sec: 34406.1, 60 sec: 41233.0, 300 sec: 41931.9). Total num frames: 15745024. Throughput: 0: 42014.1. Samples: 16088120. Policy #0 lag: (min: 0.0, avg: 16.8, max: 41.0) [2024-03-29 12:08:01,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:08:04,100][00501] Updated weights for policy 0, policy_version 970 (0.0022) [2024-03-29 12:08:04,565][00481] Signal inference workers to stop experience collection... (600 times) [2024-03-29 12:08:04,588][00501] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-03-29 12:08:04,768][00481] Signal inference workers to resume experience collection... (600 times) [2024-03-29 12:08:04,769][00501] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-03-29 12:08:06,685][00126] Fps is (10 sec: 45875.7, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 16023552. Throughput: 0: 41968.0. Samples: 16305880. Policy #0 lag: (min: 2.0, avg: 21.8, max: 42.0) [2024-03-29 12:08:06,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:08:07,031][00501] Updated weights for policy 0, policy_version 980 (0.0030) [2024-03-29 12:08:11,685][00126] Fps is (10 sec: 44237.5, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 16187392. Throughput: 0: 41706.3. Samples: 16434000. Policy #0 lag: (min: 0.0, avg: 24.7, max: 44.0) [2024-03-29 12:08:11,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:08:12,020][00501] Updated weights for policy 0, policy_version 990 (0.0020) [2024-03-29 12:08:16,685][00126] Fps is (10 sec: 34406.5, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 16367616. Throughput: 0: 42193.9. Samples: 16722440. Policy #0 lag: (min: 0.0, avg: 16.9, max: 41.0) [2024-03-29 12:08:16,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:08:16,768][00501] Updated weights for policy 0, policy_version 1000 (0.0021) [2024-03-29 12:08:19,787][00501] Updated weights for policy 0, policy_version 1010 (0.0022) [2024-03-29 12:08:21,685][00126] Fps is (10 sec: 45874.8, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 16646144. Throughput: 0: 42239.2. Samples: 16940180. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 12:08:21,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:08:22,625][00501] Updated weights for policy 0, policy_version 1020 (0.0024) [2024-03-29 12:08:26,685][00126] Fps is (10 sec: 47513.4, 60 sec: 42871.4, 300 sec: 42098.6). Total num frames: 16842752. Throughput: 0: 42173.0. Samples: 17068800. Policy #0 lag: (min: 0.0, avg: 24.8, max: 44.0) [2024-03-29 12:08:26,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:08:26,686][00481] Saving new best policy, reward=0.004! [2024-03-29 12:08:27,664][00501] Updated weights for policy 0, policy_version 1030 (0.0023) [2024-03-29 12:08:31,685][00126] Fps is (10 sec: 36044.8, 60 sec: 41779.3, 300 sec: 42098.5). Total num frames: 17006592. Throughput: 0: 42012.5. Samples: 17347720. 
Policy #0 lag: (min: 0.0, avg: 17.4, max: 41.0) [2024-03-29 12:08:31,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:08:32,323][00501] Updated weights for policy 0, policy_version 1040 (0.0021) [2024-03-29 12:08:35,526][00501] Updated weights for policy 0, policy_version 1050 (0.0022) [2024-03-29 12:08:36,685][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 17268736. Throughput: 0: 42253.7. Samples: 17573460. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 12:08:36,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:08:37,250][00481] Signal inference workers to stop experience collection... (650 times) [2024-03-29 12:08:37,286][00501] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-03-29 12:08:37,464][00481] Signal inference workers to resume experience collection... (650 times) [2024-03-29 12:08:37,464][00501] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-03-29 12:08:38,298][00501] Updated weights for policy 0, policy_version 1060 (0.0018) [2024-03-29 12:08:41,685][00126] Fps is (10 sec: 49151.9, 60 sec: 43144.6, 300 sec: 42209.6). Total num frames: 17498112. Throughput: 0: 42057.9. Samples: 17690460. Policy #0 lag: (min: 0.0, avg: 24.8, max: 43.0) [2024-03-29 12:08:41,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:08:43,350][00501] Updated weights for policy 0, policy_version 1070 (0.0017) [2024-03-29 12:08:46,685][00126] Fps is (10 sec: 36044.8, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 17629184. Throughput: 0: 41967.6. Samples: 17976660. Policy #0 lag: (min: 1.0, avg: 17.4, max: 41.0) [2024-03-29 12:08:46,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:08:47,869][00501] Updated weights for policy 0, policy_version 1080 (0.0019) [2024-03-29 12:08:51,090][00501] Updated weights for policy 0, policy_version 1090 (0.0026) [2024-03-29 12:08:51,685][00126] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 17891328. Throughput: 0: 42289.0. Samples: 18208880. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 12:08:51,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:08:51,947][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000001093_17907712.pth... [2024-03-29 12:08:52,312][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000000473_7749632.pth [2024-03-29 12:08:54,268][00501] Updated weights for policy 0, policy_version 1100 (0.0025) [2024-03-29 12:08:56,686][00126] Fps is (10 sec: 49151.8, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 18120704. Throughput: 0: 41824.3. Samples: 18316100. Policy #0 lag: (min: 1.0, avg: 24.5, max: 42.0) [2024-03-29 12:08:56,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:08:59,171][00501] Updated weights for policy 0, policy_version 1110 (0.0022) [2024-03-29 12:09:01,685][00126] Fps is (10 sec: 36044.4, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 18251776. Throughput: 0: 41635.1. Samples: 18596020. Policy #0 lag: (min: 0.0, avg: 21.9, max: 44.0) [2024-03-29 12:09:01,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:09:03,690][00501] Updated weights for policy 0, policy_version 1120 (0.0026) [2024-03-29 12:09:06,686][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 18497536. Throughput: 0: 42195.9. Samples: 18839000. 
Policy #0 lag: (min: 1.0, avg: 18.5, max: 42.0) [2024-03-29 12:09:06,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:09:06,864][00501] Updated weights for policy 0, policy_version 1130 (0.0025) [2024-03-29 12:09:07,683][00481] Signal inference workers to stop experience collection... (700 times) [2024-03-29 12:09:07,726][00501] InferenceWorker_p0-w0: stopping experience collection (700 times) [2024-03-29 12:09:07,767][00481] Signal inference workers to resume experience collection... (700 times) [2024-03-29 12:09:07,769][00501] InferenceWorker_p0-w0: resuming experience collection (700 times) [2024-03-29 12:09:09,830][00501] Updated weights for policy 0, policy_version 1140 (0.0023) [2024-03-29 12:09:11,685][00126] Fps is (10 sec: 50789.9, 60 sec: 42871.3, 300 sec: 42098.5). Total num frames: 18759680. Throughput: 0: 41765.2. Samples: 18948240. Policy #0 lag: (min: 0.0, avg: 23.8, max: 42.0) [2024-03-29 12:09:11,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:09:14,483][00501] Updated weights for policy 0, policy_version 1150 (0.0024) [2024-03-29 12:09:16,685][00126] Fps is (10 sec: 40960.7, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 18907136. Throughput: 0: 41907.1. Samples: 19233540. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 12:09:16,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:09:19,018][00501] Updated weights for policy 0, policy_version 1160 (0.0023) [2024-03-29 12:09:21,685][00126] Fps is (10 sec: 36045.1, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 19120128. Throughput: 0: 42290.3. Samples: 19476520. Policy #0 lag: (min: 2.0, avg: 18.9, max: 42.0) [2024-03-29 12:09:21,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:09:22,425][00501] Updated weights for policy 0, policy_version 1170 (0.0026) [2024-03-29 12:09:25,342][00501] Updated weights for policy 0, policy_version 1180 (0.0023) [2024-03-29 12:09:26,685][00126] Fps is (10 sec: 47513.1, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 19382272. Throughput: 0: 42243.9. Samples: 19591440. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 12:09:26,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:09:29,897][00501] Updated weights for policy 0, policy_version 1190 (0.0028) [2024-03-29 12:09:31,685][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 19529728. Throughput: 0: 41756.5. Samples: 19855700. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 12:09:31,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:09:34,579][00501] Updated weights for policy 0, policy_version 1200 (0.0027) [2024-03-29 12:09:36,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 19759104. Throughput: 0: 42155.9. Samples: 20105900. Policy #0 lag: (min: 1.0, avg: 20.0, max: 43.0) [2024-03-29 12:09:36,686][00126] Avg episode reward: [(0, '0.001')] [2024-03-29 12:09:37,194][00481] Signal inference workers to stop experience collection... (750 times) [2024-03-29 12:09:37,229][00501] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-03-29 12:09:37,405][00481] Signal inference workers to resume experience collection... 
(750 times) [2024-03-29 12:09:37,406][00501] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-03-29 12:09:37,951][00501] Updated weights for policy 0, policy_version 1210 (0.0023) [2024-03-29 12:09:40,964][00501] Updated weights for policy 0, policy_version 1220 (0.0025) [2024-03-29 12:09:41,686][00126] Fps is (10 sec: 47513.0, 60 sec: 41779.1, 300 sec: 41987.4). Total num frames: 20004864. Throughput: 0: 42221.3. Samples: 20216060. Policy #0 lag: (min: 1.0, avg: 22.3, max: 42.0) [2024-03-29 12:09:41,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:09:45,711][00501] Updated weights for policy 0, policy_version 1230 (0.0018) [2024-03-29 12:09:46,685][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 20152320. Throughput: 0: 41661.3. Samples: 20470780. Policy #0 lag: (min: 0.0, avg: 23.3, max: 42.0) [2024-03-29 12:09:46,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:09:50,274][00501] Updated weights for policy 0, policy_version 1240 (0.0028) [2024-03-29 12:09:51,685][00126] Fps is (10 sec: 37683.6, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 20381696. Throughput: 0: 42074.8. Samples: 20732360. Policy #0 lag: (min: 2.0, avg: 17.0, max: 42.0) [2024-03-29 12:09:51,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:09:53,482][00501] Updated weights for policy 0, policy_version 1250 (0.0021) [2024-03-29 12:09:56,685][00126] Fps is (10 sec: 47513.8, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 20627456. Throughput: 0: 42277.9. Samples: 20850740. Policy #0 lag: (min: 1.0, avg: 22.1, max: 41.0) [2024-03-29 12:09:56,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:09:56,718][00501] Updated weights for policy 0, policy_version 1260 (0.0021) [2024-03-29 12:10:01,133][00501] Updated weights for policy 0, policy_version 1270 (0.0018) [2024-03-29 12:10:01,685][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 20807680. Throughput: 0: 41619.5. Samples: 21106420. Policy #0 lag: (min: 1.0, avg: 23.9, max: 41.0) [2024-03-29 12:10:01,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:10:05,674][00501] Updated weights for policy 0, policy_version 1280 (0.0019) [2024-03-29 12:10:06,685][00126] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 21020672. Throughput: 0: 42099.6. Samples: 21371000. Policy #0 lag: (min: 2.0, avg: 18.0, max: 43.0) [2024-03-29 12:10:06,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:10:08,992][00501] Updated weights for policy 0, policy_version 1290 (0.0029) [2024-03-29 12:10:10,359][00481] Signal inference workers to stop experience collection... (800 times) [2024-03-29 12:10:10,441][00481] Signal inference workers to resume experience collection... (800 times) [2024-03-29 12:10:10,447][00501] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-03-29 12:10:10,466][00501] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-03-29 12:10:11,685][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 21266432. Throughput: 0: 42121.4. Samples: 21486900. 
Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 12:10:11,688][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:10:12,191][00501] Updated weights for policy 0, policy_version 1300 (0.0017) [2024-03-29 12:10:16,571][00501] Updated weights for policy 0, policy_version 1310 (0.0025) [2024-03-29 12:10:16,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 21463040. Throughput: 0: 41902.7. Samples: 21741320. Policy #0 lag: (min: 0.0, avg: 24.6, max: 42.0) [2024-03-29 12:10:16,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:10:21,034][00501] Updated weights for policy 0, policy_version 1320 (0.0029) [2024-03-29 12:10:21,685][00126] Fps is (10 sec: 37683.0, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 21643264. Throughput: 0: 42469.3. Samples: 22017020. Policy #0 lag: (min: 2.0, avg: 18.1, max: 43.0) [2024-03-29 12:10:21,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:10:24,511][00501] Updated weights for policy 0, policy_version 1330 (0.0023) [2024-03-29 12:10:26,685][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 21889024. Throughput: 0: 42492.1. Samples: 22128200. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 12:10:26,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:10:27,072][00481] Saving new best policy, reward=0.006! [2024-03-29 12:10:27,957][00501] Updated weights for policy 0, policy_version 1340 (0.0022) [2024-03-29 12:10:31,685][00126] Fps is (10 sec: 44237.1, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 22085632. Throughput: 0: 41922.7. Samples: 22357300. Policy #0 lag: (min: 1.0, avg: 23.6, max: 41.0) [2024-03-29 12:10:31,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:10:32,124][00501] Updated weights for policy 0, policy_version 1350 (0.0018) [2024-03-29 12:10:36,649][00501] Updated weights for policy 0, policy_version 1360 (0.0032) [2024-03-29 12:10:36,685][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 22282240. Throughput: 0: 42157.8. Samples: 22629460. Policy #0 lag: (min: 2.0, avg: 18.4, max: 43.0) [2024-03-29 12:10:36,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:10:40,329][00501] Updated weights for policy 0, policy_version 1370 (0.0025) [2024-03-29 12:10:40,339][00481] Signal inference workers to stop experience collection... (850 times) [2024-03-29 12:10:40,340][00481] Signal inference workers to resume experience collection... (850 times) [2024-03-29 12:10:40,375][00501] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-03-29 12:10:40,375][00501] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-03-29 12:10:41,685][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 22511616. Throughput: 0: 42184.5. Samples: 22749040. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 12:10:41,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:10:43,387][00501] Updated weights for policy 0, policy_version 1380 (0.0019) [2024-03-29 12:10:46,685][00126] Fps is (10 sec: 45874.9, 60 sec: 43144.5, 300 sec: 42154.1). Total num frames: 22740992. Throughput: 0: 41990.6. Samples: 22996000. 
Policy #0 lag: (min: 1.0, avg: 23.1, max: 41.0) [2024-03-29 12:10:46,690][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:10:47,669][00501] Updated weights for policy 0, policy_version 1390 (0.0022) [2024-03-29 12:10:51,686][00126] Fps is (10 sec: 39320.8, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 22904832. Throughput: 0: 42353.7. Samples: 23276920. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 12:10:51,689][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:10:52,164][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000001400_22937600.pth... [2024-03-29 12:10:52,176][00501] Updated weights for policy 0, policy_version 1400 (0.0028) [2024-03-29 12:10:52,521][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000000788_12910592.pth [2024-03-29 12:10:55,895][00501] Updated weights for policy 0, policy_version 1410 (0.0024) [2024-03-29 12:10:56,685][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 23150592. Throughput: 0: 42067.1. Samples: 23379920. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 12:10:56,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:10:59,191][00501] Updated weights for policy 0, policy_version 1420 (0.0029) [2024-03-29 12:11:01,685][00126] Fps is (10 sec: 45875.3, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 23363584. Throughput: 0: 41618.6. Samples: 23614160. Policy #0 lag: (min: 1.0, avg: 23.4, max: 41.0) [2024-03-29 12:11:01,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:11:03,464][00501] Updated weights for policy 0, policy_version 1430 (0.0022) [2024-03-29 12:11:06,686][00126] Fps is (10 sec: 36044.6, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 23511040. Throughput: 0: 41807.1. Samples: 23898340. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 12:11:06,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:11:08,094][00501] Updated weights for policy 0, policy_version 1440 (0.0027) [2024-03-29 12:11:11,652][00501] Updated weights for policy 0, policy_version 1450 (0.0026) [2024-03-29 12:11:11,686][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 23756800. Throughput: 0: 41765.2. Samples: 24007640. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 12:11:11,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:11:12,109][00481] Signal inference workers to stop experience collection... (900 times) [2024-03-29 12:11:12,160][00501] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-03-29 12:11:12,199][00481] Signal inference workers to resume experience collection... (900 times) [2024-03-29 12:11:12,203][00501] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-03-29 12:11:14,864][00501] Updated weights for policy 0, policy_version 1460 (0.0022) [2024-03-29 12:11:16,685][00126] Fps is (10 sec: 49153.0, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 24002560. Throughput: 0: 41873.0. Samples: 24241580. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 12:11:16,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:11:19,152][00501] Updated weights for policy 0, policy_version 1470 (0.0019) [2024-03-29 12:11:21,685][00126] Fps is (10 sec: 37683.7, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 24133632. Throughput: 0: 42094.2. Samples: 24523700. 
Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 12:11:21,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:11:23,772][00501] Updated weights for policy 0, policy_version 1480 (0.0029) [2024-03-29 12:11:26,685][00126] Fps is (10 sec: 37682.5, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 24379392. Throughput: 0: 42233.6. Samples: 24649560. Policy #0 lag: (min: 0.0, avg: 18.3, max: 43.0) [2024-03-29 12:11:26,689][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:11:27,356][00501] Updated weights for policy 0, policy_version 1490 (0.0026) [2024-03-29 12:11:30,615][00501] Updated weights for policy 0, policy_version 1500 (0.0034) [2024-03-29 12:11:31,685][00126] Fps is (10 sec: 47513.9, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 24608768. Throughput: 0: 41595.7. Samples: 24867800. Policy #0 lag: (min: 3.0, avg: 23.2, max: 44.0) [2024-03-29 12:11:31,686][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:11:34,652][00501] Updated weights for policy 0, policy_version 1510 (0.0019) [2024-03-29 12:11:36,686][00126] Fps is (10 sec: 40959.3, 60 sec: 41779.0, 300 sec: 42043.0). Total num frames: 24788992. Throughput: 0: 41331.0. Samples: 25136820. Policy #0 lag: (min: 2.0, avg: 19.6, max: 42.0) [2024-03-29 12:11:36,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:11:39,478][00501] Updated weights for policy 0, policy_version 1520 (0.0028) [2024-03-29 12:11:41,686][00126] Fps is (10 sec: 39320.8, 60 sec: 41506.0, 300 sec: 41876.4). Total num frames: 25001984. Throughput: 0: 41939.5. Samples: 25267200. Policy #0 lag: (min: 0.0, avg: 18.3, max: 40.0) [2024-03-29 12:11:41,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:11:42,916][00481] Signal inference workers to stop experience collection... (950 times) [2024-03-29 12:11:42,954][00501] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-03-29 12:11:43,143][00481] Signal inference workers to resume experience collection... (950 times) [2024-03-29 12:11:43,143][00501] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-03-29 12:11:43,146][00501] Updated weights for policy 0, policy_version 1530 (0.0020) [2024-03-29 12:11:46,387][00501] Updated weights for policy 0, policy_version 1540 (0.0023) [2024-03-29 12:11:46,685][00126] Fps is (10 sec: 45875.8, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 25247744. Throughput: 0: 41813.3. Samples: 25495760. Policy #0 lag: (min: 0.0, avg: 23.5, max: 44.0) [2024-03-29 12:11:46,687][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:11:50,487][00501] Updated weights for policy 0, policy_version 1550 (0.0024) [2024-03-29 12:11:51,685][00126] Fps is (10 sec: 40960.6, 60 sec: 41779.3, 300 sec: 42098.5). Total num frames: 25411584. Throughput: 0: 41661.4. Samples: 25773100. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 12:11:51,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:11:51,836][00481] Saving new best policy, reward=0.008! [2024-03-29 12:11:55,105][00501] Updated weights for policy 0, policy_version 1560 (0.0025) [2024-03-29 12:11:56,685][00126] Fps is (10 sec: 39322.5, 60 sec: 41506.2, 300 sec: 41932.0). Total num frames: 25640960. Throughput: 0: 42194.0. Samples: 25906360. 
Policy #0 lag: (min: 1.0, avg: 20.1, max: 43.0) [2024-03-29 12:11:56,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:11:58,821][00501] Updated weights for policy 0, policy_version 1570 (0.0025) [2024-03-29 12:12:01,685][00126] Fps is (10 sec: 44236.7, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 25853952. Throughput: 0: 42123.9. Samples: 26137160. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 12:12:01,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:12:02,136][00501] Updated weights for policy 0, policy_version 1580 (0.0018) [2024-03-29 12:12:06,144][00501] Updated weights for policy 0, policy_version 1590 (0.0021) [2024-03-29 12:12:06,685][00126] Fps is (10 sec: 40959.3, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 26050560. Throughput: 0: 41668.0. Samples: 26398760. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 12:12:06,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:12:10,590][00501] Updated weights for policy 0, policy_version 1600 (0.0024) [2024-03-29 12:12:11,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 26263552. Throughput: 0: 42117.9. Samples: 26544860. Policy #0 lag: (min: 0.0, avg: 17.2, max: 41.0) [2024-03-29 12:12:11,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:12:14,123][00501] Updated weights for policy 0, policy_version 1610 (0.0023) [2024-03-29 12:12:15,724][00481] Signal inference workers to stop experience collection... (1000 times) [2024-03-29 12:12:15,761][00501] InferenceWorker_p0-w0: stopping experience collection (1000 times) [2024-03-29 12:12:15,938][00481] Signal inference workers to resume experience collection... (1000 times) [2024-03-29 12:12:15,939][00501] InferenceWorker_p0-w0: resuming experience collection (1000 times) [2024-03-29 12:12:16,686][00126] Fps is (10 sec: 44236.3, 60 sec: 41506.0, 300 sec: 41876.4). Total num frames: 26492928. Throughput: 0: 42445.6. Samples: 26777860. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 12:12:16,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:12:17,485][00501] Updated weights for policy 0, policy_version 1620 (0.0021) [2024-03-29 12:12:21,685][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 26689536. Throughput: 0: 42006.4. Samples: 27027100. Policy #0 lag: (min: 0.0, avg: 23.3, max: 41.0) [2024-03-29 12:12:21,687][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:12:21,692][00501] Updated weights for policy 0, policy_version 1630 (0.0022) [2024-03-29 12:12:26,085][00501] Updated weights for policy 0, policy_version 1640 (0.0026) [2024-03-29 12:12:26,685][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 26886144. Throughput: 0: 42382.3. Samples: 27174400. Policy #0 lag: (min: 0.0, avg: 18.6, max: 40.0) [2024-03-29 12:12:26,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:12:29,591][00501] Updated weights for policy 0, policy_version 1650 (0.0022) [2024-03-29 12:12:31,686][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.1, 300 sec: 41931.9). Total num frames: 27131904. Throughput: 0: 42589.3. Samples: 27412280. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 12:12:31,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:12:32,816][00501] Updated weights for policy 0, policy_version 1660 (0.0020) [2024-03-29 12:12:36,685][00126] Fps is (10 sec: 45875.0, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 27344896. 
Throughput: 0: 42001.2. Samples: 27663160. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 12:12:36,687][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:12:37,017][00501] Updated weights for policy 0, policy_version 1670 (0.0021) [2024-03-29 12:12:41,470][00501] Updated weights for policy 0, policy_version 1680 (0.0020) [2024-03-29 12:12:41,685][00126] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 27525120. Throughput: 0: 42409.2. Samples: 27814780. Policy #0 lag: (min: 1.0, avg: 17.7, max: 41.0) [2024-03-29 12:12:41,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:12:44,995][00501] Updated weights for policy 0, policy_version 1690 (0.0019) [2024-03-29 12:12:46,685][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 27754496. Throughput: 0: 42348.0. Samples: 28042820. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 12:12:46,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:12:48,390][00501] Updated weights for policy 0, policy_version 1700 (0.0019) [2024-03-29 12:12:51,686][00126] Fps is (10 sec: 45874.7, 60 sec: 42871.4, 300 sec: 42098.5). Total num frames: 27983872. Throughput: 0: 42194.6. Samples: 28297520. Policy #0 lag: (min: 0.0, avg: 23.9, max: 43.0) [2024-03-29 12:12:51,687][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:12:51,708][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000001708_27983872.pth... [2024-03-29 12:12:52,045][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000001093_17907712.pth [2024-03-29 12:12:52,654][00501] Updated weights for policy 0, policy_version 1710 (0.0028) [2024-03-29 12:12:55,154][00481] Signal inference workers to stop experience collection... (1050 times) [2024-03-29 12:12:55,225][00501] InferenceWorker_p0-w0: stopping experience collection (1050 times) [2024-03-29 12:12:55,235][00481] Signal inference workers to resume experience collection... (1050 times) [2024-03-29 12:12:55,256][00501] InferenceWorker_p0-w0: resuming experience collection (1050 times) [2024-03-29 12:12:56,687][00126] Fps is (10 sec: 39315.5, 60 sec: 41778.0, 300 sec: 42042.8). Total num frames: 28147712. Throughput: 0: 42139.9. Samples: 28441220. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 12:12:56,687][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:12:57,098][00501] Updated weights for policy 0, policy_version 1720 (0.0030) [2024-03-29 12:13:00,611][00501] Updated weights for policy 0, policy_version 1730 (0.0028) [2024-03-29 12:13:01,686][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 28393472. Throughput: 0: 42187.2. Samples: 28676280. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 12:13:01,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:13:04,108][00501] Updated weights for policy 0, policy_version 1740 (0.0028) [2024-03-29 12:13:06,685][00126] Fps is (10 sec: 47520.7, 60 sec: 42871.4, 300 sec: 42154.1). Total num frames: 28622848. Throughput: 0: 41909.7. Samples: 28913040. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 12:13:06,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:13:08,551][00501] Updated weights for policy 0, policy_version 1750 (0.0021) [2024-03-29 12:13:11,685][00126] Fps is (10 sec: 36045.2, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 28753920. Throughput: 0: 41964.5. Samples: 29062800. 
Policy #0 lag: (min: 0.0, avg: 16.8, max: 43.0) [2024-03-29 12:13:11,687][00126] Avg episode reward: [(0, '0.003')] [2024-03-29 12:13:12,965][00501] Updated weights for policy 0, policy_version 1760 (0.0021) [2024-03-29 12:13:16,505][00501] Updated weights for policy 0, policy_version 1770 (0.0020) [2024-03-29 12:13:16,685][00126] Fps is (10 sec: 37683.7, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 28999680. Throughput: 0: 41828.6. Samples: 29294560. Policy #0 lag: (min: 2.0, avg: 20.7, max: 41.0) [2024-03-29 12:13:16,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:13:19,866][00501] Updated weights for policy 0, policy_version 1780 (0.0019) [2024-03-29 12:13:21,685][00126] Fps is (10 sec: 49151.9, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 29245440. Throughput: 0: 41549.8. Samples: 29532900. Policy #0 lag: (min: 1.0, avg: 24.3, max: 44.0) [2024-03-29 12:13:21,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:13:24,190][00501] Updated weights for policy 0, policy_version 1790 (0.0021) [2024-03-29 12:13:26,685][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 29376512. Throughput: 0: 41542.3. Samples: 29684180. Policy #0 lag: (min: 0.0, avg: 20.1, max: 42.0) [2024-03-29 12:13:26,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:13:26,767][00481] Signal inference workers to stop experience collection... (1100 times) [2024-03-29 12:13:26,808][00501] InferenceWorker_p0-w0: stopping experience collection (1100 times) [2024-03-29 12:13:26,948][00481] Signal inference workers to resume experience collection... (1100 times) [2024-03-29 12:13:26,948][00501] InferenceWorker_p0-w0: resuming experience collection (1100 times) [2024-03-29 12:13:28,557][00501] Updated weights for policy 0, policy_version 1800 (0.0020) [2024-03-29 12:13:31,686][00126] Fps is (10 sec: 37682.5, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 29622272. Throughput: 0: 41741.2. Samples: 29921180. Policy #0 lag: (min: 3.0, avg: 18.4, max: 43.0) [2024-03-29 12:13:31,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:13:32,132][00501] Updated weights for policy 0, policy_version 1810 (0.0023) [2024-03-29 12:13:35,455][00501] Updated weights for policy 0, policy_version 1820 (0.0031) [2024-03-29 12:13:36,686][00126] Fps is (10 sec: 50789.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 29884416. Throughput: 0: 41461.4. Samples: 30163280. Policy #0 lag: (min: 1.0, avg: 22.3, max: 42.0) [2024-03-29 12:13:36,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:13:39,866][00501] Updated weights for policy 0, policy_version 1830 (0.0020) [2024-03-29 12:13:41,685][00126] Fps is (10 sec: 40960.8, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 30031872. Throughput: 0: 41624.1. Samples: 30314240. Policy #0 lag: (min: 0.0, avg: 22.5, max: 41.0) [2024-03-29 12:13:41,687][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:13:44,199][00501] Updated weights for policy 0, policy_version 1840 (0.0026) [2024-03-29 12:13:46,685][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 41987.4). Total num frames: 30277632. Throughput: 0: 41846.3. Samples: 30559360. 
Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 12:13:46,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:13:47,473][00501] Updated weights for policy 0, policy_version 1850 (0.0025) [2024-03-29 12:13:51,036][00501] Updated weights for policy 0, policy_version 1860 (0.0025) [2024-03-29 12:13:51,685][00126] Fps is (10 sec: 47513.7, 60 sec: 42052.4, 300 sec: 41987.5). Total num frames: 30507008. Throughput: 0: 42045.4. Samples: 30805080. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 12:13:51,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:13:55,519][00501] Updated weights for policy 0, policy_version 1870 (0.0017) [2024-03-29 12:13:56,685][00126] Fps is (10 sec: 40960.3, 60 sec: 42326.4, 300 sec: 42154.1). Total num frames: 30687232. Throughput: 0: 41704.9. Samples: 30939520. Policy #0 lag: (min: 0.0, avg: 19.8, max: 42.0) [2024-03-29 12:13:56,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:13:58,204][00481] Signal inference workers to stop experience collection... (1150 times) [2024-03-29 12:13:58,246][00501] InferenceWorker_p0-w0: stopping experience collection (1150 times) [2024-03-29 12:13:58,283][00481] Signal inference workers to resume experience collection... (1150 times) [2024-03-29 12:13:58,286][00501] InferenceWorker_p0-w0: resuming experience collection (1150 times) [2024-03-29 12:13:59,966][00501] Updated weights for policy 0, policy_version 1880 (0.0028) [2024-03-29 12:14:01,685][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 30900224. Throughput: 0: 42174.6. Samples: 31192420. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 12:14:01,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:14:01,705][00481] Saving new best policy, reward=0.011! [2024-03-29 12:14:03,269][00501] Updated weights for policy 0, policy_version 1890 (0.0019) [2024-03-29 12:14:06,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 31113216. Throughput: 0: 42218.2. Samples: 31432720. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 12:14:06,688][00126] Avg episode reward: [(0, '0.009')] [2024-03-29 12:14:06,878][00501] Updated weights for policy 0, policy_version 1900 (0.0023) [2024-03-29 12:14:11,564][00501] Updated weights for policy 0, policy_version 1910 (0.0019) [2024-03-29 12:14:11,685][00126] Fps is (10 sec: 39321.7, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 31293440. Throughput: 0: 41447.1. Samples: 31549300. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 12:14:11,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:14:15,857][00501] Updated weights for policy 0, policy_version 1920 (0.0022) [2024-03-29 12:14:16,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 31506432. Throughput: 0: 42206.8. Samples: 31820480. Policy #0 lag: (min: 0.0, avg: 15.2, max: 41.0) [2024-03-29 12:14:16,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:14:19,029][00501] Updated weights for policy 0, policy_version 1930 (0.0023) [2024-03-29 12:14:21,685][00126] Fps is (10 sec: 44237.0, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 31735808. Throughput: 0: 41889.9. Samples: 32048320. 
Policy #0 lag: (min: 1.0, avg: 22.1, max: 43.0) [2024-03-29 12:14:21,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:14:22,572][00501] Updated weights for policy 0, policy_version 1940 (0.0021) [2024-03-29 12:14:26,685][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 31932416. Throughput: 0: 41178.2. Samples: 32167260. Policy #0 lag: (min: 0.0, avg: 23.0, max: 41.0) [2024-03-29 12:14:26,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:14:27,035][00501] Updated weights for policy 0, policy_version 1950 (0.0018) [2024-03-29 12:14:29,677][00481] Signal inference workers to stop experience collection... (1200 times) [2024-03-29 12:14:29,720][00501] InferenceWorker_p0-w0: stopping experience collection (1200 times) [2024-03-29 12:14:29,760][00481] Signal inference workers to resume experience collection... (1200 times) [2024-03-29 12:14:29,761][00501] InferenceWorker_p0-w0: resuming experience collection (1200 times) [2024-03-29 12:14:31,371][00501] Updated weights for policy 0, policy_version 1960 (0.0024) [2024-03-29 12:14:31,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 32129024. Throughput: 0: 42334.3. Samples: 32464400. Policy #0 lag: (min: 0.0, avg: 16.1, max: 41.0) [2024-03-29 12:14:31,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:14:34,618][00501] Updated weights for policy 0, policy_version 1970 (0.0029) [2024-03-29 12:14:36,685][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.2, 300 sec: 41932.0). Total num frames: 32374784. Throughput: 0: 41803.1. Samples: 32686220. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 12:14:36,686][00126] Avg episode reward: [(0, '0.009')] [2024-03-29 12:14:38,167][00501] Updated weights for policy 0, policy_version 1980 (0.0021) [2024-03-29 12:14:41,685][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 32555008. Throughput: 0: 41270.1. Samples: 32796680. Policy #0 lag: (min: 0.0, avg: 22.7, max: 41.0) [2024-03-29 12:14:41,688][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:14:42,723][00501] Updated weights for policy 0, policy_version 1990 (0.0019) [2024-03-29 12:14:46,685][00126] Fps is (10 sec: 36044.9, 60 sec: 40960.1, 300 sec: 41876.4). Total num frames: 32735232. Throughput: 0: 42358.4. Samples: 33098540. Policy #0 lag: (min: 0.0, avg: 16.2, max: 41.0) [2024-03-29 12:14:46,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:14:46,956][00501] Updated weights for policy 0, policy_version 2000 (0.0024) [2024-03-29 12:14:50,227][00501] Updated weights for policy 0, policy_version 2010 (0.0024) [2024-03-29 12:14:51,685][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 32980992. Throughput: 0: 41786.1. Samples: 33313100. Policy #0 lag: (min: 0.0, avg: 22.5, max: 43.0) [2024-03-29 12:14:51,686][00126] Avg episode reward: [(0, '0.002')] [2024-03-29 12:14:51,918][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002014_32997376.pth... [2024-03-29 12:14:52,312][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000001400_22937600.pth [2024-03-29 12:14:54,157][00501] Updated weights for policy 0, policy_version 2020 (0.0019) [2024-03-29 12:14:56,685][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 33193984. Throughput: 0: 41580.5. Samples: 33420420. 
Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 12:14:56,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:14:58,846][00501] Updated weights for policy 0, policy_version 2030 (0.0027) [2024-03-29 12:15:01,583][00481] Signal inference workers to stop experience collection... (1250 times) [2024-03-29 12:15:01,584][00481] Signal inference workers to resume experience collection... (1250 times) [2024-03-29 12:15:01,628][00501] InferenceWorker_p0-w0: stopping experience collection (1250 times) [2024-03-29 12:15:01,633][00501] InferenceWorker_p0-w0: resuming experience collection (1250 times) [2024-03-29 12:15:01,685][00126] Fps is (10 sec: 34406.6, 60 sec: 40413.9, 300 sec: 41709.8). Total num frames: 33325056. Throughput: 0: 41577.3. Samples: 33691460. Policy #0 lag: (min: 0.0, avg: 15.6, max: 41.0) [2024-03-29 12:15:01,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:15:03,196][00501] Updated weights for policy 0, policy_version 2040 (0.0026) [2024-03-29 12:15:06,453][00501] Updated weights for policy 0, policy_version 2050 (0.0028) [2024-03-29 12:15:06,685][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 33587200. Throughput: 0: 41612.8. Samples: 33920900. Policy #0 lag: (min: 0.0, avg: 22.4, max: 45.0) [2024-03-29 12:15:06,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:15:09,963][00501] Updated weights for policy 0, policy_version 2060 (0.0023) [2024-03-29 12:15:11,685][00126] Fps is (10 sec: 52428.8, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 33849344. Throughput: 0: 41825.8. Samples: 34049420. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 12:15:11,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:15:14,551][00501] Updated weights for policy 0, policy_version 2070 (0.0021) [2024-03-29 12:15:16,685][00126] Fps is (10 sec: 37683.2, 60 sec: 40960.0, 300 sec: 41765.3). Total num frames: 33964032. Throughput: 0: 40810.6. Samples: 34300880. Policy #0 lag: (min: 0.0, avg: 22.9, max: 43.0) [2024-03-29 12:15:16,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:15:19,167][00501] Updated weights for policy 0, policy_version 2080 (0.0031) [2024-03-29 12:15:21,685][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 34209792. Throughput: 0: 41089.3. Samples: 34535240. Policy #0 lag: (min: 0.0, avg: 22.0, max: 45.0) [2024-03-29 12:15:21,686][00126] Avg episode reward: [(0, '0.014')] [2024-03-29 12:15:21,704][00481] Saving new best policy, reward=0.014! [2024-03-29 12:15:22,606][00501] Updated weights for policy 0, policy_version 2090 (0.0021) [2024-03-29 12:15:25,912][00501] Updated weights for policy 0, policy_version 2100 (0.0027) [2024-03-29 12:15:26,685][00126] Fps is (10 sec: 47514.3, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 34439168. Throughput: 0: 41225.0. Samples: 34651800. Policy #0 lag: (min: 2.0, avg: 22.1, max: 42.0) [2024-03-29 12:15:26,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:15:30,089][00481] Signal inference workers to stop experience collection... (1300 times) [2024-03-29 12:15:30,148][00501] InferenceWorker_p0-w0: stopping experience collection (1300 times) [2024-03-29 12:15:30,247][00481] Signal inference workers to resume experience collection... 
(1300 times) [2024-03-29 12:15:30,248][00501] InferenceWorker_p0-w0: resuming experience collection (1300 times) [2024-03-29 12:15:30,506][00501] Updated weights for policy 0, policy_version 2110 (0.0021) [2024-03-29 12:15:31,686][00126] Fps is (10 sec: 39321.1, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 34603008. Throughput: 0: 40270.5. Samples: 34910720. Policy #0 lag: (min: 0.0, avg: 23.2, max: 43.0) [2024-03-29 12:15:31,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:15:35,128][00501] Updated weights for policy 0, policy_version 2120 (0.0022) [2024-03-29 12:15:36,685][00126] Fps is (10 sec: 37683.2, 60 sec: 40687.0, 300 sec: 41709.8). Total num frames: 34816000. Throughput: 0: 40977.9. Samples: 35157100. Policy #0 lag: (min: 0.0, avg: 16.4, max: 41.0) [2024-03-29 12:15:36,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:15:38,378][00501] Updated weights for policy 0, policy_version 2130 (0.0022) [2024-03-29 12:15:41,685][00126] Fps is (10 sec: 44237.5, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 35045376. Throughput: 0: 41348.4. Samples: 35281100. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 12:15:41,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:15:41,779][00501] Updated weights for policy 0, policy_version 2140 (0.0016) [2024-03-29 12:15:46,248][00501] Updated weights for policy 0, policy_version 2150 (0.0017) [2024-03-29 12:15:46,685][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 35241984. Throughput: 0: 40880.0. Samples: 35531060. Policy #0 lag: (min: 0.0, avg: 22.8, max: 42.0) [2024-03-29 12:15:46,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:15:50,889][00501] Updated weights for policy 0, policy_version 2160 (0.0031) [2024-03-29 12:15:51,685][00126] Fps is (10 sec: 37683.1, 60 sec: 40687.0, 300 sec: 41598.7). Total num frames: 35422208. Throughput: 0: 41535.6. Samples: 35790000. Policy #0 lag: (min: 0.0, avg: 16.4, max: 41.0) [2024-03-29 12:15:51,686][00126] Avg episode reward: [(0, '0.006')] [2024-03-29 12:15:54,106][00501] Updated weights for policy 0, policy_version 2170 (0.0027) [2024-03-29 12:15:56,685][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 35667968. Throughput: 0: 41256.9. Samples: 35905980. Policy #0 lag: (min: 1.0, avg: 22.2, max: 45.0) [2024-03-29 12:15:56,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:15:57,752][00501] Updated weights for policy 0, policy_version 2180 (0.0020) [2024-03-29 12:16:01,685][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 35848192. Throughput: 0: 40705.0. Samples: 36132600. Policy #0 lag: (min: 0.0, avg: 22.7, max: 44.0) [2024-03-29 12:16:01,686][00126] Avg episode reward: [(0, '0.009')] [2024-03-29 12:16:02,168][00481] Signal inference workers to stop experience collection... (1350 times) [2024-03-29 12:16:02,219][00501] InferenceWorker_p0-w0: stopping experience collection (1350 times) [2024-03-29 12:16:02,252][00481] Signal inference workers to resume experience collection... (1350 times) [2024-03-29 12:16:02,256][00501] InferenceWorker_p0-w0: resuming experience collection (1350 times) [2024-03-29 12:16:02,258][00501] Updated weights for policy 0, policy_version 2190 (0.0015) [2024-03-29 12:16:06,685][00126] Fps is (10 sec: 36045.1, 60 sec: 40687.1, 300 sec: 41598.7). Total num frames: 36028416. Throughput: 0: 41900.6. Samples: 36420760. 
Policy #0 lag: (min: 0.0, avg: 15.4, max: 41.0) [2024-03-29 12:16:06,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:16:06,971][00501] Updated weights for policy 0, policy_version 2200 (0.0019) [2024-03-29 12:16:09,831][00501] Updated weights for policy 0, policy_version 2210 (0.0023) [2024-03-29 12:16:11,685][00126] Fps is (10 sec: 42598.4, 60 sec: 40413.9, 300 sec: 41598.7). Total num frames: 36274176. Throughput: 0: 41476.9. Samples: 36518260. Policy #0 lag: (min: 0.0, avg: 22.1, max: 44.0) [2024-03-29 12:16:11,686][00126] Avg episode reward: [(0, '0.005')] [2024-03-29 12:16:13,707][00501] Updated weights for policy 0, policy_version 2220 (0.0028) [2024-03-29 12:16:16,686][00126] Fps is (10 sec: 44235.6, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 36470784. Throughput: 0: 40548.9. Samples: 36735420. Policy #0 lag: (min: 1.0, avg: 23.5, max: 44.0) [2024-03-29 12:16:16,687][00126] Avg episode reward: [(0, '0.014')] [2024-03-29 12:16:18,198][00501] Updated weights for policy 0, policy_version 2230 (0.0032) [2024-03-29 12:16:21,686][00126] Fps is (10 sec: 34405.8, 60 sec: 40140.7, 300 sec: 41487.6). Total num frames: 36618240. Throughput: 0: 41870.0. Samples: 37041260. Policy #0 lag: (min: 0.0, avg: 15.4, max: 41.0) [2024-03-29 12:16:21,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:16:22,907][00501] Updated weights for policy 0, policy_version 2240 (0.0023) [2024-03-29 12:16:25,763][00501] Updated weights for policy 0, policy_version 2250 (0.0016) [2024-03-29 12:16:26,685][00126] Fps is (10 sec: 42599.1, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 36896768. Throughput: 0: 41264.0. Samples: 37137980. Policy #0 lag: (min: 1.0, avg: 21.4, max: 43.0) [2024-03-29 12:16:26,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:16:29,387][00501] Updated weights for policy 0, policy_version 2260 (0.0027) [2024-03-29 12:16:30,136][00481] Signal inference workers to stop experience collection... (1400 times) [2024-03-29 12:16:30,175][00501] InferenceWorker_p0-w0: stopping experience collection (1400 times) [2024-03-29 12:16:30,358][00481] Signal inference workers to resume experience collection... (1400 times) [2024-03-29 12:16:30,358][00501] InferenceWorker_p0-w0: resuming experience collection (1400 times) [2024-03-29 12:16:31,686][00126] Fps is (10 sec: 50790.4, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 37126144. Throughput: 0: 40653.2. Samples: 37360460. Policy #0 lag: (min: 1.0, avg: 23.5, max: 42.0) [2024-03-29 12:16:31,686][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:16:33,882][00501] Updated weights for policy 0, policy_version 2270 (0.0033) [2024-03-29 12:16:36,685][00126] Fps is (10 sec: 34406.2, 60 sec: 40413.8, 300 sec: 41487.6). Total num frames: 37240832. Throughput: 0: 41510.6. Samples: 37657980. Policy #0 lag: (min: 0.0, avg: 23.6, max: 45.0) [2024-03-29 12:16:36,686][00126] Avg episode reward: [(0, '0.009')] [2024-03-29 12:16:38,706][00501] Updated weights for policy 0, policy_version 2280 (0.0022) [2024-03-29 12:16:41,664][00501] Updated weights for policy 0, policy_version 2290 (0.0018) [2024-03-29 12:16:41,685][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 37519360. Throughput: 0: 41572.8. Samples: 37776760. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 12:16:41,686][00126] Avg episode reward: [(0, '0.018')] [2024-03-29 12:16:41,714][00481] Saving new best policy, reward=0.018! 
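The "Saving new best policy, reward=..." entries above, together with the periodic "Saving .../checkpoint_p0/checkpoint_..." and "Removing .../checkpoint_..." pairs, reflect two pieces of learner-side bookkeeping: a best-so-far checkpoint that is rewritten whenever the reported average episode reward improves, and a rolling set of periodic checkpoints in which the oldest file is deleted once a newer one has been written. The snippet below is a minimal sketch of that bookkeeping under stated assumptions; it is not Sample Factory's actual implementation, and names such as CheckpointKeeper, save_periodic, maybe_save_best and keep_last are illustrative only.

import os
from collections import deque

import torch


class CheckpointKeeper:
    """Minimal sketch of the checkpoint bookkeeping the log suggests:
    periodic checkpoints are rotated (oldest removed) and a separate
    'best' checkpoint is rewritten whenever the average episode reward
    improves. Illustrative only; not Sample Factory's real API."""

    def __init__(self, train_dir: str, keep_last: int = 3):
        self.train_dir = train_dir
        self.keep_last = keep_last        # periodic checkpoints to retain
        self.recent = deque()             # paths of retained periodic checkpoints
        self.best_reward = float("-inf")  # best avg episode reward seen so far

    def save_periodic(self, model, policy_version: int, env_steps: int) -> str:
        # File name pattern as seen in the log, e.g. checkpoint_000002616_42860544.pth
        name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
        path = os.path.join(self.train_dir, "checkpoint_p0", name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        torch.save(model.state_dict(), path)
        self.recent.append(path)
        # Rotate: drop the oldest periodic checkpoint once over budget,
        # mirroring the paired "Saving ..." / "Removing ..." log lines.
        while len(self.recent) > self.keep_last:
            old = self.recent.popleft()
            if os.path.exists(old):
                os.remove(old)
        return path

    def maybe_save_best(self, model, avg_episode_reward: float) -> bool:
        # Mirrors "Saving new best policy, reward=0.018!"
        if avg_episode_reward <= self.best_reward:
            return False
        self.best_reward = avg_episode_reward
        best_path = os.path.join(self.train_dir, "checkpoint_p0", "best.pth")
        torch.save(model.state_dict(), best_path)
        return True

Keeping only a small rolling window of periodic checkpoints bounds disk usage over a long run, while the separately tracked best checkpoint preserves the highest-reward policy even if later updates regress, which matches the pattern of best-policy saves becoming less frequent as the reward climbs in the log.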
[2024-03-29 12:16:45,653][00501] Updated weights for policy 0, policy_version 2300 (0.0032) [2024-03-29 12:16:46,685][00126] Fps is (10 sec: 50790.7, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 37748736. Throughput: 0: 41302.2. Samples: 37991200. Policy #0 lag: (min: 0.0, avg: 23.3, max: 43.0) [2024-03-29 12:16:46,687][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:16:49,912][00501] Updated weights for policy 0, policy_version 2310 (0.0018) [2024-03-29 12:16:51,685][00126] Fps is (10 sec: 36044.7, 60 sec: 40959.9, 300 sec: 41487.6). Total num frames: 37879808. Throughput: 0: 40847.8. Samples: 38258920. Policy #0 lag: (min: 0.0, avg: 23.8, max: 42.0) [2024-03-29 12:16:51,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:16:51,929][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002313_37896192.pth... [2024-03-29 12:16:52,317][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000001708_27983872.pth [2024-03-29 12:16:54,750][00501] Updated weights for policy 0, policy_version 2320 (0.0034) [2024-03-29 12:16:56,685][00126] Fps is (10 sec: 37682.8, 60 sec: 40959.9, 300 sec: 41598.7). Total num frames: 38125568. Throughput: 0: 41673.2. Samples: 38393560. Policy #0 lag: (min: 2.0, avg: 18.5, max: 42.0) [2024-03-29 12:16:56,686][00126] Avg episode reward: [(0, '0.014')] [2024-03-29 12:16:57,519][00501] Updated weights for policy 0, policy_version 2330 (0.0022) [2024-03-29 12:16:58,720][00481] Signal inference workers to stop experience collection... (1450 times) [2024-03-29 12:16:58,803][00501] InferenceWorker_p0-w0: stopping experience collection (1450 times) [2024-03-29 12:16:58,815][00481] Signal inference workers to resume experience collection... (1450 times) [2024-03-29 12:16:58,829][00501] InferenceWorker_p0-w0: resuming experience collection (1450 times) [2024-03-29 12:17:01,441][00501] Updated weights for policy 0, policy_version 2340 (0.0031) [2024-03-29 12:17:01,686][00126] Fps is (10 sec: 45875.0, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 38338560. Throughput: 0: 41530.2. Samples: 38604280. Policy #0 lag: (min: 2.0, avg: 22.3, max: 43.0) [2024-03-29 12:17:01,686][00126] Avg episode reward: [(0, '0.008')] [2024-03-29 12:17:05,627][00501] Updated weights for policy 0, policy_version 2350 (0.0018) [2024-03-29 12:17:06,685][00126] Fps is (10 sec: 39322.1, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 38518784. Throughput: 0: 40543.2. Samples: 38865700. Policy #0 lag: (min: 2.0, avg: 25.0, max: 44.0) [2024-03-29 12:17:06,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:17:10,824][00501] Updated weights for policy 0, policy_version 2360 (0.0030) [2024-03-29 12:17:11,685][00126] Fps is (10 sec: 37683.5, 60 sec: 40686.9, 300 sec: 41432.1). Total num frames: 38715392. Throughput: 0: 41887.5. Samples: 39022920. Policy #0 lag: (min: 2.0, avg: 16.6, max: 40.0) [2024-03-29 12:17:11,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:17:13,796][00501] Updated weights for policy 0, policy_version 2370 (0.0024) [2024-03-29 12:17:16,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 38961152. Throughput: 0: 41190.4. Samples: 39214020. 
Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 12:17:16,686][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:17:17,358][00501] Updated weights for policy 0, policy_version 2380 (0.0024) [2024-03-29 12:17:21,629][00501] Updated weights for policy 0, policy_version 2390 (0.0021) [2024-03-29 12:17:21,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 41598.7). Total num frames: 39157760. Throughput: 0: 40397.3. Samples: 39475860. Policy #0 lag: (min: 1.0, avg: 23.6, max: 41.0) [2024-03-29 12:17:21,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:17:26,685][00126] Fps is (10 sec: 34406.4, 60 sec: 40140.8, 300 sec: 41265.5). Total num frames: 39305216. Throughput: 0: 41221.0. Samples: 39631700. Policy #0 lag: (min: 1.0, avg: 14.9, max: 41.0) [2024-03-29 12:17:26,686][00126] Avg episode reward: [(0, '0.013')] [2024-03-29 12:17:26,731][00501] Updated weights for policy 0, policy_version 2400 (0.0019) [2024-03-29 12:17:29,948][00501] Updated weights for policy 0, policy_version 2410 (0.0031) [2024-03-29 12:17:29,965][00481] Signal inference workers to stop experience collection... (1500 times) [2024-03-29 12:17:30,005][00501] InferenceWorker_p0-w0: stopping experience collection (1500 times) [2024-03-29 12:17:30,191][00481] Signal inference workers to resume experience collection... (1500 times) [2024-03-29 12:17:30,191][00501] InferenceWorker_p0-w0: resuming experience collection (1500 times) [2024-03-29 12:17:31,686][00126] Fps is (10 sec: 40959.9, 60 sec: 40686.9, 300 sec: 41432.1). Total num frames: 39567360. Throughput: 0: 40898.5. Samples: 39831640. Policy #0 lag: (min: 2.0, avg: 22.2, max: 42.0) [2024-03-29 12:17:31,688][00126] Avg episode reward: [(0, '0.007')] [2024-03-29 12:17:33,402][00501] Updated weights for policy 0, policy_version 2420 (0.0022) [2024-03-29 12:17:36,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.3, 300 sec: 41487.6). Total num frames: 39763968. Throughput: 0: 40578.3. Samples: 40084940. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 12:17:36,686][00126] Avg episode reward: [(0, '0.004')] [2024-03-29 12:17:37,500][00501] Updated weights for policy 0, policy_version 2430 (0.0022) [2024-03-29 12:17:41,685][00126] Fps is (10 sec: 32768.3, 60 sec: 39594.7, 300 sec: 41154.4). Total num frames: 39895040. Throughput: 0: 40681.0. Samples: 40224200. Policy #0 lag: (min: 0.0, avg: 15.2, max: 42.0) [2024-03-29 12:17:41,687][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:17:42,943][00501] Updated weights for policy 0, policy_version 2440 (0.0027) [2024-03-29 12:17:45,846][00501] Updated weights for policy 0, policy_version 2450 (0.0035) [2024-03-29 12:17:46,685][00126] Fps is (10 sec: 40960.3, 60 sec: 40413.9, 300 sec: 41321.0). Total num frames: 40173568. Throughput: 0: 40994.9. Samples: 40449040. Policy #0 lag: (min: 2.0, avg: 20.7, max: 45.0) [2024-03-29 12:17:46,686][00126] Avg episode reward: [(0, '0.009')] [2024-03-29 12:17:49,222][00501] Updated weights for policy 0, policy_version 2460 (0.0018) [2024-03-29 12:17:51,685][00126] Fps is (10 sec: 50790.7, 60 sec: 42052.4, 300 sec: 41543.4). Total num frames: 40402944. Throughput: 0: 40648.9. Samples: 40694900. Policy #0 lag: (min: 0.0, avg: 22.9, max: 41.0) [2024-03-29 12:17:51,686][00126] Avg episode reward: [(0, '0.017')] [2024-03-29 12:17:53,178][00501] Updated weights for policy 0, policy_version 2470 (0.0019) [2024-03-29 12:17:56,685][00126] Fps is (10 sec: 34406.0, 60 sec: 39867.8, 300 sec: 41098.9). Total num frames: 40517632. 
Throughput: 0: 40411.6. Samples: 40841440. Policy #0 lag: (min: 1.0, avg: 22.4, max: 41.0) [2024-03-29 12:17:56,687][00126] Avg episode reward: [(0, '0.012')] [2024-03-29 12:17:58,892][00501] Updated weights for policy 0, policy_version 2480 (0.0027) [2024-03-29 12:18:01,685][00126] Fps is (10 sec: 37683.2, 60 sec: 40687.1, 300 sec: 41209.9). Total num frames: 40779776. Throughput: 0: 41467.6. Samples: 41080060. Policy #0 lag: (min: 3.0, avg: 20.9, max: 44.0) [2024-03-29 12:18:01,686][00126] Avg episode reward: [(0, '0.013')] [2024-03-29 12:18:01,850][00501] Updated weights for policy 0, policy_version 2490 (0.0024) [2024-03-29 12:18:02,166][00481] Signal inference workers to stop experience collection... (1550 times) [2024-03-29 12:18:02,167][00481] Signal inference workers to resume experience collection... (1550 times) [2024-03-29 12:18:02,208][00501] InferenceWorker_p0-w0: stopping experience collection (1550 times) [2024-03-29 12:18:02,208][00501] InferenceWorker_p0-w0: resuming experience collection (1550 times) [2024-03-29 12:18:05,311][00501] Updated weights for policy 0, policy_version 2500 (0.0020) [2024-03-29 12:18:06,685][00126] Fps is (10 sec: 50790.8, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 41025536. Throughput: 0: 40771.2. Samples: 41310560. Policy #0 lag: (min: 0.0, avg: 22.7, max: 41.0) [2024-03-29 12:18:06,686][00126] Avg episode reward: [(0, '0.018')] [2024-03-29 12:18:09,451][00501] Updated weights for policy 0, policy_version 2510 (0.0018) [2024-03-29 12:18:11,685][00126] Fps is (10 sec: 39321.4, 60 sec: 40960.0, 300 sec: 41265.5). Total num frames: 41172992. Throughput: 0: 40329.3. Samples: 41446520. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0) [2024-03-29 12:18:11,687][00126] Avg episode reward: [(0, '0.023')] [2024-03-29 12:18:11,703][00481] Saving new best policy, reward=0.023! [2024-03-29 12:18:14,706][00501] Updated weights for policy 0, policy_version 2520 (0.0024) [2024-03-29 12:18:16,685][00126] Fps is (10 sec: 37683.4, 60 sec: 40687.0, 300 sec: 41209.9). Total num frames: 41402368. Throughput: 0: 41706.9. Samples: 41708440. Policy #0 lag: (min: 0.0, avg: 14.4, max: 41.0) [2024-03-29 12:18:16,686][00126] Avg episode reward: [(0, '0.012')] [2024-03-29 12:18:17,599][00501] Updated weights for policy 0, policy_version 2530 (0.0023) [2024-03-29 12:18:21,275][00501] Updated weights for policy 0, policy_version 2540 (0.0020) [2024-03-29 12:18:21,685][00126] Fps is (10 sec: 45875.6, 60 sec: 41233.2, 300 sec: 41543.2). Total num frames: 41631744. Throughput: 0: 40976.5. Samples: 41928880. Policy #0 lag: (min: 0.0, avg: 22.5, max: 41.0) [2024-03-29 12:18:21,686][00126] Avg episode reward: [(0, '0.016')] [2024-03-29 12:18:25,556][00501] Updated weights for policy 0, policy_version 2550 (0.0020) [2024-03-29 12:18:26,685][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 41321.0). Total num frames: 41811968. Throughput: 0: 40610.7. Samples: 42051680. Policy #0 lag: (min: 1.0, avg: 22.8, max: 41.0) [2024-03-29 12:18:26,686][00126] Avg episode reward: [(0, '0.015')] [2024-03-29 12:18:30,762][00501] Updated weights for policy 0, policy_version 2560 (0.0030) [2024-03-29 12:18:31,685][00126] Fps is (10 sec: 36044.8, 60 sec: 40414.0, 300 sec: 41043.3). Total num frames: 41992192. Throughput: 0: 41969.3. Samples: 42337660. Policy #0 lag: (min: 0.0, avg: 14.5, max: 41.0) [2024-03-29 12:18:31,686][00126] Avg episode reward: [(0, '0.013')] [2024-03-29 12:18:32,234][00481] Signal inference workers to stop experience collection... 
(1600 times) [2024-03-29 12:18:32,291][00501] InferenceWorker_p0-w0: stopping experience collection (1600 times) [2024-03-29 12:18:32,323][00481] Signal inference workers to resume experience collection... (1600 times) [2024-03-29 12:18:32,327][00501] InferenceWorker_p0-w0: resuming experience collection (1600 times) [2024-03-29 12:18:33,415][00501] Updated weights for policy 0, policy_version 2570 (0.0028) [2024-03-29 12:18:36,685][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41376.5). Total num frames: 42237952. Throughput: 0: 41224.9. Samples: 42550020. Policy #0 lag: (min: 0.0, avg: 22.5, max: 41.0) [2024-03-29 12:18:36,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:18:37,225][00501] Updated weights for policy 0, policy_version 2580 (0.0030) [2024-03-29 12:18:41,365][00501] Updated weights for policy 0, policy_version 2590 (0.0026) [2024-03-29 12:18:41,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 41209.9). Total num frames: 42434560. Throughput: 0: 40511.2. Samples: 42664440. Policy #0 lag: (min: 0.0, avg: 23.1, max: 42.0) [2024-03-29 12:18:41,686][00126] Avg episode reward: [(0, '0.014')] [2024-03-29 12:18:46,534][00501] Updated weights for policy 0, policy_version 2600 (0.0030) [2024-03-29 12:18:46,685][00126] Fps is (10 sec: 36044.5, 60 sec: 40413.8, 300 sec: 40987.8). Total num frames: 42598400. Throughput: 0: 41967.0. Samples: 42968580. Policy #0 lag: (min: 0.0, avg: 15.3, max: 41.0) [2024-03-29 12:18:46,686][00126] Avg episode reward: [(0, '0.017')] [2024-03-29 12:18:49,304][00501] Updated weights for policy 0, policy_version 2610 (0.0027) [2024-03-29 12:18:51,685][00126] Fps is (10 sec: 42598.0, 60 sec: 40959.9, 300 sec: 41265.5). Total num frames: 42860544. Throughput: 0: 41059.5. Samples: 43158240. Policy #0 lag: (min: 2.0, avg: 20.5, max: 42.0) [2024-03-29 12:18:51,686][00126] Avg episode reward: [(0, '0.009')] [2024-03-29 12:18:51,709][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002616_42860544.pth... [2024-03-29 12:18:52,175][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002014_32997376.pth [2024-03-29 12:18:53,227][00501] Updated weights for policy 0, policy_version 2620 (0.0025) [2024-03-29 12:18:56,685][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.4, 300 sec: 41209.9). Total num frames: 43057152. Throughput: 0: 40713.4. Samples: 43278620. Policy #0 lag: (min: 0.0, avg: 23.7, max: 42.0) [2024-03-29 12:18:56,688][00126] Avg episode reward: [(0, '0.011')] [2024-03-29 12:18:57,208][00501] Updated weights for policy 0, policy_version 2630 (0.0017) [2024-03-29 12:19:01,685][00126] Fps is (10 sec: 32768.5, 60 sec: 40140.8, 300 sec: 40932.2). Total num frames: 43188224. Throughput: 0: 41459.6. Samples: 43574120. Policy #0 lag: (min: 0.0, avg: 16.1, max: 41.0) [2024-03-29 12:19:01,686][00126] Avg episode reward: [(0, '0.012')] [2024-03-29 12:19:02,567][00501] Updated weights for policy 0, policy_version 2640 (0.0026) [2024-03-29 12:19:03,367][00481] Signal inference workers to stop experience collection... (1650 times) [2024-03-29 12:19:03,368][00481] Signal inference workers to resume experience collection... 
(1650 times) [2024-03-29 12:19:03,408][00501] InferenceWorker_p0-w0: stopping experience collection (1650 times) [2024-03-29 12:19:03,408][00501] InferenceWorker_p0-w0: resuming experience collection (1650 times) [2024-03-29 12:19:05,286][00501] Updated weights for policy 0, policy_version 2650 (0.0019) [2024-03-29 12:19:06,685][00126] Fps is (10 sec: 42598.3, 60 sec: 40960.0, 300 sec: 41321.0). Total num frames: 43483136. Throughput: 0: 41155.5. Samples: 43780880. Policy #0 lag: (min: 1.0, avg: 20.9, max: 44.0) [2024-03-29 12:19:06,686][00126] Avg episode reward: [(0, '0.016')] [2024-03-29 12:19:09,083][00501] Updated weights for policy 0, policy_version 2660 (0.0033) [2024-03-29 12:19:11,685][00126] Fps is (10 sec: 50790.0, 60 sec: 42052.3, 300 sec: 41321.0). Total num frames: 43696128. Throughput: 0: 41079.1. Samples: 43900240. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 12:19:11,686][00126] Avg episode reward: [(0, '0.019')] [2024-03-29 12:19:12,906][00501] Updated weights for policy 0, policy_version 2670 (0.0036) [2024-03-29 12:19:16,685][00126] Fps is (10 sec: 34406.4, 60 sec: 40413.8, 300 sec: 40987.8). Total num frames: 43827200. Throughput: 0: 41213.8. Samples: 44192280. Policy #0 lag: (min: 0.0, avg: 15.2, max: 41.0) [2024-03-29 12:19:16,686][00126] Avg episode reward: [(0, '0.016')] [2024-03-29 12:19:18,211][00501] Updated weights for policy 0, policy_version 2680 (0.0022) [2024-03-29 12:19:20,994][00501] Updated weights for policy 0, policy_version 2690 (0.0028) [2024-03-29 12:19:21,685][00126] Fps is (10 sec: 39321.5, 60 sec: 40960.0, 300 sec: 41209.9). Total num frames: 44089344. Throughput: 0: 41375.5. Samples: 44411920. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 12:19:21,686][00126] Avg episode reward: [(0, '0.033')] [2024-03-29 12:19:22,084][00481] Saving new best policy, reward=0.033! [2024-03-29 12:19:24,925][00501] Updated weights for policy 0, policy_version 2700 (0.0021) [2024-03-29 12:19:26,685][00126] Fps is (10 sec: 47513.1, 60 sec: 41506.1, 300 sec: 41265.5). Total num frames: 44302336. Throughput: 0: 41488.8. Samples: 44531440. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 12:19:26,688][00126] Avg episode reward: [(0, '0.021')] [2024-03-29 12:19:28,803][00501] Updated weights for policy 0, policy_version 2710 (0.0023) [2024-03-29 12:19:31,685][00126] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 40932.2). Total num frames: 44449792. Throughput: 0: 40711.6. Samples: 44800600. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 12:19:31,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:19:33,948][00501] Updated weights for policy 0, policy_version 2720 (0.0024) [2024-03-29 12:19:34,556][00481] Signal inference workers to stop experience collection... (1700 times) [2024-03-29 12:19:34,557][00481] Signal inference workers to resume experience collection... (1700 times) [2024-03-29 12:19:34,605][00501] InferenceWorker_p0-w0: stopping experience collection (1700 times) [2024-03-29 12:19:34,605][00501] InferenceWorker_p0-w0: resuming experience collection (1700 times) [2024-03-29 12:19:36,686][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.0, 300 sec: 41209.9). Total num frames: 44711936. Throughput: 0: 41708.8. Samples: 45035140. 
Policy #0 lag: (min: 1.0, avg: 22.2, max: 44.0) [2024-03-29 12:19:36,686][00126] Avg episode reward: [(0, '0.017')] [2024-03-29 12:19:36,714][00501] Updated weights for policy 0, policy_version 2730 (0.0024) [2024-03-29 12:19:40,608][00501] Updated weights for policy 0, policy_version 2740 (0.0022) [2024-03-29 12:19:41,685][00126] Fps is (10 sec: 49152.0, 60 sec: 41779.2, 300 sec: 41376.5). Total num frames: 44941312. Throughput: 0: 41723.1. Samples: 45156160. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 12:19:41,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:19:44,360][00501] Updated weights for policy 0, policy_version 2750 (0.0023) [2024-03-29 12:19:46,685][00126] Fps is (10 sec: 37683.7, 60 sec: 41506.2, 300 sec: 41043.3). Total num frames: 45088768. Throughput: 0: 40903.0. Samples: 45414760. Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 12:19:46,687][00126] Avg episode reward: [(0, '0.013')] [2024-03-29 12:19:49,672][00501] Updated weights for policy 0, policy_version 2760 (0.0024) [2024-03-29 12:19:51,685][00126] Fps is (10 sec: 39321.4, 60 sec: 41233.1, 300 sec: 41154.4). Total num frames: 45334528. Throughput: 0: 41891.9. Samples: 45666020. Policy #0 lag: (min: 1.0, avg: 15.6, max: 41.0) [2024-03-29 12:19:51,686][00126] Avg episode reward: [(0, '0.022')] [2024-03-29 12:19:52,645][00501] Updated weights for policy 0, policy_version 2770 (0.0026) [2024-03-29 12:19:56,483][00501] Updated weights for policy 0, policy_version 2780 (0.0032) [2024-03-29 12:19:56,685][00126] Fps is (10 sec: 45875.5, 60 sec: 41506.2, 300 sec: 41432.1). Total num frames: 45547520. Throughput: 0: 41668.5. Samples: 45775320. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 12:19:56,686][00126] Avg episode reward: [(0, '0.016')] [2024-03-29 12:20:00,101][00501] Updated weights for policy 0, policy_version 2790 (0.0025) [2024-03-29 12:20:01,685][00126] Fps is (10 sec: 40960.3, 60 sec: 42598.3, 300 sec: 41209.9). Total num frames: 45744128. Throughput: 0: 40805.8. Samples: 46028540. Policy #0 lag: (min: 0.0, avg: 22.0, max: 40.0) [2024-03-29 12:20:01,688][00126] Avg episode reward: [(0, '0.017')] [2024-03-29 12:20:05,556][00501] Updated weights for policy 0, policy_version 2800 (0.0019) [2024-03-29 12:20:05,662][00481] Signal inference workers to stop experience collection... (1750 times) [2024-03-29 12:20:05,699][00501] InferenceWorker_p0-w0: stopping experience collection (1750 times) [2024-03-29 12:20:05,826][00481] Signal inference workers to resume experience collection... (1750 times) [2024-03-29 12:20:05,826][00501] InferenceWorker_p0-w0: resuming experience collection (1750 times) [2024-03-29 12:20:06,685][00126] Fps is (10 sec: 39321.1, 60 sec: 40960.0, 300 sec: 40987.8). Total num frames: 45940736. Throughput: 0: 41914.6. Samples: 46298080. Policy #0 lag: (min: 0.0, avg: 16.3, max: 42.0) [2024-03-29 12:20:06,686][00126] Avg episode reward: [(0, '0.024')] [2024-03-29 12:20:08,340][00501] Updated weights for policy 0, policy_version 2810 (0.0032) [2024-03-29 12:20:11,685][00126] Fps is (10 sec: 42598.3, 60 sec: 41233.1, 300 sec: 41376.6). Total num frames: 46170112. Throughput: 0: 41438.3. Samples: 46396160. 
Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 12:20:11,686][00126] Avg episode reward: [(0, '0.018')] [2024-03-29 12:20:12,312][00501] Updated weights for policy 0, policy_version 2820 (0.0021) [2024-03-29 12:20:15,971][00501] Updated weights for policy 0, policy_version 2830 (0.0023) [2024-03-29 12:20:16,685][00126] Fps is (10 sec: 44237.1, 60 sec: 42598.4, 300 sec: 41265.5). Total num frames: 46383104. Throughput: 0: 41209.8. Samples: 46655040. Policy #0 lag: (min: 1.0, avg: 23.3, max: 43.0) [2024-03-29 12:20:16,686][00126] Avg episode reward: [(0, '0.030')] [2024-03-29 12:20:21,139][00501] Updated weights for policy 0, policy_version 2840 (0.0025) [2024-03-29 12:20:21,685][00126] Fps is (10 sec: 37683.3, 60 sec: 40960.0, 300 sec: 41043.3). Total num frames: 46546944. Throughput: 0: 42125.9. Samples: 46930800. Policy #0 lag: (min: 1.0, avg: 16.5, max: 41.0) [2024-03-29 12:20:21,686][00126] Avg episode reward: [(0, '0.026')] [2024-03-29 12:20:23,929][00501] Updated weights for policy 0, policy_version 2850 (0.0024) [2024-03-29 12:20:26,686][00126] Fps is (10 sec: 42597.8, 60 sec: 41779.2, 300 sec: 41376.5). Total num frames: 46809088. Throughput: 0: 41706.6. Samples: 47032960. Policy #0 lag: (min: 1.0, avg: 22.4, max: 41.0) [2024-03-29 12:20:26,686][00126] Avg episode reward: [(0, '0.017')] [2024-03-29 12:20:27,767][00501] Updated weights for policy 0, policy_version 2860 (0.0022) [2024-03-29 12:20:31,685][00126] Fps is (10 sec: 45875.6, 60 sec: 42598.5, 300 sec: 41321.0). Total num frames: 47005696. Throughput: 0: 41609.4. Samples: 47287180. Policy #0 lag: (min: 0.0, avg: 22.9, max: 44.0) [2024-03-29 12:20:31,686][00126] Avg episode reward: [(0, '0.010')] [2024-03-29 12:20:31,773][00501] Updated weights for policy 0, policy_version 2870 (0.0020) [2024-03-29 12:20:36,215][00481] Signal inference workers to stop experience collection... (1800 times) [2024-03-29 12:20:36,250][00501] InferenceWorker_p0-w0: stopping experience collection (1800 times) [2024-03-29 12:20:36,430][00481] Signal inference workers to resume experience collection... (1800 times) [2024-03-29 12:20:36,430][00501] InferenceWorker_p0-w0: resuming experience collection (1800 times) [2024-03-29 12:20:36,686][00126] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 41098.8). Total num frames: 47169536. Throughput: 0: 42324.4. Samples: 47570620. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 12:20:36,686][00126] Avg episode reward: [(0, '0.019')] [2024-03-29 12:20:36,731][00501] Updated weights for policy 0, policy_version 2880 (0.0019) [2024-03-29 12:20:39,511][00501] Updated weights for policy 0, policy_version 2890 (0.0022) [2024-03-29 12:20:41,685][00126] Fps is (10 sec: 44235.9, 60 sec: 41779.2, 300 sec: 41376.5). Total num frames: 47448064. Throughput: 0: 42112.7. Samples: 47670400. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 12:20:41,686][00126] Avg episode reward: [(0, '0.025')] [2024-03-29 12:20:43,334][00501] Updated weights for policy 0, policy_version 2900 (0.0027) [2024-03-29 12:20:46,685][00126] Fps is (10 sec: 49152.6, 60 sec: 42871.5, 300 sec: 41487.6). Total num frames: 47661056. Throughput: 0: 42069.8. Samples: 47921680. Policy #0 lag: (min: 1.0, avg: 22.5, max: 42.0) [2024-03-29 12:20:46,687][00126] Avg episode reward: [(0, '0.019')] [2024-03-29 12:20:47,333][00501] Updated weights for policy 0, policy_version 2910 (0.0033) [2024-03-29 12:20:51,686][00126] Fps is (10 sec: 36044.5, 60 sec: 41233.0, 300 sec: 41154.4). Total num frames: 47808512. 
Throughput: 0: 42390.1. Samples: 48205640. Policy #0 lag: (min: 1.0, avg: 17.2, max: 41.0) [2024-03-29 12:20:51,686][00126] Avg episode reward: [(0, '0.032')] [2024-03-29 12:20:51,876][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002919_47824896.pth... [2024-03-29 12:20:52,265][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002313_37896192.pth [2024-03-29 12:20:52,539][00501] Updated weights for policy 0, policy_version 2920 (0.0024) [2024-03-29 12:20:55,300][00501] Updated weights for policy 0, policy_version 2930 (0.0023) [2024-03-29 12:20:56,685][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.2, 300 sec: 41432.1). Total num frames: 48070656. Throughput: 0: 42126.7. Samples: 48291860. Policy #0 lag: (min: 1.0, avg: 20.8, max: 43.0) [2024-03-29 12:20:56,686][00126] Avg episode reward: [(0, '0.013')] [2024-03-29 12:20:59,189][00501] Updated weights for policy 0, policy_version 2940 (0.0023) [2024-03-29 12:21:01,685][00126] Fps is (10 sec: 45875.8, 60 sec: 42052.2, 300 sec: 41487.6). Total num frames: 48267264. Throughput: 0: 41992.4. Samples: 48544700. Policy #0 lag: (min: 0.0, avg: 22.6, max: 41.0) [2024-03-29 12:21:01,686][00126] Avg episode reward: [(0, '0.019')] [2024-03-29 12:21:02,946][00501] Updated weights for policy 0, policy_version 2950 (0.0026) [2024-03-29 12:21:05,669][00481] Signal inference workers to stop experience collection... (1850 times) [2024-03-29 12:21:05,743][00501] InferenceWorker_p0-w0: stopping experience collection (1850 times) [2024-03-29 12:21:05,748][00481] Signal inference workers to resume experience collection... (1850 times) [2024-03-29 12:21:05,771][00501] InferenceWorker_p0-w0: resuming experience collection (1850 times) [2024-03-29 12:21:06,685][00126] Fps is (10 sec: 34406.5, 60 sec: 41233.1, 300 sec: 41154.4). Total num frames: 48414720. Throughput: 0: 42088.9. Samples: 48824800. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 12:21:06,686][00126] Avg episode reward: [(0, '0.025')] [2024-03-29 12:21:07,996][00501] Updated weights for policy 0, policy_version 2960 (0.0019) [2024-03-29 12:21:10,882][00501] Updated weights for policy 0, policy_version 2970 (0.0028) [2024-03-29 12:21:11,685][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 41432.1). Total num frames: 48693248. Throughput: 0: 41971.2. Samples: 48921660. Policy #0 lag: (min: 1.0, avg: 21.9, max: 43.0) [2024-03-29 12:21:11,686][00126] Avg episode reward: [(0, '0.021')] [2024-03-29 12:21:14,872][00501] Updated weights for policy 0, policy_version 2980 (0.0022) [2024-03-29 12:21:16,685][00126] Fps is (10 sec: 47513.5, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 48889856. Throughput: 0: 41865.7. Samples: 49171140. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 12:21:16,686][00126] Avg episode reward: [(0, '0.016')] [2024-03-29 12:21:18,422][00501] Updated weights for policy 0, policy_version 2990 (0.0024) [2024-03-29 12:21:21,688][00126] Fps is (10 sec: 36036.8, 60 sec: 41777.6, 300 sec: 41209.6). Total num frames: 49053696. Throughput: 0: 41706.9. Samples: 49447520. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 12:21:21,689][00126] Avg episode reward: [(0, '0.024')] [2024-03-29 12:21:23,578][00501] Updated weights for policy 0, policy_version 3000 (0.0033) [2024-03-29 12:21:26,535][00501] Updated weights for policy 0, policy_version 3010 (0.0023) [2024-03-29 12:21:26,685][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.3, 300 sec: 41321.0). 
Total num frames: 49315840. Throughput: 0: 42148.5. Samples: 49567080. Policy #0 lag: (min: 1.0, avg: 17.0, max: 42.0) [2024-03-29 12:21:26,686][00126] Avg episode reward: [(0, '0.023')] [2024-03-29 12:21:30,672][00501] Updated weights for policy 0, policy_version 3020 (0.0028) [2024-03-29 12:21:31,685][00126] Fps is (10 sec: 47524.4, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 49528832. Throughput: 0: 41572.9. Samples: 49792460. Policy #0 lag: (min: 1.0, avg: 22.2, max: 42.0) [2024-03-29 12:21:31,686][00126] Avg episode reward: [(0, '0.025')] [2024-03-29 12:21:33,114][00481] Signal inference workers to stop experience collection... (1900 times) [2024-03-29 12:21:33,136][00501] InferenceWorker_p0-w0: stopping experience collection (1900 times) [2024-03-29 12:21:33,330][00481] Signal inference workers to resume experience collection... (1900 times) [2024-03-29 12:21:33,331][00501] InferenceWorker_p0-w0: resuming experience collection (1900 times) [2024-03-29 12:21:34,128][00501] Updated weights for policy 0, policy_version 3030 (0.0017) [2024-03-29 12:21:36,686][00126] Fps is (10 sec: 37682.9, 60 sec: 42052.3, 300 sec: 41265.5). Total num frames: 49692672. Throughput: 0: 41247.6. Samples: 50061780. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 12:21:36,686][00126] Avg episode reward: [(0, '0.031')] [2024-03-29 12:21:39,403][00501] Updated weights for policy 0, policy_version 3040 (0.0022) [2024-03-29 12:21:41,685][00126] Fps is (10 sec: 39322.0, 60 sec: 41233.2, 300 sec: 41265.5). Total num frames: 49922048. Throughput: 0: 42265.0. Samples: 50193780. Policy #0 lag: (min: 1.0, avg: 17.9, max: 43.0) [2024-03-29 12:21:41,686][00126] Avg episode reward: [(0, '0.022')] [2024-03-29 12:21:42,237][00501] Updated weights for policy 0, policy_version 3050 (0.0027) [2024-03-29 12:21:46,323][00501] Updated weights for policy 0, policy_version 3060 (0.0023) [2024-03-29 12:21:46,685][00126] Fps is (10 sec: 45875.9, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 50151424. Throughput: 0: 41574.7. Samples: 50415560. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 12:21:46,686][00126] Avg episode reward: [(0, '0.020')] [2024-03-29 12:21:49,863][00501] Updated weights for policy 0, policy_version 3070 (0.0027) [2024-03-29 12:21:51,685][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.4, 300 sec: 41376.6). Total num frames: 50331648. Throughput: 0: 41217.7. Samples: 50679600. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 12:21:51,686][00126] Avg episode reward: [(0, '0.024')] [2024-03-29 12:21:55,039][00501] Updated weights for policy 0, policy_version 3080 (0.0024) [2024-03-29 12:21:56,685][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.1, 300 sec: 41376.6). Total num frames: 50544640. Throughput: 0: 42245.0. Samples: 50822680. Policy #0 lag: (min: 0.0, avg: 18.0, max: 41.0) [2024-03-29 12:21:56,686][00126] Avg episode reward: [(0, '0.019')] [2024-03-29 12:21:57,959][00501] Updated weights for policy 0, policy_version 3090 (0.0025) [2024-03-29 12:22:01,686][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.1, 300 sec: 41543.1). Total num frames: 50774016. Throughput: 0: 41697.2. Samples: 51047520. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 12:22:01,686][00126] Avg episode reward: [(0, '0.020')] [2024-03-29 12:22:01,924][00501] Updated weights for policy 0, policy_version 3100 (0.0017) [2024-03-29 12:22:04,561][00481] Signal inference workers to stop experience collection... 
(1950 times) [2024-03-29 12:22:04,583][00501] InferenceWorker_p0-w0: stopping experience collection (1950 times) [2024-03-29 12:22:04,776][00481] Signal inference workers to resume experience collection... (1950 times) [2024-03-29 12:22:04,777][00501] InferenceWorker_p0-w0: resuming experience collection (1950 times) [2024-03-29 12:22:05,596][00501] Updated weights for policy 0, policy_version 3110 (0.0027) [2024-03-29 12:22:06,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42871.4, 300 sec: 41598.7). Total num frames: 50987008. Throughput: 0: 41123.4. Samples: 51297980. Policy #0 lag: (min: 2.0, avg: 22.2, max: 43.0) [2024-03-29 12:22:06,686][00126] Avg episode reward: [(0, '0.031')] [2024-03-29 12:22:10,627][00501] Updated weights for policy 0, policy_version 3120 (0.0030) [2024-03-29 12:22:11,685][00126] Fps is (10 sec: 39322.1, 60 sec: 41233.1, 300 sec: 41376.5). Total num frames: 51167232. Throughput: 0: 41978.7. Samples: 51456120. Policy #0 lag: (min: 1.0, avg: 16.8, max: 41.0) [2024-03-29 12:22:11,686][00126] Avg episode reward: [(0, '0.022')] [2024-03-29 12:22:13,620][00501] Updated weights for policy 0, policy_version 3130 (0.0019) [2024-03-29 12:22:16,685][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 41543.2). Total num frames: 51412992. Throughput: 0: 41665.3. Samples: 51667400. Policy #0 lag: (min: 1.0, avg: 22.7, max: 45.0) [2024-03-29 12:22:16,688][00126] Avg episode reward: [(0, '0.034')] [2024-03-29 12:22:16,689][00481] Saving new best policy, reward=0.034! [2024-03-29 12:22:17,623][00501] Updated weights for policy 0, policy_version 3140 (0.0021) [2024-03-29 12:22:21,320][00501] Updated weights for policy 0, policy_version 3150 (0.0028) [2024-03-29 12:22:21,685][00126] Fps is (10 sec: 45875.2, 60 sec: 42873.1, 300 sec: 41765.3). Total num frames: 51625984. Throughput: 0: 41452.1. Samples: 51927120. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 12:22:21,686][00126] Avg episode reward: [(0, '0.031')] [2024-03-29 12:22:26,070][00501] Updated weights for policy 0, policy_version 3160 (0.0019) [2024-03-29 12:22:26,685][00126] Fps is (10 sec: 37683.7, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 51789824. Throughput: 0: 42023.6. Samples: 52084840. Policy #0 lag: (min: 1.0, avg: 18.3, max: 41.0) [2024-03-29 12:22:26,686][00126] Avg episode reward: [(0, '0.023')] [2024-03-29 12:22:29,111][00501] Updated weights for policy 0, policy_version 3170 (0.0022) [2024-03-29 12:22:31,686][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.1, 300 sec: 41598.7). Total num frames: 52035584. Throughput: 0: 41978.6. Samples: 52304600. Policy #0 lag: (min: 0.0, avg: 22.3, max: 43.0) [2024-03-29 12:22:31,686][00126] Avg episode reward: [(0, '0.020')] [2024-03-29 12:22:33,218][00501] Updated weights for policy 0, policy_version 3180 (0.0024) [2024-03-29 12:22:36,142][00481] Signal inference workers to stop experience collection... (2000 times) [2024-03-29 12:22:36,142][00481] Signal inference workers to resume experience collection... (2000 times) [2024-03-29 12:22:36,182][00501] InferenceWorker_p0-w0: stopping experience collection (2000 times) [2024-03-29 12:22:36,182][00501] InferenceWorker_p0-w0: resuming experience collection (2000 times) [2024-03-29 12:22:36,685][00126] Fps is (10 sec: 45874.5, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 52248576. Throughput: 0: 41934.2. Samples: 52566640. 
Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 12:22:36,686][00126] Avg episode reward: [(0, '0.033')] [2024-03-29 12:22:36,720][00501] Updated weights for policy 0, policy_version 3190 (0.0018) [2024-03-29 12:22:41,494][00501] Updated weights for policy 0, policy_version 3200 (0.0017) [2024-03-29 12:22:41,685][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.1, 300 sec: 41543.1). Total num frames: 52428800. Throughput: 0: 42054.2. Samples: 52715120. Policy #0 lag: (min: 0.0, avg: 18.0, max: 41.0) [2024-03-29 12:22:41,686][00126] Avg episode reward: [(0, '0.027')] [2024-03-29 12:22:44,568][00501] Updated weights for policy 0, policy_version 3210 (0.0025) [2024-03-29 12:22:46,685][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 52674560. Throughput: 0: 41897.4. Samples: 52932900. Policy #0 lag: (min: 1.0, avg: 19.6, max: 42.0) [2024-03-29 12:22:46,686][00126] Avg episode reward: [(0, '0.021')] [2024-03-29 12:22:48,784][00501] Updated weights for policy 0, policy_version 3220 (0.0032) [2024-03-29 12:22:51,685][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 52871168. Throughput: 0: 42240.1. Samples: 53198780. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 12:22:51,686][00126] Avg episode reward: [(0, '0.035')] [2024-03-29 12:22:52,224][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000003229_52903936.pth... [2024-03-29 12:22:52,586][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002616_42860544.pth [2024-03-29 12:22:52,848][00501] Updated weights for policy 0, policy_version 3230 (0.0025) [2024-03-29 12:22:56,685][00126] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 53035008. Throughput: 0: 41473.3. Samples: 53322420. Policy #0 lag: (min: 1.0, avg: 22.1, max: 43.0) [2024-03-29 12:22:56,687][00126] Avg episode reward: [(0, '0.028')] [2024-03-29 12:22:57,362][00501] Updated weights for policy 0, policy_version 3240 (0.0023) [2024-03-29 12:23:00,652][00501] Updated weights for policy 0, policy_version 3250 (0.0025) [2024-03-29 12:23:01,685][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.3, 300 sec: 41543.2). Total num frames: 53280768. Throughput: 0: 41704.5. Samples: 53544100. Policy #0 lag: (min: 0.0, avg: 22.9, max: 46.0) [2024-03-29 12:23:01,686][00126] Avg episode reward: [(0, '0.034')] [2024-03-29 12:23:04,831][00501] Updated weights for policy 0, policy_version 3260 (0.0027) [2024-03-29 12:23:06,685][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 53477376. Throughput: 0: 41965.8. Samples: 53815580. Policy #0 lag: (min: 1.0, avg: 22.7, max: 42.0) [2024-03-29 12:23:06,686][00126] Avg episode reward: [(0, '0.038')] [2024-03-29 12:23:06,971][00481] Saving new best policy, reward=0.038! [2024-03-29 12:23:07,711][00481] Signal inference workers to stop experience collection... (2050 times) [2024-03-29 12:23:07,733][00501] InferenceWorker_p0-w0: stopping experience collection (2050 times) [2024-03-29 12:23:07,917][00481] Signal inference workers to resume experience collection... (2050 times) [2024-03-29 12:23:07,918][00501] InferenceWorker_p0-w0: resuming experience collection (2050 times) [2024-03-29 12:23:08,712][00501] Updated weights for policy 0, policy_version 3270 (0.0026) [2024-03-29 12:23:11,685][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.0, 300 sec: 41487.6). Total num frames: 53641216. Throughput: 0: 40946.6. 
Samples: 53927440. Policy #0 lag: (min: 0.0, avg: 20.4, max: 43.0) [2024-03-29 12:23:11,686][00126] Avg episode reward: [(0, '0.042')] [2024-03-29 12:23:11,706][00481] Saving new best policy, reward=0.042! [2024-03-29 12:23:13,707][00501] Updated weights for policy 0, policy_version 3280 (0.0019) [2024-03-29 12:23:16,615][00501] Updated weights for policy 0, policy_version 3290 (0.0022) [2024-03-29 12:23:16,685][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 53903360. Throughput: 0: 41515.7. Samples: 54172800. Policy #0 lag: (min: 1.0, avg: 19.6, max: 42.0) [2024-03-29 12:23:16,686][00126] Avg episode reward: [(0, '0.047')] [2024-03-29 12:23:16,901][00481] Saving new best policy, reward=0.047! [2024-03-29 12:23:20,859][00501] Updated weights for policy 0, policy_version 3300 (0.0019) [2024-03-29 12:23:21,685][00126] Fps is (10 sec: 45875.2, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 54099968. Throughput: 0: 41116.9. Samples: 54416900. Policy #0 lag: (min: 0.0, avg: 23.4, max: 42.0) [2024-03-29 12:23:21,686][00126] Avg episode reward: [(0, '0.037')] [2024-03-29 12:23:24,482][00501] Updated weights for policy 0, policy_version 3310 (0.0026) [2024-03-29 12:23:26,686][00126] Fps is (10 sec: 37682.6, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 54280192. Throughput: 0: 40655.9. Samples: 54544640. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 12:23:26,686][00126] Avg episode reward: [(0, '0.042')] [2024-03-29 12:23:29,507][00501] Updated weights for policy 0, policy_version 3320 (0.0019) [2024-03-29 12:23:31,685][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 54509568. Throughput: 0: 41697.3. Samples: 54809280. Policy #0 lag: (min: 0.0, avg: 16.6, max: 41.0) [2024-03-29 12:23:31,686][00126] Avg episode reward: [(0, '0.039')] [2024-03-29 12:23:32,379][00501] Updated weights for policy 0, policy_version 3330 (0.0023) [2024-03-29 12:23:36,506][00501] Updated weights for policy 0, policy_version 3340 (0.0023) [2024-03-29 12:23:36,685][00126] Fps is (10 sec: 44237.8, 60 sec: 41233.2, 300 sec: 41654.2). Total num frames: 54722560. Throughput: 0: 41108.0. Samples: 55048640. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 12:23:36,686][00126] Avg episode reward: [(0, '0.032')] [2024-03-29 12:23:38,271][00481] Signal inference workers to stop experience collection... (2100 times) [2024-03-29 12:23:38,316][00501] InferenceWorker_p0-w0: stopping experience collection (2100 times) [2024-03-29 12:23:38,468][00481] Signal inference workers to resume experience collection... (2100 times) [2024-03-29 12:23:38,469][00501] InferenceWorker_p0-w0: resuming experience collection (2100 times) [2024-03-29 12:23:40,373][00501] Updated weights for policy 0, policy_version 3350 (0.0020) [2024-03-29 12:23:41,685][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 54919168. Throughput: 0: 41152.9. Samples: 55174300. Policy #0 lag: (min: 0.0, avg: 21.1, max: 40.0) [2024-03-29 12:23:41,686][00126] Avg episode reward: [(0, '0.046')] [2024-03-29 12:23:45,088][00501] Updated weights for policy 0, policy_version 3360 (0.0030) [2024-03-29 12:23:46,685][00126] Fps is (10 sec: 39321.1, 60 sec: 40686.9, 300 sec: 41543.2). Total num frames: 55115776. Throughput: 0: 42033.3. Samples: 55435600. 
Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 12:23:46,686][00126] Avg episode reward: [(0, '0.024')] [2024-03-29 12:23:48,154][00501] Updated weights for policy 0, policy_version 3370 (0.0019) [2024-03-29 12:23:51,686][00126] Fps is (10 sec: 42597.8, 60 sec: 41232.9, 300 sec: 41654.2). Total num frames: 55345152. Throughput: 0: 41268.3. Samples: 55672660. Policy #0 lag: (min: 1.0, avg: 23.4, max: 41.0) [2024-03-29 12:23:51,686][00126] Avg episode reward: [(0, '0.035')] [2024-03-29 12:23:52,148][00501] Updated weights for policy 0, policy_version 3380 (0.0025) [2024-03-29 12:23:56,065][00501] Updated weights for policy 0, policy_version 3390 (0.0020) [2024-03-29 12:23:56,685][00126] Fps is (10 sec: 44237.0, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 55558144. Throughput: 0: 41745.8. Samples: 55806000. Policy #0 lag: (min: 0.0, avg: 20.7, max: 45.0) [2024-03-29 12:23:56,686][00126] Avg episode reward: [(0, '0.038')] [2024-03-29 12:24:00,684][00501] Updated weights for policy 0, policy_version 3400 (0.0023) [2024-03-29 12:24:01,685][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 55754752. Throughput: 0: 42193.3. Samples: 56071500. Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0) [2024-03-29 12:24:01,686][00126] Avg episode reward: [(0, '0.068')] [2024-03-29 12:24:01,710][00481] Saving new best policy, reward=0.068! [2024-03-29 12:24:03,844][00501] Updated weights for policy 0, policy_version 3410 (0.0031) [2024-03-29 12:24:06,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 55967744. Throughput: 0: 41713.4. Samples: 56294000. Policy #0 lag: (min: 2.0, avg: 20.6, max: 43.0) [2024-03-29 12:24:06,686][00126] Avg episode reward: [(0, '0.036')] [2024-03-29 12:24:07,859][00501] Updated weights for policy 0, policy_version 3420 (0.0023) [2024-03-29 12:24:10,615][00481] Signal inference workers to stop experience collection... (2150 times) [2024-03-29 12:24:10,616][00481] Signal inference workers to resume experience collection... (2150 times) [2024-03-29 12:24:10,658][00501] InferenceWorker_p0-w0: stopping experience collection (2150 times) [2024-03-29 12:24:10,658][00501] InferenceWorker_p0-w0: resuming experience collection (2150 times) [2024-03-29 12:24:11,685][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 56180736. Throughput: 0: 41953.9. Samples: 56432560. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 12:24:11,686][00126] Avg episode reward: [(0, '0.051')] [2024-03-29 12:24:11,853][00501] Updated weights for policy 0, policy_version 3430 (0.0023) [2024-03-29 12:24:16,685][00126] Fps is (10 sec: 36044.8, 60 sec: 40413.9, 300 sec: 41487.6). Total num frames: 56328192. Throughput: 0: 41743.2. Samples: 56687720. Policy #0 lag: (min: 1.0, avg: 18.8, max: 42.0) [2024-03-29 12:24:16,686][00126] Avg episode reward: [(0, '0.053')] [2024-03-29 12:24:17,083][00501] Updated weights for policy 0, policy_version 3440 (0.0029) [2024-03-29 12:24:20,073][00501] Updated weights for policy 0, policy_version 3450 (0.0020) [2024-03-29 12:24:21,685][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 56590336. Throughput: 0: 40918.2. Samples: 56889960. 
Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 12:24:21,686][00126] Avg episode reward: [(0, '0.040')] [2024-03-29 12:24:24,211][00501] Updated weights for policy 0, policy_version 3460 (0.0022) [2024-03-29 12:24:26,685][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.3, 300 sec: 41765.3). Total num frames: 56770560. Throughput: 0: 41246.7. Samples: 57030400. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 12:24:26,686][00126] Avg episode reward: [(0, '0.040')] [2024-03-29 12:24:27,954][00501] Updated weights for policy 0, policy_version 3470 (0.0029) [2024-03-29 12:24:31,685][00126] Fps is (10 sec: 37683.2, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 56967168. Throughput: 0: 41330.7. Samples: 57295480. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 12:24:31,688][00126] Avg episode reward: [(0, '0.040')] [2024-03-29 12:24:32,528][00501] Updated weights for policy 0, policy_version 3480 (0.0031) [2024-03-29 12:24:35,483][00501] Updated weights for policy 0, policy_version 3490 (0.0018) [2024-03-29 12:24:36,685][00126] Fps is (10 sec: 45875.4, 60 sec: 41779.2, 300 sec: 41654.3). Total num frames: 57229312. Throughput: 0: 41239.7. Samples: 57528440. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 12:24:36,686][00126] Avg episode reward: [(0, '0.051')] [2024-03-29 12:24:39,733][00501] Updated weights for policy 0, policy_version 3500 (0.0024) [2024-03-29 12:24:41,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 57409536. Throughput: 0: 41414.7. Samples: 57669660. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 12:24:41,686][00126] Avg episode reward: [(0, '0.048')] [2024-03-29 12:24:43,255][00501] Updated weights for policy 0, policy_version 3510 (0.0023) [2024-03-29 12:24:43,883][00481] Signal inference workers to stop experience collection... (2200 times) [2024-03-29 12:24:43,883][00481] Signal inference workers to resume experience collection... (2200 times) [2024-03-29 12:24:43,916][00501] InferenceWorker_p0-w0: stopping experience collection (2200 times) [2024-03-29 12:24:43,916][00501] InferenceWorker_p0-w0: resuming experience collection (2200 times) [2024-03-29 12:24:46,685][00126] Fps is (10 sec: 37682.9, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 57606144. Throughput: 0: 41547.1. Samples: 57941120. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 12:24:46,686][00126] Avg episode reward: [(0, '0.043')] [2024-03-29 12:24:47,900][00501] Updated weights for policy 0, policy_version 3520 (0.0023) [2024-03-29 12:24:50,749][00501] Updated weights for policy 0, policy_version 3530 (0.0027) [2024-03-29 12:24:51,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 57868288. Throughput: 0: 41791.1. Samples: 58174600. Policy #0 lag: (min: 1.0, avg: 18.7, max: 42.0) [2024-03-29 12:24:51,686][00126] Avg episode reward: [(0, '0.038')] [2024-03-29 12:24:51,861][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000003533_57884672.pth... [2024-03-29 12:24:52,214][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000002919_47824896.pth [2024-03-29 12:24:55,060][00501] Updated weights for policy 0, policy_version 3540 (0.0021) [2024-03-29 12:24:56,686][00126] Fps is (10 sec: 44236.3, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 58048512. Throughput: 0: 41587.9. Samples: 58304020. 
Policy #0 lag: (min: 1.0, avg: 23.5, max: 42.0) [2024-03-29 12:24:56,686][00126] Avg episode reward: [(0, '0.037')] [2024-03-29 12:24:58,946][00501] Updated weights for policy 0, policy_version 3550 (0.0019) [2024-03-29 12:25:01,685][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 58228736. Throughput: 0: 41757.7. Samples: 58566820. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 12:25:01,688][00126] Avg episode reward: [(0, '0.047')] [2024-03-29 12:25:03,340][00501] Updated weights for policy 0, policy_version 3560 (0.0019) [2024-03-29 12:25:06,410][00501] Updated weights for policy 0, policy_version 3570 (0.0035) [2024-03-29 12:25:06,685][00126] Fps is (10 sec: 44237.4, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 58490880. Throughput: 0: 42407.6. Samples: 58798300. Policy #0 lag: (min: 1.0, avg: 21.2, max: 42.0) [2024-03-29 12:25:06,686][00126] Avg episode reward: [(0, '0.056')] [2024-03-29 12:25:10,840][00501] Updated weights for policy 0, policy_version 3580 (0.0023) [2024-03-29 12:25:11,685][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 58687488. Throughput: 0: 42312.3. Samples: 58934460. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 12:25:11,686][00126] Avg episode reward: [(0, '0.053')] [2024-03-29 12:25:14,820][00501] Updated weights for policy 0, policy_version 3590 (0.0025) [2024-03-29 12:25:16,685][00126] Fps is (10 sec: 37683.1, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 58867712. Throughput: 0: 42060.4. Samples: 59188200. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 12:25:16,686][00126] Avg episode reward: [(0, '0.054')] [2024-03-29 12:25:18,635][00481] Signal inference workers to stop experience collection... (2250 times) [2024-03-29 12:25:18,669][00501] InferenceWorker_p0-w0: stopping experience collection (2250 times) [2024-03-29 12:25:18,850][00481] Signal inference workers to resume experience collection... (2250 times) [2024-03-29 12:25:18,851][00501] InferenceWorker_p0-w0: resuming experience collection (2250 times) [2024-03-29 12:25:19,107][00501] Updated weights for policy 0, policy_version 3600 (0.0023) [2024-03-29 12:25:21,685][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 59113472. Throughput: 0: 42396.0. Samples: 59436260. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 12:25:21,686][00126] Avg episode reward: [(0, '0.024')] [2024-03-29 12:25:22,210][00501] Updated weights for policy 0, policy_version 3610 (0.0026) [2024-03-29 12:25:26,685][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 59293696. Throughput: 0: 41740.4. Samples: 59547980. Policy #0 lag: (min: 0.0, avg: 23.5, max: 40.0) [2024-03-29 12:25:26,687][00126] Avg episode reward: [(0, '0.059')] [2024-03-29 12:25:26,716][00501] Updated weights for policy 0, policy_version 3620 (0.0018) [2024-03-29 12:25:30,354][00501] Updated weights for policy 0, policy_version 3630 (0.0021) [2024-03-29 12:25:31,686][00126] Fps is (10 sec: 39320.7, 60 sec: 42325.2, 300 sec: 41820.9). Total num frames: 59506688. Throughput: 0: 41455.9. Samples: 59806640. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 12:25:31,686][00126] Avg episode reward: [(0, '0.038')] [2024-03-29 12:25:34,849][00501] Updated weights for policy 0, policy_version 3640 (0.0021) [2024-03-29 12:25:36,685][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 59736064. 
Throughput: 0: 42204.0. Samples: 60073780. Policy #0 lag: (min: 0.0, avg: 18.7, max: 42.0) [2024-03-29 12:25:36,688][00126] Avg episode reward: [(0, '0.046')] [2024-03-29 12:25:37,779][00501] Updated weights for policy 0, policy_version 3650 (0.0027) [2024-03-29 12:25:41,685][00126] Fps is (10 sec: 42599.1, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 59932672. Throughput: 0: 41381.0. Samples: 60166160. Policy #0 lag: (min: 2.0, avg: 22.7, max: 43.0) [2024-03-29 12:25:41,686][00126] Avg episode reward: [(0, '0.043')] [2024-03-29 12:25:42,346][00501] Updated weights for policy 0, policy_version 3660 (0.0027) [2024-03-29 12:25:46,055][00501] Updated weights for policy 0, policy_version 3670 (0.0028) [2024-03-29 12:25:46,685][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 60145664. Throughput: 0: 41637.3. Samples: 60440500. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 12:25:46,686][00126] Avg episode reward: [(0, '0.041')] [2024-03-29 12:25:47,956][00481] Signal inference workers to stop experience collection... (2300 times) [2024-03-29 12:25:47,993][00501] InferenceWorker_p0-w0: stopping experience collection (2300 times) [2024-03-29 12:25:48,143][00481] Signal inference workers to resume experience collection... (2300 times) [2024-03-29 12:25:48,144][00501] InferenceWorker_p0-w0: resuming experience collection (2300 times) [2024-03-29 12:25:50,637][00501] Updated weights for policy 0, policy_version 3680 (0.0020) [2024-03-29 12:25:51,685][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 60342272. Throughput: 0: 42132.9. Samples: 60694280. Policy #0 lag: (min: 0.0, avg: 20.2, max: 44.0) [2024-03-29 12:25:51,686][00126] Avg episode reward: [(0, '0.059')] [2024-03-29 12:25:53,667][00501] Updated weights for policy 0, policy_version 3690 (0.0021) [2024-03-29 12:25:56,685][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 60555264. Throughput: 0: 41306.3. Samples: 60793240. Policy #0 lag: (min: 0.0, avg: 24.2, max: 43.0) [2024-03-29 12:25:56,686][00126] Avg episode reward: [(0, '0.068')] [2024-03-29 12:25:58,107][00501] Updated weights for policy 0, policy_version 3700 (0.0018) [2024-03-29 12:26:01,685][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 60768256. Throughput: 0: 41705.3. Samples: 61064940. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 12:26:01,686][00126] Avg episode reward: [(0, '0.061')] [2024-03-29 12:26:01,963][00501] Updated weights for policy 0, policy_version 3710 (0.0027) [2024-03-29 12:26:06,162][00501] Updated weights for policy 0, policy_version 3720 (0.0018) [2024-03-29 12:26:06,685][00126] Fps is (10 sec: 40959.7, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 60964864. Throughput: 0: 41857.6. Samples: 61319860. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 12:26:06,686][00126] Avg episode reward: [(0, '0.034')] [2024-03-29 12:26:09,350][00501] Updated weights for policy 0, policy_version 3730 (0.0023) [2024-03-29 12:26:11,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 61194240. Throughput: 0: 41802.2. Samples: 61429080. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 12:26:11,686][00126] Avg episode reward: [(0, '0.050')] [2024-03-29 12:26:13,889][00501] Updated weights for policy 0, policy_version 3740 (0.0019) [2024-03-29 12:26:16,685][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41821.2). 
Total num frames: 61390848. Throughput: 0: 42040.6. Samples: 61698460. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 12:26:16,686][00126] Avg episode reward: [(0, '0.063')] [2024-03-29 12:26:17,519][00501] Updated weights for policy 0, policy_version 3750 (0.0023) [2024-03-29 12:26:21,685][00126] Fps is (10 sec: 39321.4, 60 sec: 41232.9, 300 sec: 41598.7). Total num frames: 61587456. Throughput: 0: 41947.9. Samples: 61961440. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 12:26:21,686][00126] Avg episode reward: [(0, '0.054')] [2024-03-29 12:26:21,835][00501] Updated weights for policy 0, policy_version 3760 (0.0022) [2024-03-29 12:26:21,864][00481] Signal inference workers to stop experience collection... (2350 times) [2024-03-29 12:26:21,865][00481] Signal inference workers to resume experience collection... (2350 times) [2024-03-29 12:26:21,904][00501] InferenceWorker_p0-w0: stopping experience collection (2350 times) [2024-03-29 12:26:21,904][00501] InferenceWorker_p0-w0: resuming experience collection (2350 times) [2024-03-29 12:26:24,724][00501] Updated weights for policy 0, policy_version 3770 (0.0030) [2024-03-29 12:26:26,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 41765.3). Total num frames: 61849600. Throughput: 0: 42335.6. Samples: 62071260. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 12:26:26,686][00126] Avg episode reward: [(0, '0.072')] [2024-03-29 12:26:26,687][00481] Saving new best policy, reward=0.072! [2024-03-29 12:26:29,456][00501] Updated weights for policy 0, policy_version 3780 (0.0027) [2024-03-29 12:26:31,685][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 62013440. Throughput: 0: 41860.9. Samples: 62324240. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 12:26:31,686][00126] Avg episode reward: [(0, '0.046')] [2024-03-29 12:26:33,104][00501] Updated weights for policy 0, policy_version 3790 (0.0033) [2024-03-29 12:26:36,685][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 62226432. Throughput: 0: 42251.6. Samples: 62595600. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 12:26:36,686][00126] Avg episode reward: [(0, '0.057')] [2024-03-29 12:26:37,343][00501] Updated weights for policy 0, policy_version 3800 (0.0021) [2024-03-29 12:26:40,229][00501] Updated weights for policy 0, policy_version 3810 (0.0018) [2024-03-29 12:26:41,685][00126] Fps is (10 sec: 49152.4, 60 sec: 42871.5, 300 sec: 41876.4). Total num frames: 62504960. Throughput: 0: 42612.4. Samples: 62710800. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 12:26:41,686][00126] Avg episode reward: [(0, '0.081')] [2024-03-29 12:26:41,704][00481] Saving new best policy, reward=0.081! [2024-03-29 12:26:45,007][00501] Updated weights for policy 0, policy_version 3820 (0.0025) [2024-03-29 12:26:46,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 62636032. Throughput: 0: 41997.4. Samples: 62954820. Policy #0 lag: (min: 0.0, avg: 23.8, max: 41.0) [2024-03-29 12:26:46,686][00126] Avg episode reward: [(0, '0.061')] [2024-03-29 12:26:48,790][00501] Updated weights for policy 0, policy_version 3830 (0.0023) [2024-03-29 12:26:48,805][00481] Signal inference workers to stop experience collection... (2400 times) [2024-03-29 12:26:48,806][00481] Signal inference workers to resume experience collection... 
(2400 times) [2024-03-29 12:26:48,839][00501] InferenceWorker_p0-w0: stopping experience collection (2400 times) [2024-03-29 12:26:48,839][00501] InferenceWorker_p0-w0: resuming experience collection (2400 times) [2024-03-29 12:26:51,685][00126] Fps is (10 sec: 34406.2, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 62849024. Throughput: 0: 42195.6. Samples: 63218660. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 12:26:51,686][00126] Avg episode reward: [(0, '0.087')] [2024-03-29 12:26:51,958][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000003837_62865408.pth... [2024-03-29 12:26:52,327][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000003229_52903936.pth [2024-03-29 12:26:52,347][00481] Saving new best policy, reward=0.087! [2024-03-29 12:26:53,587][00501] Updated weights for policy 0, policy_version 3840 (0.0022) [2024-03-29 12:26:56,553][00501] Updated weights for policy 0, policy_version 3850 (0.0024) [2024-03-29 12:26:56,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 63078400. Throughput: 0: 42081.4. Samples: 63322740. Policy #0 lag: (min: 2.0, avg: 22.7, max: 43.0) [2024-03-29 12:26:56,686][00126] Avg episode reward: [(0, '0.055')] [2024-03-29 12:27:01,155][00501] Updated weights for policy 0, policy_version 3860 (0.0026) [2024-03-29 12:27:01,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 63275008. Throughput: 0: 41350.6. Samples: 63559240. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 12:27:01,686][00126] Avg episode reward: [(0, '0.077')] [2024-03-29 12:27:04,859][00501] Updated weights for policy 0, policy_version 3870 (0.0020) [2024-03-29 12:27:06,685][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 63455232. Throughput: 0: 41371.7. Samples: 63823160. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 12:27:06,686][00126] Avg episode reward: [(0, '0.079')] [2024-03-29 12:27:09,008][00501] Updated weights for policy 0, policy_version 3880 (0.0022) [2024-03-29 12:27:11,685][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.3, 300 sec: 41654.3). Total num frames: 63700992. Throughput: 0: 41789.0. Samples: 63951760. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 12:27:11,688][00126] Avg episode reward: [(0, '0.038')] [2024-03-29 12:27:12,230][00501] Updated weights for policy 0, policy_version 3890 (0.0023) [2024-03-29 12:27:16,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 63881216. Throughput: 0: 41204.1. Samples: 64178420. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 12:27:16,686][00126] Avg episode reward: [(0, '0.063')] [2024-03-29 12:27:16,908][00501] Updated weights for policy 0, policy_version 3900 (0.0039) [2024-03-29 12:27:20,840][00501] Updated weights for policy 0, policy_version 3910 (0.0025) [2024-03-29 12:27:21,686][00126] Fps is (10 sec: 39320.6, 60 sec: 41779.2, 300 sec: 41709.7). Total num frames: 64094208. Throughput: 0: 41118.9. Samples: 64445960. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 12:27:21,686][00126] Avg episode reward: [(0, '0.051')] [2024-03-29 12:27:22,181][00481] Signal inference workers to stop experience collection... (2450 times) [2024-03-29 12:27:22,255][00481] Signal inference workers to resume experience collection... 
(2450 times) [2024-03-29 12:27:22,255][00501] InferenceWorker_p0-w0: stopping experience collection (2450 times) [2024-03-29 12:27:22,283][00501] InferenceWorker_p0-w0: resuming experience collection (2450 times) [2024-03-29 12:27:24,765][00501] Updated weights for policy 0, policy_version 3920 (0.0022) [2024-03-29 12:27:26,685][00126] Fps is (10 sec: 42598.2, 60 sec: 40960.0, 300 sec: 41598.7). Total num frames: 64307200. Throughput: 0: 41382.2. Samples: 64573000. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 12:27:26,686][00126] Avg episode reward: [(0, '0.063')] [2024-03-29 12:27:27,904][00501] Updated weights for policy 0, policy_version 3930 (0.0029) [2024-03-29 12:27:31,686][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 64520192. Throughput: 0: 40973.2. Samples: 64798620. Policy #0 lag: (min: 0.0, avg: 23.9, max: 40.0) [2024-03-29 12:27:31,686][00126] Avg episode reward: [(0, '0.060')] [2024-03-29 12:27:32,595][00501] Updated weights for policy 0, policy_version 3940 (0.0031) [2024-03-29 12:27:36,568][00501] Updated weights for policy 0, policy_version 3950 (0.0020) [2024-03-29 12:27:36,685][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 64716800. Throughput: 0: 41137.4. Samples: 65069840. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 12:27:36,686][00126] Avg episode reward: [(0, '0.061')] [2024-03-29 12:27:40,583][00501] Updated weights for policy 0, policy_version 3960 (0.0027) [2024-03-29 12:27:41,685][00126] Fps is (10 sec: 40960.5, 60 sec: 40413.9, 300 sec: 41543.2). Total num frames: 64929792. Throughput: 0: 41791.1. Samples: 65203340. Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0) [2024-03-29 12:27:41,686][00126] Avg episode reward: [(0, '0.079')] [2024-03-29 12:27:43,552][00501] Updated weights for policy 0, policy_version 3970 (0.0017) [2024-03-29 12:27:46,685][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 65142784. Throughput: 0: 41410.7. Samples: 65422720. Policy #0 lag: (min: 1.0, avg: 24.5, max: 42.0) [2024-03-29 12:27:46,688][00126] Avg episode reward: [(0, '0.069')] [2024-03-29 12:27:48,010][00501] Updated weights for policy 0, policy_version 3980 (0.0019) [2024-03-29 12:27:51,686][00126] Fps is (10 sec: 40959.3, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 65339392. Throughput: 0: 41749.6. Samples: 65701900. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 12:27:51,686][00126] Avg episode reward: [(0, '0.081')] [2024-03-29 12:27:52,106][00501] Updated weights for policy 0, policy_version 3990 (0.0026) [2024-03-29 12:27:54,908][00481] Signal inference workers to stop experience collection... (2500 times) [2024-03-29 12:27:54,952][00501] InferenceWorker_p0-w0: stopping experience collection (2500 times) [2024-03-29 12:27:54,987][00481] Signal inference workers to resume experience collection... (2500 times) [2024-03-29 12:27:54,991][00501] InferenceWorker_p0-w0: resuming experience collection (2500 times) [2024-03-29 12:27:55,909][00501] Updated weights for policy 0, policy_version 4000 (0.0022) [2024-03-29 12:27:56,685][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 65568768. Throughput: 0: 41967.8. Samples: 65840320. 
Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 12:27:56,686][00126] Avg episode reward: [(0, '0.067')] [2024-03-29 12:27:59,065][00501] Updated weights for policy 0, policy_version 4010 (0.0034) [2024-03-29 12:28:01,685][00126] Fps is (10 sec: 45876.0, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 65798144. Throughput: 0: 41782.2. Samples: 66058620. Policy #0 lag: (min: 2.0, avg: 23.1, max: 45.0) [2024-03-29 12:28:01,686][00126] Avg episode reward: [(0, '0.077')] [2024-03-29 12:28:03,257][00501] Updated weights for policy 0, policy_version 4020 (0.0031) [2024-03-29 12:28:06,685][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 65978368. Throughput: 0: 42161.8. Samples: 66343240. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 12:28:06,686][00126] Avg episode reward: [(0, '0.079')] [2024-03-29 12:28:07,454][00501] Updated weights for policy 0, policy_version 4030 (0.0021) [2024-03-29 12:28:11,388][00501] Updated weights for policy 0, policy_version 4040 (0.0023) [2024-03-29 12:28:11,685][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 66191360. Throughput: 0: 42209.3. Samples: 66472420. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 12:28:11,686][00126] Avg episode reward: [(0, '0.062')] [2024-03-29 12:28:14,276][00501] Updated weights for policy 0, policy_version 4050 (0.0020) [2024-03-29 12:28:16,685][00126] Fps is (10 sec: 47513.9, 60 sec: 42871.4, 300 sec: 41876.4). Total num frames: 66453504. Throughput: 0: 42220.5. Samples: 66698540. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 12:28:16,686][00126] Avg episode reward: [(0, '0.064')] [2024-03-29 12:28:18,561][00501] Updated weights for policy 0, policy_version 4060 (0.0024) [2024-03-29 12:28:21,685][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 66617344. Throughput: 0: 42449.3. Samples: 66980060. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:28:21,686][00126] Avg episode reward: [(0, '0.114')] [2024-03-29 12:28:21,706][00481] Saving new best policy, reward=0.114! [2024-03-29 12:28:23,161][00501] Updated weights for policy 0, policy_version 4070 (0.0022) [2024-03-29 12:28:26,562][00481] Signal inference workers to stop experience collection... (2550 times) [2024-03-29 12:28:26,587][00501] InferenceWorker_p0-w0: stopping experience collection (2550 times) [2024-03-29 12:28:26,685][00126] Fps is (10 sec: 36045.2, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 66813952. Throughput: 0: 42263.6. Samples: 67105200. Policy #0 lag: (min: 2.0, avg: 19.2, max: 42.0) [2024-03-29 12:28:26,686][00126] Avg episode reward: [(0, '0.075')] [2024-03-29 12:28:26,772][00481] Signal inference workers to resume experience collection... (2550 times) [2024-03-29 12:28:26,772][00501] InferenceWorker_p0-w0: resuming experience collection (2550 times) [2024-03-29 12:28:27,070][00501] Updated weights for policy 0, policy_version 4080 (0.0022) [2024-03-29 12:28:30,139][00501] Updated weights for policy 0, policy_version 4090 (0.0021) [2024-03-29 12:28:31,685][00126] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 67076096. Throughput: 0: 42184.8. Samples: 67321040. 
Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 12:28:31,686][00126] Avg episode reward: [(0, '0.064')] [2024-03-29 12:28:34,514][00501] Updated weights for policy 0, policy_version 4100 (0.0024) [2024-03-29 12:28:36,685][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 67256320. Throughput: 0: 41929.9. Samples: 67588740. Policy #0 lag: (min: 0.0, avg: 24.0, max: 40.0) [2024-03-29 12:28:36,686][00126] Avg episode reward: [(0, '0.117')] [2024-03-29 12:28:36,687][00481] Saving new best policy, reward=0.117! [2024-03-29 12:28:38,750][00501] Updated weights for policy 0, policy_version 4110 (0.0027) [2024-03-29 12:28:41,685][00126] Fps is (10 sec: 36045.1, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 67436544. Throughput: 0: 41861.4. Samples: 67724080. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 12:28:41,686][00126] Avg episode reward: [(0, '0.082')] [2024-03-29 12:28:42,927][00501] Updated weights for policy 0, policy_version 4120 (0.0023) [2024-03-29 12:28:45,797][00501] Updated weights for policy 0, policy_version 4130 (0.0023) [2024-03-29 12:28:46,685][00126] Fps is (10 sec: 45875.4, 60 sec: 42871.5, 300 sec: 41931.9). Total num frames: 67715072. Throughput: 0: 42295.5. Samples: 67961920. Policy #0 lag: (min: 2.0, avg: 24.8, max: 44.0) [2024-03-29 12:28:46,686][00126] Avg episode reward: [(0, '0.090')] [2024-03-29 12:28:49,927][00501] Updated weights for policy 0, policy_version 4140 (0.0032) [2024-03-29 12:28:51,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 67878912. Throughput: 0: 41724.5. Samples: 68220840. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 12:28:51,686][00126] Avg episode reward: [(0, '0.103')] [2024-03-29 12:28:51,837][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000004144_67895296.pth... [2024-03-29 12:28:52,213][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000003533_57884672.pth [2024-03-29 12:28:54,356][00501] Updated weights for policy 0, policy_version 4150 (0.0023) [2024-03-29 12:28:56,685][00126] Fps is (10 sec: 36044.9, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 68075520. Throughput: 0: 41854.3. Samples: 68355860. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 12:28:56,687][00126] Avg episode reward: [(0, '0.102')] [2024-03-29 12:28:58,321][00501] Updated weights for policy 0, policy_version 4160 (0.0019) [2024-03-29 12:29:00,365][00481] Signal inference workers to stop experience collection... (2600 times) [2024-03-29 12:29:00,417][00501] InferenceWorker_p0-w0: stopping experience collection (2600 times) [2024-03-29 12:29:00,451][00481] Signal inference workers to resume experience collection... (2600 times) [2024-03-29 12:29:00,456][00501] InferenceWorker_p0-w0: resuming experience collection (2600 times) [2024-03-29 12:29:01,556][00501] Updated weights for policy 0, policy_version 4170 (0.0036) [2024-03-29 12:29:01,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 68321280. Throughput: 0: 42203.1. Samples: 68597680. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 12:29:01,686][00126] Avg episode reward: [(0, '0.117')] [2024-03-29 12:29:05,933][00501] Updated weights for policy 0, policy_version 4180 (0.0025) [2024-03-29 12:29:06,685][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 68501504. Throughput: 0: 41506.1. 
Samples: 68847840. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 12:29:06,686][00126] Avg episode reward: [(0, '0.073')] [2024-03-29 12:29:09,990][00501] Updated weights for policy 0, policy_version 4190 (0.0021) [2024-03-29 12:29:11,685][00126] Fps is (10 sec: 37683.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 68698112. Throughput: 0: 41608.8. Samples: 68977600. Policy #0 lag: (min: 0.0, avg: 20.1, max: 42.0) [2024-03-29 12:29:11,686][00126] Avg episode reward: [(0, '0.084')] [2024-03-29 12:29:14,185][00501] Updated weights for policy 0, policy_version 4200 (0.0027) [2024-03-29 12:29:16,685][00126] Fps is (10 sec: 45875.8, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 68960256. Throughput: 0: 42438.8. Samples: 69230780. Policy #0 lag: (min: 2.0, avg: 20.9, max: 41.0) [2024-03-29 12:29:16,686][00126] Avg episode reward: [(0, '0.118')] [2024-03-29 12:29:17,013][00501] Updated weights for policy 0, policy_version 4210 (0.0030) [2024-03-29 12:29:21,452][00501] Updated weights for policy 0, policy_version 4220 (0.0029) [2024-03-29 12:29:21,686][00126] Fps is (10 sec: 44236.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 69140480. Throughput: 0: 41949.7. Samples: 69476480. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 12:29:21,686][00126] Avg episode reward: [(0, '0.101')] [2024-03-29 12:29:25,515][00501] Updated weights for policy 0, policy_version 4230 (0.0022) [2024-03-29 12:29:26,686][00126] Fps is (10 sec: 37682.6, 60 sec: 42052.1, 300 sec: 41931.9). Total num frames: 69337088. Throughput: 0: 42042.1. Samples: 69615980. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 12:29:26,686][00126] Avg episode reward: [(0, '0.116')] [2024-03-29 12:29:29,684][00501] Updated weights for policy 0, policy_version 4240 (0.0027) [2024-03-29 12:29:31,503][00481] Signal inference workers to stop experience collection... (2650 times) [2024-03-29 12:29:31,534][00501] InferenceWorker_p0-w0: stopping experience collection (2650 times) [2024-03-29 12:29:31,685][00126] Fps is (10 sec: 42599.2, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 69566464. Throughput: 0: 42355.6. Samples: 69867920. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 12:29:31,686][00126] Avg episode reward: [(0, '0.100')] [2024-03-29 12:29:31,715][00481] Signal inference workers to resume experience collection... (2650 times) [2024-03-29 12:29:31,716][00501] InferenceWorker_p0-w0: resuming experience collection (2650 times) [2024-03-29 12:29:32,575][00501] Updated weights for policy 0, policy_version 4250 (0.0021) [2024-03-29 12:29:36,685][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 69779456. Throughput: 0: 41916.5. Samples: 70107080. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 12:29:36,686][00126] Avg episode reward: [(0, '0.112')] [2024-03-29 12:29:36,928][00501] Updated weights for policy 0, policy_version 4260 (0.0023) [2024-03-29 12:29:41,023][00501] Updated weights for policy 0, policy_version 4270 (0.0017) [2024-03-29 12:29:41,685][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 69976064. Throughput: 0: 41864.9. Samples: 70239780. Policy #0 lag: (min: 0.0, avg: 21.9, max: 43.0) [2024-03-29 12:29:41,686][00126] Avg episode reward: [(0, '0.096')] [2024-03-29 12:29:45,262][00501] Updated weights for policy 0, policy_version 4280 (0.0020) [2024-03-29 12:29:46,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 70205440. 
Throughput: 0: 42489.8. Samples: 70509720. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 12:29:46,686][00126] Avg episode reward: [(0, '0.103')] [2024-03-29 12:29:48,073][00501] Updated weights for policy 0, policy_version 4290 (0.0027) [2024-03-29 12:29:51,685][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 70434816. Throughput: 0: 42155.2. Samples: 70744820. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 12:29:51,686][00126] Avg episode reward: [(0, '0.098')] [2024-03-29 12:29:52,214][00501] Updated weights for policy 0, policy_version 4300 (0.0026) [2024-03-29 12:29:56,386][00501] Updated weights for policy 0, policy_version 4310 (0.0028) [2024-03-29 12:29:56,685][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 70615040. Throughput: 0: 42224.9. Samples: 70877720. Policy #0 lag: (min: 0.0, avg: 21.9, max: 43.0) [2024-03-29 12:29:56,686][00126] Avg episode reward: [(0, '0.083')] [2024-03-29 12:30:00,657][00501] Updated weights for policy 0, policy_version 4320 (0.0021) [2024-03-29 12:30:01,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 70828032. Throughput: 0: 42448.0. Samples: 71140940. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 12:30:01,686][00126] Avg episode reward: [(0, '0.099')] [2024-03-29 12:30:03,506][00501] Updated weights for policy 0, policy_version 4330 (0.0023) [2024-03-29 12:30:03,995][00481] Signal inference workers to stop experience collection... (2700 times) [2024-03-29 12:30:04,058][00501] InferenceWorker_p0-w0: stopping experience collection (2700 times) [2024-03-29 12:30:04,070][00481] Signal inference workers to resume experience collection... (2700 times) [2024-03-29 12:30:04,089][00501] InferenceWorker_p0-w0: resuming experience collection (2700 times) [2024-03-29 12:30:06,685][00126] Fps is (10 sec: 44236.6, 60 sec: 42598.4, 300 sec: 41931.9). Total num frames: 71057408. Throughput: 0: 42116.9. Samples: 71371740. Policy #0 lag: (min: 2.0, avg: 20.1, max: 42.0) [2024-03-29 12:30:06,686][00126] Avg episode reward: [(0, '0.106')] [2024-03-29 12:30:08,037][00501] Updated weights for policy 0, policy_version 4340 (0.0018) [2024-03-29 12:30:11,686][00126] Fps is (10 sec: 40959.5, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 71237632. Throughput: 0: 41871.1. Samples: 71500180. Policy #0 lag: (min: 1.0, avg: 22.1, max: 41.0) [2024-03-29 12:30:11,686][00126] Avg episode reward: [(0, '0.103')] [2024-03-29 12:30:12,203][00501] Updated weights for policy 0, policy_version 4350 (0.0024) [2024-03-29 12:30:16,252][00501] Updated weights for policy 0, policy_version 4360 (0.0026) [2024-03-29 12:30:16,685][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 71450624. Throughput: 0: 42428.4. Samples: 71777200. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 12:30:16,686][00126] Avg episode reward: [(0, '0.140')] [2024-03-29 12:30:16,845][00481] Saving new best policy, reward=0.140! [2024-03-29 12:30:19,307][00501] Updated weights for policy 0, policy_version 4370 (0.0029) [2024-03-29 12:30:21,685][00126] Fps is (10 sec: 45875.8, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 71696384. Throughput: 0: 41768.0. Samples: 71986640. 
Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 12:30:21,686][00126] Avg episode reward: [(0, '0.136')] [2024-03-29 12:30:23,832][00501] Updated weights for policy 0, policy_version 4380 (0.0018) [2024-03-29 12:30:26,685][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 71876608. Throughput: 0: 42157.3. Samples: 72136860. Policy #0 lag: (min: 0.0, avg: 24.0, max: 41.0) [2024-03-29 12:30:26,688][00126] Avg episode reward: [(0, '0.155')] [2024-03-29 12:30:26,689][00481] Saving new best policy, reward=0.155! [2024-03-29 12:30:28,000][00501] Updated weights for policy 0, policy_version 4390 (0.0022) [2024-03-29 12:30:31,685][00126] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 72056832. Throughput: 0: 41824.0. Samples: 72391800. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 12:30:31,686][00126] Avg episode reward: [(0, '0.121')] [2024-03-29 12:30:32,223][00501] Updated weights for policy 0, policy_version 4400 (0.0028) [2024-03-29 12:30:35,013][00501] Updated weights for policy 0, policy_version 4410 (0.0028) [2024-03-29 12:30:36,685][00126] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 72335360. Throughput: 0: 41436.0. Samples: 72609440. Policy #0 lag: (min: 1.0, avg: 18.3, max: 41.0) [2024-03-29 12:30:36,686][00126] Avg episode reward: [(0, '0.129')] [2024-03-29 12:30:39,534][00501] Updated weights for policy 0, policy_version 4420 (0.0017) [2024-03-29 12:30:39,933][00481] Signal inference workers to stop experience collection... (2750 times) [2024-03-29 12:30:39,983][00501] InferenceWorker_p0-w0: stopping experience collection (2750 times) [2024-03-29 12:30:40,019][00481] Signal inference workers to resume experience collection... (2750 times) [2024-03-29 12:30:40,023][00501] InferenceWorker_p0-w0: resuming experience collection (2750 times) [2024-03-29 12:30:41,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 72515584. Throughput: 0: 41809.8. Samples: 72759160. Policy #0 lag: (min: 0.0, avg: 23.6, max: 41.0) [2024-03-29 12:30:41,686][00126] Avg episode reward: [(0, '0.139')] [2024-03-29 12:30:43,640][00501] Updated weights for policy 0, policy_version 4430 (0.0025) [2024-03-29 12:30:46,685][00126] Fps is (10 sec: 34406.5, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 72679424. Throughput: 0: 41830.7. Samples: 73023320. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 12:30:46,686][00126] Avg episode reward: [(0, '0.118')] [2024-03-29 12:30:47,792][00501] Updated weights for policy 0, policy_version 4440 (0.0023) [2024-03-29 12:30:50,718][00501] Updated weights for policy 0, policy_version 4450 (0.0020) [2024-03-29 12:30:51,685][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 72957952. Throughput: 0: 41644.9. Samples: 73245760. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 12:30:51,686][00126] Avg episode reward: [(0, '0.157')] [2024-03-29 12:30:51,837][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000004454_72974336.pth... [2024-03-29 12:30:52,186][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000003837_62865408.pth [2024-03-29 12:30:52,207][00481] Saving new best policy, reward=0.157! 
[2024-03-29 12:30:55,096][00501] Updated weights for policy 0, policy_version 4460 (0.0018) [2024-03-29 12:30:56,685][00126] Fps is (10 sec: 45875.0, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 73138176. Throughput: 0: 41656.5. Samples: 73374720. Policy #0 lag: (min: 1.0, avg: 24.5, max: 41.0) [2024-03-29 12:30:56,686][00126] Avg episode reward: [(0, '0.148')] [2024-03-29 12:30:59,235][00501] Updated weights for policy 0, policy_version 4470 (0.0019) [2024-03-29 12:31:01,685][00126] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 73318400. Throughput: 0: 41408.4. Samples: 73640580. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 12:31:01,688][00126] Avg episode reward: [(0, '0.142')] [2024-03-29 12:31:03,501][00501] Updated weights for policy 0, policy_version 4480 (0.0019) [2024-03-29 12:31:06,506][00501] Updated weights for policy 0, policy_version 4490 (0.0029) [2024-03-29 12:31:06,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 73564160. Throughput: 0: 41938.6. Samples: 73873880. Policy #0 lag: (min: 1.0, avg: 17.5, max: 41.0) [2024-03-29 12:31:06,686][00126] Avg episode reward: [(0, '0.101')] [2024-03-29 12:31:10,895][00501] Updated weights for policy 0, policy_version 4500 (0.0023) [2024-03-29 12:31:11,199][00481] Signal inference workers to stop experience collection... (2800 times) [2024-03-29 12:31:11,199][00481] Signal inference workers to resume experience collection... (2800 times) [2024-03-29 12:31:11,240][00501] InferenceWorker_p0-w0: stopping experience collection (2800 times) [2024-03-29 12:31:11,240][00501] InferenceWorker_p0-w0: resuming experience collection (2800 times) [2024-03-29 12:31:11,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 73760768. Throughput: 0: 41323.1. Samples: 73996400. Policy #0 lag: (min: 1.0, avg: 24.2, max: 42.0) [2024-03-29 12:31:11,686][00126] Avg episode reward: [(0, '0.121')] [2024-03-29 12:31:14,778][00501] Updated weights for policy 0, policy_version 4510 (0.0025) [2024-03-29 12:31:16,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 73940992. Throughput: 0: 41974.3. Samples: 74280640. Policy #0 lag: (min: 0.0, avg: 21.0, max: 40.0) [2024-03-29 12:31:16,686][00126] Avg episode reward: [(0, '0.138')] [2024-03-29 12:31:18,947][00501] Updated weights for policy 0, policy_version 4520 (0.0019) [2024-03-29 12:31:21,685][00126] Fps is (10 sec: 42599.2, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 74186752. Throughput: 0: 42330.8. Samples: 74514320. Policy #0 lag: (min: 0.0, avg: 18.0, max: 41.0) [2024-03-29 12:31:21,686][00126] Avg episode reward: [(0, '0.142')] [2024-03-29 12:31:22,056][00501] Updated weights for policy 0, policy_version 4530 (0.0030) [2024-03-29 12:31:26,544][00501] Updated weights for policy 0, policy_version 4540 (0.0024) [2024-03-29 12:31:26,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 74383360. Throughput: 0: 41491.6. Samples: 74626280. Policy #0 lag: (min: 2.0, avg: 23.4, max: 41.0) [2024-03-29 12:31:26,686][00126] Avg episode reward: [(0, '0.101')] [2024-03-29 12:31:30,521][00501] Updated weights for policy 0, policy_version 4550 (0.0017) [2024-03-29 12:31:31,685][00126] Fps is (10 sec: 40959.6, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 74596352. Throughput: 0: 41833.8. Samples: 74905840. 
Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 12:31:31,686][00126] Avg episode reward: [(0, '0.145')] [2024-03-29 12:31:34,847][00501] Updated weights for policy 0, policy_version 4560 (0.0028) [2024-03-29 12:31:36,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 74809344. Throughput: 0: 42046.3. Samples: 75137840. Policy #0 lag: (min: 1.0, avg: 17.7, max: 41.0) [2024-03-29 12:31:36,686][00126] Avg episode reward: [(0, '0.206')] [2024-03-29 12:31:36,779][00481] Saving new best policy, reward=0.206! [2024-03-29 12:31:37,985][00501] Updated weights for policy 0, policy_version 4570 (0.0026) [2024-03-29 12:31:41,685][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 75022336. Throughput: 0: 41434.3. Samples: 75239260. Policy #0 lag: (min: 2.0, avg: 22.0, max: 41.0) [2024-03-29 12:31:41,686][00126] Avg episode reward: [(0, '0.129')] [2024-03-29 12:31:42,237][00501] Updated weights for policy 0, policy_version 4580 (0.0024) [2024-03-29 12:31:46,246][00501] Updated weights for policy 0, policy_version 4590 (0.0027) [2024-03-29 12:31:46,685][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 75202560. Throughput: 0: 41781.0. Samples: 75520720. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 12:31:46,686][00126] Avg episode reward: [(0, '0.161')] [2024-03-29 12:31:50,021][00481] Signal inference workers to stop experience collection... (2850 times) [2024-03-29 12:31:50,063][00501] InferenceWorker_p0-w0: stopping experience collection (2850 times) [2024-03-29 12:31:50,101][00481] Signal inference workers to resume experience collection... (2850 times) [2024-03-29 12:31:50,109][00501] InferenceWorker_p0-w0: resuming experience collection (2850 times) [2024-03-29 12:31:50,608][00501] Updated weights for policy 0, policy_version 4600 (0.0024) [2024-03-29 12:31:51,685][00126] Fps is (10 sec: 39321.6, 60 sec: 40960.0, 300 sec: 41820.8). Total num frames: 75415552. Throughput: 0: 42035.6. Samples: 75765480. Policy #0 lag: (min: 0.0, avg: 19.2, max: 43.0) [2024-03-29 12:31:51,686][00126] Avg episode reward: [(0, '0.111')] [2024-03-29 12:31:53,730][00501] Updated weights for policy 0, policy_version 4610 (0.0025) [2024-03-29 12:31:56,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 75661312. Throughput: 0: 41610.3. Samples: 75868860. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:31:56,686][00126] Avg episode reward: [(0, '0.149')] [2024-03-29 12:31:57,856][00501] Updated weights for policy 0, policy_version 4620 (0.0020) [2024-03-29 12:32:01,685][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 75841536. Throughput: 0: 41107.9. Samples: 76130500. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 12:32:01,687][00126] Avg episode reward: [(0, '0.124')] [2024-03-29 12:32:02,005][00501] Updated weights for policy 0, policy_version 4630 (0.0020) [2024-03-29 12:32:06,256][00501] Updated weights for policy 0, policy_version 4640 (0.0024) [2024-03-29 12:32:06,685][00126] Fps is (10 sec: 37683.5, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 76038144. Throughput: 0: 41933.3. Samples: 76401320. 
Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 12:32:06,686][00126] Avg episode reward: [(0, '0.136')] [2024-03-29 12:32:09,228][00501] Updated weights for policy 0, policy_version 4650 (0.0026) [2024-03-29 12:32:11,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 76300288. Throughput: 0: 41958.5. Samples: 76514420. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 12:32:11,686][00126] Avg episode reward: [(0, '0.133')] [2024-03-29 12:32:13,359][00501] Updated weights for policy 0, policy_version 4660 (0.0022) [2024-03-29 12:32:16,685][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 76480512. Throughput: 0: 41485.8. Samples: 76772700. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 12:32:16,687][00126] Avg episode reward: [(0, '0.142')] [2024-03-29 12:32:17,519][00501] Updated weights for policy 0, policy_version 4670 (0.0019) [2024-03-29 12:32:21,685][00126] Fps is (10 sec: 36045.4, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 76660736. Throughput: 0: 42455.1. Samples: 77048320. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 12:32:21,686][00126] Avg episode reward: [(0, '0.167')] [2024-03-29 12:32:21,693][00501] Updated weights for policy 0, policy_version 4680 (0.0020) [2024-03-29 12:32:24,767][00501] Updated weights for policy 0, policy_version 4690 (0.0024) [2024-03-29 12:32:25,577][00481] Signal inference workers to stop experience collection... (2900 times) [2024-03-29 12:32:25,657][00481] Signal inference workers to resume experience collection... (2900 times) [2024-03-29 12:32:25,661][00501] InferenceWorker_p0-w0: stopping experience collection (2900 times) [2024-03-29 12:32:25,687][00501] InferenceWorker_p0-w0: resuming experience collection (2900 times) [2024-03-29 12:32:26,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 76922880. Throughput: 0: 42400.5. Samples: 77147280. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 12:32:26,686][00126] Avg episode reward: [(0, '0.159')] [2024-03-29 12:32:29,004][00501] Updated weights for policy 0, policy_version 4700 (0.0023) [2024-03-29 12:32:31,685][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 77103104. Throughput: 0: 41708.9. Samples: 77397620. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 12:32:31,686][00126] Avg episode reward: [(0, '0.201')] [2024-03-29 12:32:33,229][00501] Updated weights for policy 0, policy_version 4710 (0.0019) [2024-03-29 12:32:36,685][00126] Fps is (10 sec: 36044.4, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 77283328. Throughput: 0: 42399.9. Samples: 77673480. Policy #0 lag: (min: 0.0, avg: 18.8, max: 41.0) [2024-03-29 12:32:36,686][00126] Avg episode reward: [(0, '0.196')] [2024-03-29 12:32:37,528][00501] Updated weights for policy 0, policy_version 4720 (0.0023) [2024-03-29 12:32:40,572][00501] Updated weights for policy 0, policy_version 4730 (0.0035) [2024-03-29 12:32:41,685][00126] Fps is (10 sec: 45874.8, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 77561856. Throughput: 0: 42300.8. Samples: 77772400. Policy #0 lag: (min: 1.0, avg: 21.4, max: 44.0) [2024-03-29 12:32:41,686][00126] Avg episode reward: [(0, '0.158')] [2024-03-29 12:32:44,633][00501] Updated weights for policy 0, policy_version 4740 (0.0023) [2024-03-29 12:32:46,685][00126] Fps is (10 sec: 45875.3, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 77742080. 
Throughput: 0: 42136.5. Samples: 78026640. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 12:32:46,687][00126] Avg episode reward: [(0, '0.161')] [2024-03-29 12:32:48,782][00501] Updated weights for policy 0, policy_version 4750 (0.0019) [2024-03-29 12:32:51,685][00126] Fps is (10 sec: 36045.1, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 77922304. Throughput: 0: 42047.5. Samples: 78293460. Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 12:32:51,686][00126] Avg episode reward: [(0, '0.214')] [2024-03-29 12:32:51,704][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000004756_77922304.pth... [2024-03-29 12:32:52,113][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000004144_67895296.pth [2024-03-29 12:32:52,132][00481] Saving new best policy, reward=0.214! [2024-03-29 12:32:53,478][00501] Updated weights for policy 0, policy_version 4760 (0.0032) [2024-03-29 12:32:56,538][00501] Updated weights for policy 0, policy_version 4770 (0.0029) [2024-03-29 12:32:56,685][00126] Fps is (10 sec: 40960.6, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 78151680. Throughput: 0: 41709.5. Samples: 78391340. Policy #0 lag: (min: 0.0, avg: 21.2, max: 48.0) [2024-03-29 12:32:56,686][00126] Avg episode reward: [(0, '0.197')] [2024-03-29 12:33:00,027][00481] Signal inference workers to stop experience collection... (2950 times) [2024-03-29 12:33:00,029][00481] Signal inference workers to resume experience collection... (2950 times) [2024-03-29 12:33:00,060][00501] InferenceWorker_p0-w0: stopping experience collection (2950 times) [2024-03-29 12:33:00,060][00501] InferenceWorker_p0-w0: resuming experience collection (2950 times) [2024-03-29 12:33:00,333][00501] Updated weights for policy 0, policy_version 4780 (0.0030) [2024-03-29 12:33:01,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 78364672. Throughput: 0: 41378.7. Samples: 78634740. Policy #0 lag: (min: 1.0, avg: 22.0, max: 41.0) [2024-03-29 12:33:01,686][00126] Avg episode reward: [(0, '0.198')] [2024-03-29 12:33:04,623][00501] Updated weights for policy 0, policy_version 4790 (0.0023) [2024-03-29 12:33:06,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 78544896. Throughput: 0: 41251.5. Samples: 78904640. Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 12:33:06,686][00126] Avg episode reward: [(0, '0.205')] [2024-03-29 12:33:09,040][00501] Updated weights for policy 0, policy_version 4800 (0.0024) [2024-03-29 12:33:11,686][00126] Fps is (10 sec: 42597.8, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 78790656. Throughput: 0: 41778.1. Samples: 79027300. Policy #0 lag: (min: 2.0, avg: 21.4, max: 41.0) [2024-03-29 12:33:11,688][00126] Avg episode reward: [(0, '0.158')] [2024-03-29 12:33:12,267][00501] Updated weights for policy 0, policy_version 4810 (0.0021) [2024-03-29 12:33:15,978][00501] Updated weights for policy 0, policy_version 4820 (0.0017) [2024-03-29 12:33:16,685][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 78987264. Throughput: 0: 41623.5. Samples: 79270680. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:33:16,686][00126] Avg episode reward: [(0, '0.183')] [2024-03-29 12:33:20,116][00501] Updated weights for policy 0, policy_version 4830 (0.0023) [2024-03-29 12:33:21,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.1, 300 sec: 41876.4). 
Total num frames: 79167488. Throughput: 0: 41584.9. Samples: 79544800. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:33:21,686][00126] Avg episode reward: [(0, '0.191')] [2024-03-29 12:33:24,481][00501] Updated weights for policy 0, policy_version 4840 (0.0022) [2024-03-29 12:33:26,685][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 79429632. Throughput: 0: 42320.9. Samples: 79676840. Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 12:33:26,687][00126] Avg episode reward: [(0, '0.202')] [2024-03-29 12:33:27,517][00501] Updated weights for policy 0, policy_version 4850 (0.0025) [2024-03-29 12:33:31,649][00501] Updated weights for policy 0, policy_version 4860 (0.0021) [2024-03-29 12:33:31,685][00126] Fps is (10 sec: 45875.5, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 79626240. Throughput: 0: 41736.5. Samples: 79904780. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 12:33:31,686][00126] Avg episode reward: [(0, '0.193')] [2024-03-29 12:33:35,902][00501] Updated weights for policy 0, policy_version 4870 (0.0026) [2024-03-29 12:33:36,685][00126] Fps is (10 sec: 36044.9, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 79790080. Throughput: 0: 41812.4. Samples: 80175020. Policy #0 lag: (min: 2.0, avg: 21.1, max: 41.0) [2024-03-29 12:33:36,686][00126] Avg episode reward: [(0, '0.169')] [2024-03-29 12:33:39,313][00481] Signal inference workers to stop experience collection... (3000 times) [2024-03-29 12:33:39,346][00501] InferenceWorker_p0-w0: stopping experience collection (3000 times) [2024-03-29 12:33:39,494][00481] Signal inference workers to resume experience collection... (3000 times) [2024-03-29 12:33:39,495][00501] InferenceWorker_p0-w0: resuming experience collection (3000 times) [2024-03-29 12:33:40,282][00501] Updated weights for policy 0, policy_version 4880 (0.0024) [2024-03-29 12:33:41,685][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 80035840. Throughput: 0: 42618.5. Samples: 80309180. Policy #0 lag: (min: 1.0, avg: 18.0, max: 42.0) [2024-03-29 12:33:41,686][00126] Avg episode reward: [(0, '0.214')] [2024-03-29 12:33:43,313][00501] Updated weights for policy 0, policy_version 4890 (0.0025) [2024-03-29 12:33:46,686][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 80248832. Throughput: 0: 41863.4. Samples: 80518600. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 12:33:46,690][00126] Avg episode reward: [(0, '0.148')] [2024-03-29 12:33:47,397][00501] Updated weights for policy 0, policy_version 4900 (0.0027) [2024-03-29 12:33:51,470][00501] Updated weights for policy 0, policy_version 4910 (0.0020) [2024-03-29 12:33:51,686][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 80445440. Throughput: 0: 41863.8. Samples: 80788520. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 12:33:51,686][00126] Avg episode reward: [(0, '0.172')] [2024-03-29 12:33:55,967][00501] Updated weights for policy 0, policy_version 4920 (0.0028) [2024-03-29 12:33:56,685][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 80642048. Throughput: 0: 42214.8. Samples: 80926960. 
Policy #0 lag: (min: 2.0, avg: 18.3, max: 41.0) [2024-03-29 12:33:56,686][00126] Avg episode reward: [(0, '0.188')] [2024-03-29 12:33:59,071][00501] Updated weights for policy 0, policy_version 4930 (0.0025) [2024-03-29 12:34:01,685][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 80887808. Throughput: 0: 41729.9. Samples: 81148520. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 12:34:01,686][00126] Avg episode reward: [(0, '0.151')] [2024-03-29 12:34:02,918][00501] Updated weights for policy 0, policy_version 4940 (0.0019) [2024-03-29 12:34:06,685][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 81084416. Throughput: 0: 41610.3. Samples: 81417260. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 12:34:06,686][00126] Avg episode reward: [(0, '0.150')] [2024-03-29 12:34:06,909][00501] Updated weights for policy 0, policy_version 4950 (0.0017) [2024-03-29 12:34:11,188][00481] Signal inference workers to stop experience collection... (3050 times) [2024-03-29 12:34:11,219][00501] InferenceWorker_p0-w0: stopping experience collection (3050 times) [2024-03-29 12:34:11,387][00481] Signal inference workers to resume experience collection... (3050 times) [2024-03-29 12:34:11,387][00501] InferenceWorker_p0-w0: resuming experience collection (3050 times) [2024-03-29 12:34:11,391][00501] Updated weights for policy 0, policy_version 4960 (0.0019) [2024-03-29 12:34:11,685][00126] Fps is (10 sec: 39321.2, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 81281024. Throughput: 0: 41716.0. Samples: 81554060. Policy #0 lag: (min: 1.0, avg: 19.3, max: 42.0) [2024-03-29 12:34:11,686][00126] Avg episode reward: [(0, '0.217')] [2024-03-29 12:34:11,903][00481] Saving new best policy, reward=0.217! [2024-03-29 12:34:14,828][00501] Updated weights for policy 0, policy_version 4970 (0.0029) [2024-03-29 12:34:16,685][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 81510400. Throughput: 0: 41791.6. Samples: 81785400. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 12:34:16,688][00126] Avg episode reward: [(0, '0.179')] [2024-03-29 12:34:18,593][00501] Updated weights for policy 0, policy_version 4980 (0.0029) [2024-03-29 12:34:21,686][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 81690624. Throughput: 0: 41656.9. Samples: 82049580. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 12:34:21,686][00126] Avg episode reward: [(0, '0.207')] [2024-03-29 12:34:22,857][00501] Updated weights for policy 0, policy_version 4990 (0.0033) [2024-03-29 12:34:26,685][00126] Fps is (10 sec: 37683.4, 60 sec: 40960.1, 300 sec: 41765.3). Total num frames: 81887232. Throughput: 0: 41712.5. Samples: 82186240. Policy #0 lag: (min: 1.0, avg: 18.6, max: 42.0) [2024-03-29 12:34:26,686][00126] Avg episode reward: [(0, '0.179')] [2024-03-29 12:34:27,020][00501] Updated weights for policy 0, policy_version 5000 (0.0019) [2024-03-29 12:34:30,239][00501] Updated weights for policy 0, policy_version 5010 (0.0018) [2024-03-29 12:34:31,686][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 82149376. Throughput: 0: 42157.3. Samples: 82415680. 
Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 12:34:31,686][00126] Avg episode reward: [(0, '0.203')] [2024-03-29 12:34:34,137][00501] Updated weights for policy 0, policy_version 5020 (0.0019) [2024-03-29 12:34:36,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.5, 300 sec: 41931.9). Total num frames: 82345984. Throughput: 0: 41725.0. Samples: 82666140. Policy #0 lag: (min: 0.0, avg: 23.0, max: 42.0) [2024-03-29 12:34:36,686][00126] Avg episode reward: [(0, '0.198')] [2024-03-29 12:34:38,496][00501] Updated weights for policy 0, policy_version 5030 (0.0021) [2024-03-29 12:34:41,685][00126] Fps is (10 sec: 36045.3, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 82509824. Throughput: 0: 41972.9. Samples: 82815740. Policy #0 lag: (min: 1.0, avg: 18.4, max: 42.0) [2024-03-29 12:34:41,686][00126] Avg episode reward: [(0, '0.219')] [2024-03-29 12:34:41,914][00481] Signal inference workers to stop experience collection... (3100 times) [2024-03-29 12:34:41,952][00501] InferenceWorker_p0-w0: stopping experience collection (3100 times) [2024-03-29 12:34:42,096][00481] Signal inference workers to resume experience collection... (3100 times) [2024-03-29 12:34:42,096][00501] InferenceWorker_p0-w0: resuming experience collection (3100 times) [2024-03-29 12:34:42,096][00481] Saving new best policy, reward=0.219! [2024-03-29 12:34:42,943][00501] Updated weights for policy 0, policy_version 5040 (0.0023) [2024-03-29 12:34:45,969][00501] Updated weights for policy 0, policy_version 5050 (0.0026) [2024-03-29 12:34:46,685][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 82771968. Throughput: 0: 42082.3. Samples: 83042220. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 12:34:46,686][00126] Avg episode reward: [(0, '0.204')] [2024-03-29 12:34:49,711][00501] Updated weights for policy 0, policy_version 5060 (0.0028) [2024-03-29 12:34:51,686][00126] Fps is (10 sec: 47513.0, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 82984960. Throughput: 0: 41504.3. Samples: 83284960. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 12:34:51,686][00126] Avg episode reward: [(0, '0.187')] [2024-03-29 12:34:51,709][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005065_82984960.pth... [2024-03-29 12:34:52,045][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000004454_72974336.pth [2024-03-29 12:34:54,111][00501] Updated weights for policy 0, policy_version 5070 (0.0029) [2024-03-29 12:34:56,685][00126] Fps is (10 sec: 37682.8, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 83148800. Throughput: 0: 41734.8. Samples: 83432120. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 12:34:56,686][00126] Avg episode reward: [(0, '0.298')] [2024-03-29 12:34:56,848][00481] Saving new best policy, reward=0.298! [2024-03-29 12:34:58,490][00501] Updated weights for policy 0, policy_version 5080 (0.0025) [2024-03-29 12:35:01,685][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 83378176. Throughput: 0: 41896.0. Samples: 83670720. 
Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 12:35:01,687][00126] Avg episode reward: [(0, '0.257')] [2024-03-29 12:35:01,696][00501] Updated weights for policy 0, policy_version 5090 (0.0017) [2024-03-29 12:35:05,372][00501] Updated weights for policy 0, policy_version 5100 (0.0022) [2024-03-29 12:35:06,685][00126] Fps is (10 sec: 45874.8, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 83607552. Throughput: 0: 41479.6. Samples: 83916160. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 12:35:06,686][00126] Avg episode reward: [(0, '0.180')] [2024-03-29 12:35:09,631][00501] Updated weights for policy 0, policy_version 5110 (0.0023) [2024-03-29 12:35:11,685][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 83787776. Throughput: 0: 41662.2. Samples: 84061040. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 12:35:11,686][00126] Avg episode reward: [(0, '0.231')] [2024-03-29 12:35:13,708][00481] Signal inference workers to stop experience collection... (3150 times) [2024-03-29 12:35:13,754][00501] InferenceWorker_p0-w0: stopping experience collection (3150 times) [2024-03-29 12:35:13,787][00481] Signal inference workers to resume experience collection... (3150 times) [2024-03-29 12:35:13,790][00501] InferenceWorker_p0-w0: resuming experience collection (3150 times) [2024-03-29 12:35:13,797][00501] Updated weights for policy 0, policy_version 5120 (0.0024) [2024-03-29 12:35:16,686][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 84017152. Throughput: 0: 42049.3. Samples: 84307900. Policy #0 lag: (min: 1.0, avg: 17.6, max: 41.0) [2024-03-29 12:35:16,686][00126] Avg episode reward: [(0, '0.200')] [2024-03-29 12:35:17,088][00501] Updated weights for policy 0, policy_version 5130 (0.0023) [2024-03-29 12:35:20,970][00501] Updated weights for policy 0, policy_version 5140 (0.0019) [2024-03-29 12:35:21,685][00126] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 41931.9). Total num frames: 84246528. Throughput: 0: 41965.7. Samples: 84554600. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 12:35:21,686][00126] Avg episode reward: [(0, '0.171')] [2024-03-29 12:35:25,324][00501] Updated weights for policy 0, policy_version 5150 (0.0026) [2024-03-29 12:35:26,685][00126] Fps is (10 sec: 37683.6, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 84393984. Throughput: 0: 41622.2. Samples: 84688740. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 12:35:26,686][00126] Avg episode reward: [(0, '0.210')] [2024-03-29 12:35:29,548][00501] Updated weights for policy 0, policy_version 5160 (0.0028) [2024-03-29 12:35:31,686][00126] Fps is (10 sec: 40958.8, 60 sec: 41779.0, 300 sec: 41765.3). Total num frames: 84656128. Throughput: 0: 42036.9. Samples: 84933900. Policy #0 lag: (min: 1.0, avg: 18.3, max: 42.0) [2024-03-29 12:35:31,687][00126] Avg episode reward: [(0, '0.141')] [2024-03-29 12:35:32,913][00501] Updated weights for policy 0, policy_version 5170 (0.0023) [2024-03-29 12:35:36,685][00126] Fps is (10 sec: 45874.9, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 84852736. Throughput: 0: 42036.5. Samples: 85176600. 
Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 12:35:36,686][00126] Avg episode reward: [(0, '0.232')] [2024-03-29 12:35:36,776][00501] Updated weights for policy 0, policy_version 5180 (0.0023) [2024-03-29 12:35:41,146][00501] Updated weights for policy 0, policy_version 5190 (0.0018) [2024-03-29 12:35:41,685][00126] Fps is (10 sec: 37684.7, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 85032960. Throughput: 0: 41842.2. Samples: 85315020. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 12:35:41,686][00126] Avg episode reward: [(0, '0.193')] [2024-03-29 12:35:45,314][00501] Updated weights for policy 0, policy_version 5200 (0.0037) [2024-03-29 12:35:46,137][00481] Signal inference workers to stop experience collection... (3200 times) [2024-03-29 12:35:46,180][00501] InferenceWorker_p0-w0: stopping experience collection (3200 times) [2024-03-29 12:35:46,355][00481] Signal inference workers to resume experience collection... (3200 times) [2024-03-29 12:35:46,355][00501] InferenceWorker_p0-w0: resuming experience collection (3200 times) [2024-03-29 12:35:46,685][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 85278720. Throughput: 0: 42254.7. Samples: 85572180. Policy #0 lag: (min: 1.0, avg: 18.6, max: 42.0) [2024-03-29 12:35:46,686][00126] Avg episode reward: [(0, '0.234')] [2024-03-29 12:35:48,595][00501] Updated weights for policy 0, policy_version 5210 (0.0031) [2024-03-29 12:35:51,685][00126] Fps is (10 sec: 47513.5, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 85508096. Throughput: 0: 41977.8. Samples: 85805160. Policy #0 lag: (min: 1.0, avg: 23.7, max: 42.0) [2024-03-29 12:35:51,686][00126] Avg episode reward: [(0, '0.176')] [2024-03-29 12:35:52,257][00501] Updated weights for policy 0, policy_version 5220 (0.0019) [2024-03-29 12:35:56,636][00501] Updated weights for policy 0, policy_version 5230 (0.0023) [2024-03-29 12:35:56,686][00126] Fps is (10 sec: 40959.3, 60 sec: 42325.2, 300 sec: 41931.9). Total num frames: 85688320. Throughput: 0: 41774.5. Samples: 85940900. Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0) [2024-03-29 12:35:56,686][00126] Avg episode reward: [(0, '0.240')] [2024-03-29 12:36:00,753][00501] Updated weights for policy 0, policy_version 5240 (0.0022) [2024-03-29 12:36:01,685][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 85901312. Throughput: 0: 42325.5. Samples: 86212540. Policy #0 lag: (min: 2.0, avg: 20.2, max: 43.0) [2024-03-29 12:36:01,686][00126] Avg episode reward: [(0, '0.258')] [2024-03-29 12:36:04,047][00501] Updated weights for policy 0, policy_version 5250 (0.0022) [2024-03-29 12:36:06,685][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 86130688. Throughput: 0: 41987.1. Samples: 86444020. Policy #0 lag: (min: 2.0, avg: 20.2, max: 43.0) [2024-03-29 12:36:06,686][00126] Avg episode reward: [(0, '0.165')] [2024-03-29 12:36:07,759][00501] Updated weights for policy 0, policy_version 5260 (0.0028) [2024-03-29 12:36:11,685][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 86327296. Throughput: 0: 41939.5. Samples: 86576020. 
Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 12:36:11,686][00126] Avg episode reward: [(0, '0.275')] [2024-03-29 12:36:11,909][00501] Updated weights for policy 0, policy_version 5270 (0.0022) [2024-03-29 12:36:15,921][00501] Updated weights for policy 0, policy_version 5280 (0.0026) [2024-03-29 12:36:16,685][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.4, 300 sec: 41876.4). Total num frames: 86540288. Throughput: 0: 42680.8. Samples: 86854520. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 12:36:16,686][00126] Avg episode reward: [(0, '0.213')] [2024-03-29 12:36:18,025][00481] Signal inference workers to stop experience collection... (3250 times) [2024-03-29 12:36:18,047][00501] InferenceWorker_p0-w0: stopping experience collection (3250 times) [2024-03-29 12:36:18,231][00481] Signal inference workers to resume experience collection... (3250 times) [2024-03-29 12:36:18,232][00501] InferenceWorker_p0-w0: resuming experience collection (3250 times) [2024-03-29 12:36:19,348][00501] Updated weights for policy 0, policy_version 5290 (0.0022) [2024-03-29 12:36:21,686][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 86769664. Throughput: 0: 42284.0. Samples: 87079380. Policy #0 lag: (min: 0.0, avg: 21.7, max: 45.0) [2024-03-29 12:36:21,687][00126] Avg episode reward: [(0, '0.194')] [2024-03-29 12:36:23,360][00501] Updated weights for policy 0, policy_version 5300 (0.0021) [2024-03-29 12:36:26,685][00126] Fps is (10 sec: 40960.0, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 86949888. Throughput: 0: 42044.9. Samples: 87207040. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:36:26,686][00126] Avg episode reward: [(0, '0.207')] [2024-03-29 12:36:27,745][00501] Updated weights for policy 0, policy_version 5310 (0.0021) [2024-03-29 12:36:31,627][00501] Updated weights for policy 0, policy_version 5320 (0.0018) [2024-03-29 12:36:31,685][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.5, 300 sec: 41876.4). Total num frames: 87162880. Throughput: 0: 42515.1. Samples: 87485360. Policy #0 lag: (min: 1.0, avg: 18.0, max: 42.0) [2024-03-29 12:36:31,686][00126] Avg episode reward: [(0, '0.251')] [2024-03-29 12:36:35,122][00501] Updated weights for policy 0, policy_version 5330 (0.0026) [2024-03-29 12:36:36,685][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 87408640. Throughput: 0: 42107.1. Samples: 87699980. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 12:36:36,686][00126] Avg episode reward: [(0, '0.242')] [2024-03-29 12:36:38,963][00501] Updated weights for policy 0, policy_version 5340 (0.0019) [2024-03-29 12:36:41,685][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 87588864. Throughput: 0: 42240.1. Samples: 87841700. Policy #0 lag: (min: 3.0, avg: 23.1, max: 43.0) [2024-03-29 12:36:41,686][00126] Avg episode reward: [(0, '0.228')] [2024-03-29 12:36:43,518][00501] Updated weights for policy 0, policy_version 5350 (0.0028) [2024-03-29 12:36:46,685][00126] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 87769088. Throughput: 0: 42080.9. Samples: 88106180. 
Policy #0 lag: (min: 1.0, avg: 18.5, max: 42.0) [2024-03-29 12:36:46,686][00126] Avg episode reward: [(0, '0.287')] [2024-03-29 12:36:47,541][00501] Updated weights for policy 0, policy_version 5360 (0.0027) [2024-03-29 12:36:50,859][00501] Updated weights for policy 0, policy_version 5370 (0.0029) [2024-03-29 12:36:51,686][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 88014848. Throughput: 0: 41942.6. Samples: 88331440. Policy #0 lag: (min: 2.0, avg: 21.7, max: 41.0) [2024-03-29 12:36:51,687][00126] Avg episode reward: [(0, '0.184')] [2024-03-29 12:36:51,758][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005373_88031232.pth... [2024-03-29 12:36:52,127][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000004756_77922304.pth [2024-03-29 12:36:52,165][00481] Signal inference workers to stop experience collection... (3300 times) [2024-03-29 12:36:52,197][00501] InferenceWorker_p0-w0: stopping experience collection (3300 times) [2024-03-29 12:36:52,396][00481] Signal inference workers to resume experience collection... (3300 times) [2024-03-29 12:36:52,397][00501] InferenceWorker_p0-w0: resuming experience collection (3300 times) [2024-03-29 12:36:54,815][00501] Updated weights for policy 0, policy_version 5380 (0.0022) [2024-03-29 12:36:56,685][00126] Fps is (10 sec: 45875.0, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 88227840. Throughput: 0: 41897.3. Samples: 88461400. Policy #0 lag: (min: 0.0, avg: 21.9, max: 44.0) [2024-03-29 12:36:56,686][00126] Avg episode reward: [(0, '0.233')] [2024-03-29 12:36:59,033][00501] Updated weights for policy 0, policy_version 5390 (0.0022) [2024-03-29 12:37:01,685][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 88408064. Throughput: 0: 41683.5. Samples: 88730280. Policy #0 lag: (min: 0.0, avg: 21.9, max: 44.0) [2024-03-29 12:37:01,686][00126] Avg episode reward: [(0, '0.201')] [2024-03-29 12:37:03,173][00501] Updated weights for policy 0, policy_version 5400 (0.0028) [2024-03-29 12:37:06,549][00501] Updated weights for policy 0, policy_version 5410 (0.0024) [2024-03-29 12:37:06,685][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 88637440. Throughput: 0: 41976.1. Samples: 88968300. Policy #0 lag: (min: 0.0, avg: 18.1, max: 42.0) [2024-03-29 12:37:06,686][00126] Avg episode reward: [(0, '0.202')] [2024-03-29 12:37:10,530][00501] Updated weights for policy 0, policy_version 5420 (0.0026) [2024-03-29 12:37:11,686][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 88850432. Throughput: 0: 41827.8. Samples: 89089300. Policy #0 lag: (min: 2.0, avg: 23.0, max: 42.0) [2024-03-29 12:37:11,686][00126] Avg episode reward: [(0, '0.239')] [2024-03-29 12:37:14,615][00501] Updated weights for policy 0, policy_version 5430 (0.0018) [2024-03-29 12:37:16,685][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 89030656. Throughput: 0: 41778.2. Samples: 89365380. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 12:37:16,686][00126] Avg episode reward: [(0, '0.243')] [2024-03-29 12:37:18,731][00501] Updated weights for policy 0, policy_version 5440 (0.0020) [2024-03-29 12:37:21,686][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 89276416. Throughput: 0: 42238.1. Samples: 89600700. 
Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 12:37:21,686][00126] Avg episode reward: [(0, '0.233')] [2024-03-29 12:37:21,921][00501] Updated weights for policy 0, policy_version 5450 (0.0025) [2024-03-29 12:37:26,064][00501] Updated weights for policy 0, policy_version 5460 (0.0022) [2024-03-29 12:37:26,466][00481] Signal inference workers to stop experience collection... (3350 times) [2024-03-29 12:37:26,509][00501] InferenceWorker_p0-w0: stopping experience collection (3350 times) [2024-03-29 12:37:26,685][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 89473024. Throughput: 0: 41780.1. Samples: 89721800. Policy #0 lag: (min: 0.0, avg: 23.8, max: 42.0) [2024-03-29 12:37:26,686][00126] Avg episode reward: [(0, '0.262')] [2024-03-29 12:37:26,687][00481] Signal inference workers to resume experience collection... (3350 times) [2024-03-29 12:37:26,687][00501] InferenceWorker_p0-w0: resuming experience collection (3350 times) [2024-03-29 12:37:30,341][00501] Updated weights for policy 0, policy_version 5470 (0.0025) [2024-03-29 12:37:31,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 89653248. Throughput: 0: 41778.6. Samples: 89986220. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 12:37:31,686][00126] Avg episode reward: [(0, '0.231')] [2024-03-29 12:37:34,383][00501] Updated weights for policy 0, policy_version 5480 (0.0021) [2024-03-29 12:37:36,685][00126] Fps is (10 sec: 44236.1, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 89915392. Throughput: 0: 42072.9. Samples: 90224720. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 12:37:36,688][00126] Avg episode reward: [(0, '0.305')] [2024-03-29 12:37:36,689][00481] Saving new best policy, reward=0.305! [2024-03-29 12:37:37,674][00501] Updated weights for policy 0, policy_version 5490 (0.0037) [2024-03-29 12:37:41,685][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 90095616. Throughput: 0: 41635.6. Samples: 90335000. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 12:37:41,686][00126] Avg episode reward: [(0, '0.233')] [2024-03-29 12:37:41,807][00501] Updated weights for policy 0, policy_version 5500 (0.0021) [2024-03-29 12:37:45,805][00501] Updated weights for policy 0, policy_version 5510 (0.0027) [2024-03-29 12:37:46,685][00126] Fps is (10 sec: 37683.6, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 90292224. Throughput: 0: 41962.7. Samples: 90618600. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:37:46,686][00126] Avg episode reward: [(0, '0.245')] [2024-03-29 12:37:50,129][00501] Updated weights for policy 0, policy_version 5520 (0.0025) [2024-03-29 12:37:51,686][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 90521600. Throughput: 0: 41782.1. Samples: 90848500. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 12:37:51,686][00126] Avg episode reward: [(0, '0.238')] [2024-03-29 12:37:53,404][00501] Updated weights for policy 0, policy_version 5530 (0.0022) [2024-03-29 12:37:56,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 90734592. Throughput: 0: 41761.9. Samples: 90968580. 
Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 12:37:56,686][00126] Avg episode reward: [(0, '0.193')] [2024-03-29 12:37:57,566][00501] Updated weights for policy 0, policy_version 5540 (0.0022) [2024-03-29 12:38:01,643][00501] Updated weights for policy 0, policy_version 5550 (0.0022) [2024-03-29 12:38:01,685][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 90931200. Throughput: 0: 41403.0. Samples: 91228520. Policy #0 lag: (min: 0.0, avg: 22.2, max: 44.0) [2024-03-29 12:38:01,686][00126] Avg episode reward: [(0, '0.189')] [2024-03-29 12:38:03,906][00481] Signal inference workers to stop experience collection... (3400 times) [2024-03-29 12:38:03,950][00501] InferenceWorker_p0-w0: stopping experience collection (3400 times) [2024-03-29 12:38:04,126][00481] Signal inference workers to resume experience collection... (3400 times) [2024-03-29 12:38:04,126][00501] InferenceWorker_p0-w0: resuming experience collection (3400 times) [2024-03-29 12:38:05,949][00501] Updated weights for policy 0, policy_version 5560 (0.0019) [2024-03-29 12:38:06,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 91127808. Throughput: 0: 41939.6. Samples: 91487980. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 12:38:06,686][00126] Avg episode reward: [(0, '0.236')] [2024-03-29 12:38:09,098][00501] Updated weights for policy 0, policy_version 5570 (0.0025) [2024-03-29 12:38:11,687][00126] Fps is (10 sec: 44230.1, 60 sec: 42051.2, 300 sec: 41987.2). Total num frames: 91373568. Throughput: 0: 41726.9. Samples: 91599580. Policy #0 lag: (min: 2.0, avg: 23.2, max: 41.0) [2024-03-29 12:38:11,688][00126] Avg episode reward: [(0, '0.180')] [2024-03-29 12:38:13,182][00501] Updated weights for policy 0, policy_version 5580 (0.0028) [2024-03-29 12:38:16,685][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 91553792. Throughput: 0: 41579.6. Samples: 91857300. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 12:38:16,687][00126] Avg episode reward: [(0, '0.250')] [2024-03-29 12:38:17,248][00501] Updated weights for policy 0, policy_version 5590 (0.0024) [2024-03-29 12:38:21,552][00501] Updated weights for policy 0, policy_version 5600 (0.0023) [2024-03-29 12:38:21,685][00126] Fps is (10 sec: 37689.6, 60 sec: 41233.2, 300 sec: 41765.3). Total num frames: 91750400. Throughput: 0: 42297.5. Samples: 92128100. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 12:38:21,686][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 12:38:22,053][00481] Saving new best policy, reward=0.334! [2024-03-29 12:38:24,951][00501] Updated weights for policy 0, policy_version 5610 (0.0022) [2024-03-29 12:38:26,685][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 92012544. Throughput: 0: 41871.5. Samples: 92219220. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 12:38:26,686][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 12:38:28,893][00501] Updated weights for policy 0, policy_version 5620 (0.0019) [2024-03-29 12:38:31,686][00126] Fps is (10 sec: 42597.6, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 92176384. Throughput: 0: 41219.9. Samples: 92473500. 
Policy #0 lag: (min: 0.0, avg: 23.4, max: 42.0) [2024-03-29 12:38:31,686][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 12:38:33,149][00501] Updated weights for policy 0, policy_version 5630 (0.0028) [2024-03-29 12:38:36,685][00126] Fps is (10 sec: 34407.0, 60 sec: 40687.0, 300 sec: 41765.3). Total num frames: 92356608. Throughput: 0: 42437.5. Samples: 92758180. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 12:38:36,686][00126] Avg episode reward: [(0, '0.291')] [2024-03-29 12:38:37,273][00501] Updated weights for policy 0, policy_version 5640 (0.0026) [2024-03-29 12:38:37,303][00481] Signal inference workers to stop experience collection... (3450 times) [2024-03-29 12:38:37,339][00501] InferenceWorker_p0-w0: stopping experience collection (3450 times) [2024-03-29 12:38:37,522][00481] Signal inference workers to resume experience collection... (3450 times) [2024-03-29 12:38:37,523][00501] InferenceWorker_p0-w0: resuming experience collection (3450 times) [2024-03-29 12:38:40,524][00501] Updated weights for policy 0, policy_version 5650 (0.0024) [2024-03-29 12:38:41,685][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.3, 300 sec: 41932.0). Total num frames: 92618752. Throughput: 0: 41838.7. Samples: 92851320. Policy #0 lag: (min: 2.0, avg: 22.4, max: 43.0) [2024-03-29 12:38:41,686][00126] Avg episode reward: [(0, '0.235')] [2024-03-29 12:38:44,657][00501] Updated weights for policy 0, policy_version 5660 (0.0022) [2024-03-29 12:38:46,685][00126] Fps is (10 sec: 45874.8, 60 sec: 42052.3, 300 sec: 41932.0). Total num frames: 92815360. Throughput: 0: 41597.9. Samples: 93100420. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 12:38:46,686][00126] Avg episode reward: [(0, '0.282')] [2024-03-29 12:38:48,620][00501] Updated weights for policy 0, policy_version 5670 (0.0027) [2024-03-29 12:38:51,685][00126] Fps is (10 sec: 37682.9, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 92995584. Throughput: 0: 41930.6. Samples: 93374860. Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 12:38:51,687][00126] Avg episode reward: [(0, '0.278')] [2024-03-29 12:38:51,710][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005676_92995584.pth... [2024-03-29 12:38:52,057][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005065_82984960.pth [2024-03-29 12:38:53,111][00501] Updated weights for policy 0, policy_version 5680 (0.0027) [2024-03-29 12:38:56,254][00501] Updated weights for policy 0, policy_version 5690 (0.0023) [2024-03-29 12:38:56,685][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 93224960. Throughput: 0: 41849.5. Samples: 93482740. Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 12:38:56,686][00126] Avg episode reward: [(0, '0.220')] [2024-03-29 12:39:00,290][00501] Updated weights for policy 0, policy_version 5700 (0.0024) [2024-03-29 12:39:01,685][00126] Fps is (10 sec: 45875.5, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 93454336. Throughput: 0: 41443.6. Samples: 93722260. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 12:39:01,686][00126] Avg episode reward: [(0, '0.226')] [2024-03-29 12:39:04,291][00501] Updated weights for policy 0, policy_version 5710 (0.0021) [2024-03-29 12:39:06,685][00126] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 93618176. Throughput: 0: 41621.7. Samples: 94001080. 
Policy #0 lag: (min: 0.0, avg: 22.8, max: 42.0) [2024-03-29 12:39:06,686][00126] Avg episode reward: [(0, '0.246')] [2024-03-29 12:39:08,677][00501] Updated weights for policy 0, policy_version 5720 (0.0030) [2024-03-29 12:39:08,685][00481] Signal inference workers to stop experience collection... (3500 times) [2024-03-29 12:39:08,686][00481] Signal inference workers to resume experience collection... (3500 times) [2024-03-29 12:39:08,725][00501] InferenceWorker_p0-w0: stopping experience collection (3500 times) [2024-03-29 12:39:08,725][00501] InferenceWorker_p0-w0: resuming experience collection (3500 times) [2024-03-29 12:39:11,685][00126] Fps is (10 sec: 40960.0, 60 sec: 41507.3, 300 sec: 41876.4). Total num frames: 93863936. Throughput: 0: 42169.9. Samples: 94116860. Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 12:39:11,686][00126] Avg episode reward: [(0, '0.278')] [2024-03-29 12:39:11,832][00501] Updated weights for policy 0, policy_version 5730 (0.0030) [2024-03-29 12:39:16,000][00501] Updated weights for policy 0, policy_version 5740 (0.0030) [2024-03-29 12:39:16,685][00126] Fps is (10 sec: 45875.3, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 94076928. Throughput: 0: 41788.6. Samples: 94353980. Policy #0 lag: (min: 2.0, avg: 23.1, max: 42.0) [2024-03-29 12:39:16,686][00126] Avg episode reward: [(0, '0.219')] [2024-03-29 12:39:20,038][00501] Updated weights for policy 0, policy_version 5750 (0.0017) [2024-03-29 12:39:21,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 94257152. Throughput: 0: 41425.7. Samples: 94622340. Policy #0 lag: (min: 0.0, avg: 19.7, max: 41.0) [2024-03-29 12:39:21,686][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 12:39:24,232][00501] Updated weights for policy 0, policy_version 5760 (0.0019) [2024-03-29 12:39:26,685][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.3, 300 sec: 41876.4). Total num frames: 94502912. Throughput: 0: 42096.5. Samples: 94745660. Policy #0 lag: (min: 1.0, avg: 20.4, max: 43.0) [2024-03-29 12:39:26,686][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 12:39:27,296][00501] Updated weights for policy 0, policy_version 5770 (0.0025) [2024-03-29 12:39:31,417][00501] Updated weights for policy 0, policy_version 5780 (0.0027) [2024-03-29 12:39:31,685][00126] Fps is (10 sec: 45875.7, 60 sec: 42325.5, 300 sec: 41931.9). Total num frames: 94715904. Throughput: 0: 41910.8. Samples: 94986400. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 12:39:31,686][00126] Avg episode reward: [(0, '0.284')] [2024-03-29 12:39:35,518][00501] Updated weights for policy 0, policy_version 5790 (0.0022) [2024-03-29 12:39:36,685][00126] Fps is (10 sec: 39321.1, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 94896128. Throughput: 0: 41928.9. Samples: 95261660. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 12:39:36,686][00126] Avg episode reward: [(0, '0.336')] [2024-03-29 12:39:36,915][00481] Saving new best policy, reward=0.336! [2024-03-29 12:39:39,765][00501] Updated weights for policy 0, policy_version 5800 (0.0020) [2024-03-29 12:39:40,801][00481] Signal inference workers to stop experience collection... (3550 times) [2024-03-29 12:39:40,838][00501] InferenceWorker_p0-w0: stopping experience collection (3550 times) [2024-03-29 12:39:41,014][00481] Signal inference workers to resume experience collection... 
(3550 times) [2024-03-29 12:39:41,014][00501] InferenceWorker_p0-w0: resuming experience collection (3550 times) [2024-03-29 12:39:41,686][00126] Fps is (10 sec: 40959.0, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 95125504. Throughput: 0: 42180.8. Samples: 95380880. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 12:39:41,686][00126] Avg episode reward: [(0, '0.302')] [2024-03-29 12:39:42,997][00501] Updated weights for policy 0, policy_version 5810 (0.0021) [2024-03-29 12:39:46,686][00126] Fps is (10 sec: 44232.8, 60 sec: 42051.6, 300 sec: 41876.3). Total num frames: 95338496. Throughput: 0: 42022.2. Samples: 95613300. Policy #0 lag: (min: 2.0, avg: 22.2, max: 42.0) [2024-03-29 12:39:46,687][00126] Avg episode reward: [(0, '0.236')] [2024-03-29 12:39:46,969][00501] Updated weights for policy 0, policy_version 5820 (0.0030) [2024-03-29 12:39:50,888][00501] Updated weights for policy 0, policy_version 5830 (0.0027) [2024-03-29 12:39:51,685][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 95535104. Throughput: 0: 41870.1. Samples: 95885240. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 12:39:51,686][00126] Avg episode reward: [(0, '0.226')] [2024-03-29 12:39:55,295][00501] Updated weights for policy 0, policy_version 5840 (0.0027) [2024-03-29 12:39:56,685][00126] Fps is (10 sec: 40963.8, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 95748096. Throughput: 0: 42349.8. Samples: 96022600. Policy #0 lag: (min: 1.0, avg: 17.8, max: 42.0) [2024-03-29 12:39:56,688][00126] Avg episode reward: [(0, '0.276')] [2024-03-29 12:39:58,584][00501] Updated weights for policy 0, policy_version 5850 (0.0025) [2024-03-29 12:40:01,685][00126] Fps is (10 sec: 44237.4, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 95977472. Throughput: 0: 42093.8. Samples: 96248200. Policy #0 lag: (min: 2.0, avg: 22.7, max: 42.0) [2024-03-29 12:40:01,686][00126] Avg episode reward: [(0, '0.264')] [2024-03-29 12:40:02,300][00501] Updated weights for policy 0, policy_version 5860 (0.0018) [2024-03-29 12:40:06,155][00501] Updated weights for policy 0, policy_version 5870 (0.0022) [2024-03-29 12:40:06,685][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 96174080. Throughput: 0: 41999.0. Samples: 96512300. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 12:40:06,686][00126] Avg episode reward: [(0, '0.229')] [2024-03-29 12:40:10,689][00501] Updated weights for policy 0, policy_version 5880 (0.0025) [2024-03-29 12:40:11,685][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 96370688. Throughput: 0: 42629.6. Samples: 96664000. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 12:40:11,686][00126] Avg episode reward: [(0, '0.256')] [2024-03-29 12:40:13,351][00481] Signal inference workers to stop experience collection... (3600 times) [2024-03-29 12:40:13,404][00501] InferenceWorker_p0-w0: stopping experience collection (3600 times) [2024-03-29 12:40:13,444][00481] Signal inference workers to resume experience collection... (3600 times) [2024-03-29 12:40:13,449][00501] InferenceWorker_p0-w0: resuming experience collection (3600 times) [2024-03-29 12:40:13,736][00501] Updated weights for policy 0, policy_version 5890 (0.0029) [2024-03-29 12:40:16,685][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 96616448. Throughput: 0: 42450.2. Samples: 96896660. 
Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 12:40:16,686][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 12:40:16,686][00481] Saving new best policy, reward=0.358! [2024-03-29 12:40:17,841][00501] Updated weights for policy 0, policy_version 5900 (0.0021) [2024-03-29 12:40:21,684][00501] Updated weights for policy 0, policy_version 5910 (0.0026) [2024-03-29 12:40:21,685][00126] Fps is (10 sec: 45875.0, 60 sec: 42871.4, 300 sec: 42154.1). Total num frames: 96829440. Throughput: 0: 41862.6. Samples: 97145480. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 12:40:21,686][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 12:40:26,219][00501] Updated weights for policy 0, policy_version 5920 (0.0027) [2024-03-29 12:40:26,685][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 97009664. Throughput: 0: 42615.6. Samples: 97298580. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 12:40:26,686][00126] Avg episode reward: [(0, '0.315')] [2024-03-29 12:40:29,275][00501] Updated weights for policy 0, policy_version 5930 (0.0018) [2024-03-29 12:40:31,685][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 97239040. Throughput: 0: 42362.7. Samples: 97519580. Policy #0 lag: (min: 3.0, avg: 22.7, max: 43.0) [2024-03-29 12:40:31,688][00126] Avg episode reward: [(0, '0.259')] [2024-03-29 12:40:33,239][00501] Updated weights for policy 0, policy_version 5940 (0.0028) [2024-03-29 12:40:36,685][00126] Fps is (10 sec: 44237.0, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 97452032. Throughput: 0: 42114.7. Samples: 97780400. Policy #0 lag: (min: 0.0, avg: 22.6, max: 42.0) [2024-03-29 12:40:36,686][00126] Avg episode reward: [(0, '0.249')] [2024-03-29 12:40:37,282][00501] Updated weights for policy 0, policy_version 5950 (0.0018) [2024-03-29 12:40:41,685][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 97632256. Throughput: 0: 42193.7. Samples: 97921320. Policy #0 lag: (min: 1.0, avg: 18.1, max: 43.0) [2024-03-29 12:40:41,686][00126] Avg episode reward: [(0, '0.294')] [2024-03-29 12:40:41,831][00501] Updated weights for policy 0, policy_version 5960 (0.0021) [2024-03-29 12:40:45,159][00501] Updated weights for policy 0, policy_version 5970 (0.0026) [2024-03-29 12:40:46,685][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.9, 300 sec: 41931.9). Total num frames: 97878016. Throughput: 0: 42322.6. Samples: 98152720. Policy #0 lag: (min: 1.0, avg: 18.1, max: 43.0) [2024-03-29 12:40:46,686][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 12:40:46,917][00481] Signal inference workers to stop experience collection... (3650 times) [2024-03-29 12:40:46,988][00501] InferenceWorker_p0-w0: stopping experience collection (3650 times) [2024-03-29 12:40:46,994][00481] Signal inference workers to resume experience collection... (3650 times) [2024-03-29 12:40:47,014][00501] InferenceWorker_p0-w0: resuming experience collection (3650 times) [2024-03-29 12:40:48,770][00501] Updated weights for policy 0, policy_version 5980 (0.0019) [2024-03-29 12:40:51,685][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 98074624. Throughput: 0: 42135.6. Samples: 98408400. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 12:40:51,686][00126] Avg episode reward: [(0, '0.266')] [2024-03-29 12:40:51,818][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005987_98091008.pth... 
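The Saving/Removing pairs around checkpoint_p0 (the save just above and the removal that follows) show a rolling checkpoint rotation: each periodic save of checkpoint_{version}_{frames}.pth is paired with deletion of the oldest rolling checkpoint, so only the newest few files (plus the separately saved best policy) stay on disk. Below is a minimal sketch of such a retention policy, assuming the naming scheme visible in these paths and a keep-last-2 budget; it is illustrative only, not the training code that produced this log.

```python
# Illustrative sketch only: mimics the Saving/Removing pattern in this log
# (keep the most recent rolling checkpoints, delete older ones). The file
# naming checkpoint_{version}_{frames}.pth is inferred from the log paths;
# the actual trainer may handle retention differently.
import os
import re
from pathlib import Path

CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

def rotate_checkpoints(ckpt_dir: str, keep_last: int = 2) -> None:
    """Delete the oldest rolling checkpoints so at most `keep_last` remain."""
    found = []
    for path in Path(ckpt_dir).glob("checkpoint_*.pth"):
        match = CKPT_RE.search(path.name)
        if match:
            # Sort key: the policy version embedded in the file name.
            found.append((int(match.group(1)), path))
    found.sort()
    to_remove = found[:-keep_last] if keep_last > 0 else found
    for _, old_path in to_remove:
        print(f"Removing {old_path}")
        os.remove(old_path)

# e.g. rotate_checkpoints("/workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0")
```

With keep_last=2 this reproduces the cadence seen here, where every new save is followed by exactly one removal of the oldest rolling file.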
[2024-03-29 12:40:52,160][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005373_88031232.pth [2024-03-29 12:40:52,975][00501] Updated weights for policy 0, policy_version 5990 (0.0021) [2024-03-29 12:40:56,685][00126] Fps is (10 sec: 36045.2, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 98238464. Throughput: 0: 41657.5. Samples: 98538580. Policy #0 lag: (min: 1.0, avg: 21.2, max: 42.0) [2024-03-29 12:40:56,686][00126] Avg episode reward: [(0, '0.284')] [2024-03-29 12:40:57,639][00501] Updated weights for policy 0, policy_version 6000 (0.0029) [2024-03-29 12:41:01,004][00501] Updated weights for policy 0, policy_version 6010 (0.0019) [2024-03-29 12:41:01,685][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 98500608. Throughput: 0: 41936.3. Samples: 98783800. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 12:41:01,686][00126] Avg episode reward: [(0, '0.271')] [2024-03-29 12:41:04,575][00501] Updated weights for policy 0, policy_version 6020 (0.0019) [2024-03-29 12:41:06,685][00126] Fps is (10 sec: 47512.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 98713600. Throughput: 0: 41786.2. Samples: 99025860. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 12:41:06,686][00126] Avg episode reward: [(0, '0.300')] [2024-03-29 12:41:08,456][00501] Updated weights for policy 0, policy_version 6030 (0.0021) [2024-03-29 12:41:11,685][00126] Fps is (10 sec: 37683.5, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 98877440. Throughput: 0: 41613.0. Samples: 99171160. Policy #0 lag: (min: 0.0, avg: 18.8, max: 42.0) [2024-03-29 12:41:11,686][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 12:41:13,038][00501] Updated weights for policy 0, policy_version 6040 (0.0032) [2024-03-29 12:41:16,330][00501] Updated weights for policy 0, policy_version 6050 (0.0018) [2024-03-29 12:41:16,685][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 99123200. Throughput: 0: 42301.3. Samples: 99423140. Policy #0 lag: (min: 0.0, avg: 18.8, max: 42.0) [2024-03-29 12:41:16,686][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 12:41:19,793][00481] Signal inference workers to stop experience collection... (3700 times) [2024-03-29 12:41:19,830][00501] InferenceWorker_p0-w0: stopping experience collection (3700 times) [2024-03-29 12:41:20,010][00481] Signal inference workers to resume experience collection... (3700 times) [2024-03-29 12:41:20,010][00501] InferenceWorker_p0-w0: resuming experience collection (3700 times) [2024-03-29 12:41:20,013][00501] Updated weights for policy 0, policy_version 6060 (0.0019) [2024-03-29 12:41:21,685][00126] Fps is (10 sec: 47513.5, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 99352576. Throughput: 0: 41689.4. Samples: 99656420. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 12:41:21,686][00126] Avg episode reward: [(0, '0.284')] [2024-03-29 12:41:24,169][00501] Updated weights for policy 0, policy_version 6070 (0.0023) [2024-03-29 12:41:26,685][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 99516416. Throughput: 0: 41664.4. Samples: 99796220. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 12:41:26,686][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 12:41:28,660][00501] Updated weights for policy 0, policy_version 6080 (0.0022) [2024-03-29 12:41:31,685][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.2, 300 sec: 41876.4). 
Total num frames: 99762176. Throughput: 0: 42310.3. Samples: 100056680. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 12:41:31,686][00126] Avg episode reward: [(0, '0.233')] [2024-03-29 12:41:31,749][00501] Updated weights for policy 0, policy_version 6090 (0.0023) [2024-03-29 12:41:35,457][00501] Updated weights for policy 0, policy_version 6100 (0.0019) [2024-03-29 12:41:36,685][00126] Fps is (10 sec: 47513.4, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 99991552. Throughput: 0: 41836.4. Samples: 100291040. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 12:41:36,686][00126] Avg episode reward: [(0, '0.245')] [2024-03-29 12:41:39,562][00501] Updated weights for policy 0, policy_version 6110 (0.0018) [2024-03-29 12:41:41,685][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 100155392. Throughput: 0: 42064.8. Samples: 100431500. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 12:41:41,686][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 12:41:44,116][00501] Updated weights for policy 0, policy_version 6120 (0.0019) [2024-03-29 12:41:46,685][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.3, 300 sec: 41932.0). Total num frames: 100384768. Throughput: 0: 42465.0. Samples: 100694720. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 12:41:46,686][00126] Avg episode reward: [(0, '0.288')] [2024-03-29 12:41:47,411][00501] Updated weights for policy 0, policy_version 6130 (0.0023) [2024-03-29 12:41:50,968][00501] Updated weights for policy 0, policy_version 6140 (0.0024) [2024-03-29 12:41:51,685][00126] Fps is (10 sec: 45875.4, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 100614144. Throughput: 0: 42256.5. Samples: 100927400. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 12:41:51,686][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 12:41:54,728][00481] Signal inference workers to stop experience collection... (3750 times) [2024-03-29 12:41:54,728][00481] Signal inference workers to resume experience collection... (3750 times) [2024-03-29 12:41:54,765][00501] InferenceWorker_p0-w0: stopping experience collection (3750 times) [2024-03-29 12:41:54,765][00501] InferenceWorker_p0-w0: resuming experience collection (3750 times) [2024-03-29 12:41:55,028][00501] Updated weights for policy 0, policy_version 6150 (0.0020) [2024-03-29 12:41:56,685][00126] Fps is (10 sec: 42598.1, 60 sec: 42871.4, 300 sec: 42043.0). Total num frames: 100810752. Throughput: 0: 42025.3. Samples: 101062300. Policy #0 lag: (min: 0.0, avg: 22.6, max: 41.0) [2024-03-29 12:41:56,686][00126] Avg episode reward: [(0, '0.286')] [2024-03-29 12:41:59,677][00501] Updated weights for policy 0, policy_version 6160 (0.0019) [2024-03-29 12:42:01,685][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 101007360. Throughput: 0: 42219.0. Samples: 101323000. Policy #0 lag: (min: 0.0, avg: 17.7, max: 41.0) [2024-03-29 12:42:01,686][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 12:42:02,925][00501] Updated weights for policy 0, policy_version 6170 (0.0020) [2024-03-29 12:42:06,429][00501] Updated weights for policy 0, policy_version 6180 (0.0023) [2024-03-29 12:42:06,685][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 101253120. Throughput: 0: 42331.9. Samples: 101561360. 
Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 12:42:06,686][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 12:42:10,566][00501] Updated weights for policy 0, policy_version 6190 (0.0028) [2024-03-29 12:42:11,685][00126] Fps is (10 sec: 44237.3, 60 sec: 42871.5, 300 sec: 42098.5). Total num frames: 101449728. Throughput: 0: 42138.2. Samples: 101692440. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 12:42:11,686][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 12:42:15,175][00501] Updated weights for policy 0, policy_version 6200 (0.0018) [2024-03-29 12:42:16,685][00126] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 41932.0). Total num frames: 101646336. Throughput: 0: 42355.6. Samples: 101962680. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 12:42:16,687][00126] Avg episode reward: [(0, '0.274')] [2024-03-29 12:42:18,447][00501] Updated weights for policy 0, policy_version 6210 (0.0021) [2024-03-29 12:42:21,685][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 101875712. Throughput: 0: 42284.4. Samples: 102193840. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 12:42:21,686][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 12:42:21,797][00481] Saving new best policy, reward=0.383! [2024-03-29 12:42:22,504][00501] Updated weights for policy 0, policy_version 6220 (0.0018) [2024-03-29 12:42:26,360][00501] Updated weights for policy 0, policy_version 6230 (0.0018) [2024-03-29 12:42:26,685][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 102072320. Throughput: 0: 41860.5. Samples: 102315220. Policy #0 lag: (min: 1.0, avg: 22.5, max: 42.0) [2024-03-29 12:42:26,686][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 12:42:29,902][00481] Signal inference workers to stop experience collection... (3800 times) [2024-03-29 12:42:29,904][00481] Signal inference workers to resume experience collection... (3800 times) [2024-03-29 12:42:29,949][00501] InferenceWorker_p0-w0: stopping experience collection (3800 times) [2024-03-29 12:42:29,950][00501] InferenceWorker_p0-w0: resuming experience collection (3800 times) [2024-03-29 12:42:30,893][00501] Updated weights for policy 0, policy_version 6240 (0.0025) [2024-03-29 12:42:31,685][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 102268928. Throughput: 0: 42155.0. Samples: 102591700. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 12:42:31,686][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 12:42:34,054][00501] Updated weights for policy 0, policy_version 6250 (0.0021) [2024-03-29 12:42:36,685][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 102498304. Throughput: 0: 42220.0. Samples: 102827300. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 12:42:36,686][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 12:42:37,699][00501] Updated weights for policy 0, policy_version 6260 (0.0020) [2024-03-29 12:42:41,685][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 102711296. Throughput: 0: 41916.8. Samples: 102948560. 
Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 12:42:41,688][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 12:42:41,745][00501] Updated weights for policy 0, policy_version 6270 (0.0022) [2024-03-29 12:42:46,534][00501] Updated weights for policy 0, policy_version 6280 (0.0019) [2024-03-29 12:42:46,685][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41932.0). Total num frames: 102891520. Throughput: 0: 42225.9. Samples: 103223160. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 12:42:46,686][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 12:42:49,726][00501] Updated weights for policy 0, policy_version 6290 (0.0019) [2024-03-29 12:42:51,685][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 103120896. Throughput: 0: 41963.1. Samples: 103449700. Policy #0 lag: (min: 1.0, avg: 19.4, max: 42.0) [2024-03-29 12:42:51,686][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 12:42:51,837][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000006295_103137280.pth... [2024-03-29 12:42:52,189][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005676_92995584.pth [2024-03-29 12:42:53,576][00501] Updated weights for policy 0, policy_version 6300 (0.0023) [2024-03-29 12:42:56,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 103317504. Throughput: 0: 41496.5. Samples: 103559780. Policy #0 lag: (min: 2.0, avg: 22.0, max: 42.0) [2024-03-29 12:42:56,686][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 12:42:57,738][00501] Updated weights for policy 0, policy_version 6310 (0.0026) [2024-03-29 12:43:01,685][00126] Fps is (10 sec: 37683.6, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 103497728. Throughput: 0: 41743.1. Samples: 103841120. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 12:43:01,686][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 12:43:02,496][00481] Signal inference workers to stop experience collection... (3850 times) [2024-03-29 12:43:02,546][00501] InferenceWorker_p0-w0: stopping experience collection (3850 times) [2024-03-29 12:43:02,582][00481] Signal inference workers to resume experience collection... (3850 times) [2024-03-29 12:43:02,590][00501] InferenceWorker_p0-w0: resuming experience collection (3850 times) [2024-03-29 12:43:02,593][00501] Updated weights for policy 0, policy_version 6320 (0.0026) [2024-03-29 12:43:05,730][00501] Updated weights for policy 0, policy_version 6330 (0.0020) [2024-03-29 12:43:06,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.3, 300 sec: 41987.7). Total num frames: 103759872. Throughput: 0: 41577.4. Samples: 104064820. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 12:43:06,686][00126] Avg episode reward: [(0, '0.257')] [2024-03-29 12:43:09,286][00501] Updated weights for policy 0, policy_version 6340 (0.0028) [2024-03-29 12:43:11,685][00126] Fps is (10 sec: 45875.3, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 103956480. Throughput: 0: 41798.7. Samples: 104196160. Policy #0 lag: (min: 0.0, avg: 23.1, max: 43.0) [2024-03-29 12:43:11,686][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 12:43:13,408][00501] Updated weights for policy 0, policy_version 6350 (0.0018) [2024-03-29 12:43:16,685][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 104136704. Throughput: 0: 41726.7. Samples: 104469400. 
Policy #0 lag: (min: 0.0, avg: 23.1, max: 43.0) [2024-03-29 12:43:16,688][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 12:43:17,914][00501] Updated weights for policy 0, policy_version 6360 (0.0027) [2024-03-29 12:43:21,071][00501] Updated weights for policy 0, policy_version 6370 (0.0026) [2024-03-29 12:43:21,685][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 104382464. Throughput: 0: 41700.0. Samples: 104703800. Policy #0 lag: (min: 1.0, avg: 18.9, max: 43.0) [2024-03-29 12:43:21,686][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 12:43:24,775][00501] Updated weights for policy 0, policy_version 6380 (0.0026) [2024-03-29 12:43:26,686][00126] Fps is (10 sec: 45874.6, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 104595456. Throughput: 0: 42034.2. Samples: 104840100. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 12:43:26,686][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 12:43:29,136][00501] Updated weights for policy 0, policy_version 6390 (0.0027) [2024-03-29 12:43:31,686][00126] Fps is (10 sec: 37682.8, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 104759296. Throughput: 0: 41324.3. Samples: 105082760. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 12:43:31,686][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 12:43:33,605][00501] Updated weights for policy 0, policy_version 6400 (0.0018) [2024-03-29 12:43:35,595][00481] Signal inference workers to stop experience collection... (3900 times) [2024-03-29 12:43:35,625][00501] InferenceWorker_p0-w0: stopping experience collection (3900 times) [2024-03-29 12:43:35,784][00481] Signal inference workers to resume experience collection... (3900 times) [2024-03-29 12:43:35,784][00501] InferenceWorker_p0-w0: resuming experience collection (3900 times) [2024-03-29 12:43:36,612][00501] Updated weights for policy 0, policy_version 6410 (0.0023) [2024-03-29 12:43:36,685][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 105021440. Throughput: 0: 41853.4. Samples: 105333100. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 12:43:36,686][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 12:43:40,440][00501] Updated weights for policy 0, policy_version 6420 (0.0034) [2024-03-29 12:43:41,685][00126] Fps is (10 sec: 49152.0, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 105250816. Throughput: 0: 42437.6. Samples: 105469480. Policy #0 lag: (min: 1.0, avg: 23.2, max: 42.0) [2024-03-29 12:43:41,686][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 12:43:44,513][00501] Updated weights for policy 0, policy_version 6430 (0.0023) [2024-03-29 12:43:46,685][00126] Fps is (10 sec: 37683.6, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 105398272. Throughput: 0: 41695.1. Samples: 105717400. Policy #0 lag: (min: 1.0, avg: 23.2, max: 42.0) [2024-03-29 12:43:46,688][00126] Avg episode reward: [(0, '0.262')] [2024-03-29 12:43:49,039][00501] Updated weights for policy 0, policy_version 6440 (0.0027) [2024-03-29 12:43:51,685][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 105644032. Throughput: 0: 42540.8. Samples: 105979160. Policy #0 lag: (min: 0.0, avg: 17.7, max: 41.0) [2024-03-29 12:43:51,686][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 12:43:52,144][00481] Saving new best policy, reward=0.417! 
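The "Saving new best policy, reward=..." lines in this section (0.217, 0.219, 0.298, 0.305, 0.334, 0.336, 0.358, 0.383 and, just above, 0.417) indicate that a separate best-policy snapshot is written whenever the reported average episode reward exceeds the best value seen so far. Below is a minimal sketch of that bookkeeping; the class name, save-path handling, and use of torch.save are assumptions made for illustration, not the project's actual implementation.

```python
# Illustrative only: mirrors the "Saving new best policy, reward=X!" pattern
# in this log. The tracker and its torch.save call are assumptions; the real
# trainer may persist its state differently.
from typing import Any, Dict, Optional

import torch  # the checkpoints in this log are .pth files


class BestPolicyTracker:
    def __init__(self, save_path: str):
        self.save_path = save_path
        self.best_reward: Optional[float] = None

    def maybe_save(self, avg_episode_reward: float, policy_state: Dict[str, Any]) -> bool:
        """Save `policy_state` only when the reward beats every previous report."""
        if self.best_reward is not None and avg_episode_reward <= self.best_reward:
            return False
        self.best_reward = avg_episode_reward
        print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
        torch.save(policy_state, self.save_path)
        return True
```

Given the best value carried in from earlier in the run, feeding such a tracker the average episode rewards reported in this section in order would make it fire on exactly the values the log marks as new bests.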
[2024-03-29 12:43:52,162][00501] Updated weights for policy 0, policy_version 6450 (0.0026) [2024-03-29 12:43:56,088][00501] Updated weights for policy 0, policy_version 6460 (0.0022) [2024-03-29 12:43:56,685][00126] Fps is (10 sec: 47513.8, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 105873408. Throughput: 0: 42129.4. Samples: 106091980. Policy #0 lag: (min: 2.0, avg: 22.6, max: 41.0) [2024-03-29 12:43:56,686][00126] Avg episode reward: [(0, '0.293')] [2024-03-29 12:44:00,353][00501] Updated weights for policy 0, policy_version 6470 (0.0025) [2024-03-29 12:44:01,685][00126] Fps is (10 sec: 40959.9, 60 sec: 42598.3, 300 sec: 42154.1). Total num frames: 106053632. Throughput: 0: 41571.5. Samples: 106340120. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 12:44:01,686][00126] Avg episode reward: [(0, '0.249')] [2024-03-29 12:44:04,806][00501] Updated weights for policy 0, policy_version 6480 (0.0018) [2024-03-29 12:44:06,685][00126] Fps is (10 sec: 37682.7, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 106250240. Throughput: 0: 42265.7. Samples: 106605760. Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 12:44:06,686][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 12:44:08,047][00501] Updated weights for policy 0, policy_version 6490 (0.0025) [2024-03-29 12:44:11,256][00481] Signal inference workers to stop experience collection... (3950 times) [2024-03-29 12:44:11,306][00501] InferenceWorker_p0-w0: stopping experience collection (3950 times) [2024-03-29 12:44:11,343][00481] Signal inference workers to resume experience collection... (3950 times) [2024-03-29 12:44:11,346][00501] InferenceWorker_p0-w0: resuming experience collection (3950 times) [2024-03-29 12:44:11,685][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 106479616. Throughput: 0: 41686.3. Samples: 106715980. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 12:44:11,686][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 12:44:11,896][00501] Updated weights for policy 0, policy_version 6500 (0.0024) [2024-03-29 12:44:16,063][00501] Updated weights for policy 0, policy_version 6510 (0.0021) [2024-03-29 12:44:16,685][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 106692608. Throughput: 0: 41967.7. Samples: 106971300. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 12:44:16,686][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 12:44:20,537][00501] Updated weights for policy 0, policy_version 6520 (0.0019) [2024-03-29 12:44:21,685][00126] Fps is (10 sec: 39321.2, 60 sec: 41506.0, 300 sec: 41931.9). Total num frames: 106872832. Throughput: 0: 41835.5. Samples: 107215700. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 12:44:21,686][00126] Avg episode reward: [(0, '0.329')] [2024-03-29 12:44:23,893][00501] Updated weights for policy 0, policy_version 6530 (0.0034) [2024-03-29 12:44:26,685][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 107085824. Throughput: 0: 41379.2. Samples: 107331540. Policy #0 lag: (min: 2.0, avg: 21.6, max: 42.0) [2024-03-29 12:44:26,686][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 12:44:27,838][00501] Updated weights for policy 0, policy_version 6540 (0.0028) [2024-03-29 12:44:31,685][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.4, 300 sec: 41987.5). Total num frames: 107282432. Throughput: 0: 41248.4. Samples: 107573580. 
Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 12:44:31,686][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 12:44:32,161][00501] Updated weights for policy 0, policy_version 6550 (0.0021) [2024-03-29 12:44:36,367][00501] Updated weights for policy 0, policy_version 6560 (0.0022) [2024-03-29 12:44:36,685][00126] Fps is (10 sec: 40960.4, 60 sec: 41233.2, 300 sec: 41932.0). Total num frames: 107495424. Throughput: 0: 41295.2. Samples: 107837440. Policy #0 lag: (min: 2.0, avg: 19.0, max: 42.0) [2024-03-29 12:44:36,686][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 12:44:39,656][00501] Updated weights for policy 0, policy_version 6570 (0.0024) [2024-03-29 12:44:41,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41233.2, 300 sec: 41987.6). Total num frames: 107724800. Throughput: 0: 41388.8. Samples: 107954480. Policy #0 lag: (min: 2.0, avg: 19.0, max: 42.0) [2024-03-29 12:44:41,686][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 12:44:43,002][00481] Signal inference workers to stop experience collection... (4000 times) [2024-03-29 12:44:43,078][00501] InferenceWorker_p0-w0: stopping experience collection (4000 times) [2024-03-29 12:44:43,082][00481] Signal inference workers to resume experience collection... (4000 times) [2024-03-29 12:44:43,106][00501] InferenceWorker_p0-w0: resuming experience collection (4000 times) [2024-03-29 12:44:43,660][00501] Updated weights for policy 0, policy_version 6580 (0.0022) [2024-03-29 12:44:46,685][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 107921408. Throughput: 0: 41432.5. Samples: 108204580. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 12:44:46,686][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 12:44:47,660][00501] Updated weights for policy 0, policy_version 6590 (0.0023) [2024-03-29 12:44:51,685][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 108118016. Throughput: 0: 41624.1. Samples: 108478840. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 12:44:51,686][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 12:44:51,948][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000006600_108134400.pth... [2024-03-29 12:44:51,949][00501] Updated weights for policy 0, policy_version 6600 (0.0023) [2024-03-29 12:44:52,362][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000005987_98091008.pth [2024-03-29 12:44:55,473][00501] Updated weights for policy 0, policy_version 6610 (0.0024) [2024-03-29 12:44:56,687][00126] Fps is (10 sec: 42589.5, 60 sec: 41231.6, 300 sec: 41931.6). Total num frames: 108347392. Throughput: 0: 41733.7. Samples: 108594080. Policy #0 lag: (min: 2.0, avg: 20.6, max: 43.0) [2024-03-29 12:44:56,688][00126] Avg episode reward: [(0, '0.319')] [2024-03-29 12:44:59,348][00501] Updated weights for policy 0, policy_version 6620 (0.0030) [2024-03-29 12:45:01,685][00126] Fps is (10 sec: 44236.3, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 108560384. Throughput: 0: 41150.2. Samples: 108823060. Policy #0 lag: (min: 0.0, avg: 22.7, max: 44.0) [2024-03-29 12:45:01,686][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 12:45:03,262][00501] Updated weights for policy 0, policy_version 6630 (0.0030) [2024-03-29 12:45:06,685][00126] Fps is (10 sec: 37690.7, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 108724224. Throughput: 0: 42070.3. Samples: 109108860. 
Policy #0 lag: (min: 0.0, avg: 22.7, max: 44.0) [2024-03-29 12:45:06,687][00126] Avg episode reward: [(0, '0.249')] [2024-03-29 12:45:07,693][00501] Updated weights for policy 0, policy_version 6640 (0.0020) [2024-03-29 12:45:11,201][00501] Updated weights for policy 0, policy_version 6650 (0.0034) [2024-03-29 12:45:11,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 108969984. Throughput: 0: 41991.6. Samples: 109221160. Policy #0 lag: (min: 1.0, avg: 17.5, max: 40.0) [2024-03-29 12:45:11,686][00126] Avg episode reward: [(0, '0.281')] [2024-03-29 12:45:14,067][00481] Signal inference workers to stop experience collection... (4050 times) [2024-03-29 12:45:14,103][00501] InferenceWorker_p0-w0: stopping experience collection (4050 times) [2024-03-29 12:45:14,292][00481] Signal inference workers to resume experience collection... (4050 times) [2024-03-29 12:45:14,293][00501] InferenceWorker_p0-w0: resuming experience collection (4050 times) [2024-03-29 12:45:15,268][00501] Updated weights for policy 0, policy_version 6660 (0.0023) [2024-03-29 12:45:16,685][00126] Fps is (10 sec: 45875.7, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 109182976. Throughput: 0: 41699.6. Samples: 109450060. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 12:45:16,686][00126] Avg episode reward: [(0, '0.278')] [2024-03-29 12:45:19,150][00501] Updated weights for policy 0, policy_version 6670 (0.0019) [2024-03-29 12:45:21,685][00126] Fps is (10 sec: 37683.4, 60 sec: 41233.2, 300 sec: 41820.9). Total num frames: 109346816. Throughput: 0: 41938.2. Samples: 109724660. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 12:45:21,686][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 12:45:23,348][00501] Updated weights for policy 0, policy_version 6680 (0.0025) [2024-03-29 12:45:26,685][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 109592576. Throughput: 0: 42120.4. Samples: 109849900. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 12:45:26,686][00126] Avg episode reward: [(0, '0.268')] [2024-03-29 12:45:26,721][00501] Updated weights for policy 0, policy_version 6690 (0.0034) [2024-03-29 12:45:30,888][00501] Updated weights for policy 0, policy_version 6700 (0.0029) [2024-03-29 12:45:31,685][00126] Fps is (10 sec: 45874.6, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 109805568. Throughput: 0: 41810.6. Samples: 110086060. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 12:45:31,686][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 12:45:34,864][00501] Updated weights for policy 0, policy_version 6710 (0.0023) [2024-03-29 12:45:36,686][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.0, 300 sec: 41876.4). Total num frames: 109985792. Throughput: 0: 41364.3. Samples: 110340240. Policy #0 lag: (min: 1.0, avg: 22.7, max: 42.0) [2024-03-29 12:45:36,686][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 12:45:39,032][00501] Updated weights for policy 0, policy_version 6720 (0.0023) [2024-03-29 12:45:41,685][00126] Fps is (10 sec: 40960.8, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 110215168. Throughput: 0: 41481.6. Samples: 110460660. Policy #0 lag: (min: 1.0, avg: 18.8, max: 41.0) [2024-03-29 12:45:41,686][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 12:45:42,523][00501] Updated weights for policy 0, policy_version 6730 (0.0018) [2024-03-29 12:45:42,584][00481] Signal inference workers to stop experience collection... 
(4100 times) [2024-03-29 12:45:42,616][00501] InferenceWorker_p0-w0: stopping experience collection (4100 times) [2024-03-29 12:45:42,778][00481] Signal inference workers to resume experience collection... (4100 times) [2024-03-29 12:45:42,779][00501] InferenceWorker_p0-w0: resuming experience collection (4100 times) [2024-03-29 12:45:46,596][00501] Updated weights for policy 0, policy_version 6740 (0.0021) [2024-03-29 12:45:46,685][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 110428160. Throughput: 0: 42033.7. Samples: 110714580. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 12:45:46,686][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 12:45:50,623][00501] Updated weights for policy 0, policy_version 6750 (0.0026) [2024-03-29 12:45:51,685][00126] Fps is (10 sec: 40959.1, 60 sec: 41779.1, 300 sec: 41987.4). Total num frames: 110624768. Throughput: 0: 41097.8. Samples: 110958260. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 12:45:51,687][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 12:45:54,870][00501] Updated weights for policy 0, policy_version 6760 (0.0020) [2024-03-29 12:45:56,685][00126] Fps is (10 sec: 37683.8, 60 sec: 40961.5, 300 sec: 41709.8). Total num frames: 110804992. Throughput: 0: 41648.1. Samples: 111095320. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 12:45:56,686][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 12:45:58,393][00501] Updated weights for policy 0, policy_version 6770 (0.0026) [2024-03-29 12:46:01,685][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 111034368. Throughput: 0: 41725.3. Samples: 111327700. Policy #0 lag: (min: 0.0, avg: 18.9, max: 42.0) [2024-03-29 12:46:01,686][00126] Avg episode reward: [(0, '0.243')] [2024-03-29 12:46:02,441][00501] Updated weights for policy 0, policy_version 6780 (0.0024) [2024-03-29 12:46:06,331][00501] Updated weights for policy 0, policy_version 6790 (0.0019) [2024-03-29 12:46:06,685][00126] Fps is (10 sec: 45874.9, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 111263744. Throughput: 0: 41012.0. Samples: 111570200. Policy #0 lag: (min: 0.0, avg: 23.0, max: 45.0) [2024-03-29 12:46:06,686][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 12:46:10,799][00501] Updated weights for policy 0, policy_version 6800 (0.0021) [2024-03-29 12:46:11,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 111443968. Throughput: 0: 41629.0. Samples: 111723200. Policy #0 lag: (min: 1.0, avg: 18.5, max: 42.0) [2024-03-29 12:46:11,686][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 12:46:14,353][00501] Updated weights for policy 0, policy_version 6810 (0.0022) [2024-03-29 12:46:16,144][00481] Signal inference workers to stop experience collection... (4150 times) [2024-03-29 12:46:16,199][00501] InferenceWorker_p0-w0: stopping experience collection (4150 times) [2024-03-29 12:46:16,242][00481] Signal inference workers to resume experience collection... (4150 times) [2024-03-29 12:46:16,245][00501] InferenceWorker_p0-w0: resuming experience collection (4150 times) [2024-03-29 12:46:16,685][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 111656960. Throughput: 0: 41122.8. Samples: 111936580. 
Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 12:46:16,686][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 12:46:18,299][00501] Updated weights for policy 0, policy_version 6820 (0.0027) [2024-03-29 12:46:21,685][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 111853568. Throughput: 0: 41329.4. Samples: 112200060. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 12:46:21,686][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 12:46:22,338][00501] Updated weights for policy 0, policy_version 6830 (0.0027) [2024-03-29 12:46:26,685][00126] Fps is (10 sec: 39321.2, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 112050176. Throughput: 0: 41514.1. Samples: 112328800. Policy #0 lag: (min: 2.0, avg: 21.5, max: 42.0) [2024-03-29 12:46:26,687][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 12:46:26,782][00501] Updated weights for policy 0, policy_version 6840 (0.0031) [2024-03-29 12:46:30,297][00501] Updated weights for policy 0, policy_version 6850 (0.0027) [2024-03-29 12:46:31,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 112279552. Throughput: 0: 40793.9. Samples: 112550300. Policy #0 lag: (min: 1.0, avg: 19.1, max: 41.0) [2024-03-29 12:46:31,686][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 12:46:34,303][00501] Updated weights for policy 0, policy_version 6860 (0.0028) [2024-03-29 12:46:36,685][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 112492544. Throughput: 0: 40864.5. Samples: 112797160. Policy #0 lag: (min: 2.0, avg: 23.3, max: 42.0) [2024-03-29 12:46:36,688][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 12:46:38,534][00501] Updated weights for policy 0, policy_version 6870 (0.0023) [2024-03-29 12:46:41,685][00126] Fps is (10 sec: 37683.3, 60 sec: 40686.9, 300 sec: 41598.7). Total num frames: 112656384. Throughput: 0: 40898.2. Samples: 112935740. Policy #0 lag: (min: 0.0, avg: 17.3, max: 40.0) [2024-03-29 12:46:41,686][00126] Avg episode reward: [(0, '0.273')] [2024-03-29 12:46:42,699][00501] Updated weights for policy 0, policy_version 6880 (0.0026) [2024-03-29 12:46:46,183][00501] Updated weights for policy 0, policy_version 6890 (0.0019) [2024-03-29 12:46:46,685][00126] Fps is (10 sec: 40959.7, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 112902144. Throughput: 0: 41324.4. Samples: 113187300. Policy #0 lag: (min: 0.0, avg: 17.3, max: 40.0) [2024-03-29 12:46:46,686][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 12:46:49,274][00481] Signal inference workers to stop experience collection... (4200 times) [2024-03-29 12:46:49,332][00501] InferenceWorker_p0-w0: stopping experience collection (4200 times) [2024-03-29 12:46:49,364][00481] Signal inference workers to resume experience collection... (4200 times) [2024-03-29 12:46:49,368][00501] InferenceWorker_p0-w0: resuming experience collection (4200 times) [2024-03-29 12:46:50,182][00501] Updated weights for policy 0, policy_version 6900 (0.0025) [2024-03-29 12:46:51,685][00126] Fps is (10 sec: 45874.9, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 113115136. Throughput: 0: 41140.0. Samples: 113421500. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 12:46:51,686][00126] Avg episode reward: [(0, '0.302')] [2024-03-29 12:46:52,178][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000006906_113147904.pth... 
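[annotation] The Saving/Removing pair around this point shows the checkpoint rotation: each save is named checkpoint_<policy_version>_<env_frames>.pth (e.g. checkpoint_000006906_113147904.pth) and an older file is pruned right afterwards. A hedged sketch of that naming-plus-pruning scheme; the keep_last count and the exact contents serialized are assumptions, only the filename pattern is taken from the log.

```python
from pathlib import Path

import torch


def save_and_prune(checkpoint_dir: Path, model_state: dict,
                   policy_version: int, env_frames: int, keep_last: int = 2) -> Path:
    """Write checkpoint_<version>_<frames>.pth and delete all but the newest keep_last files."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    path = checkpoint_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    torch.save(model_state, path)

    # Zero-padded version prefix makes lexicographic order equal chronological order.
    existing = sorted(checkpoint_dir.glob("checkpoint_*.pth"))
    for old in existing[:-keep_last]:
        old.unlink()
    return path


# Would reproduce the names seen in the log, e.g. checkpoint_000006906_113147904.pth:
# save_and_prune(Path("train_dir/b.a20.20x20_40x40.norm/checkpoint_p0"),
#                model.state_dict(), policy_version=6906, env_frames=113147904)
```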
[2024-03-29 12:46:52,518][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000006295_103137280.pth [2024-03-29 12:46:54,469][00501] Updated weights for policy 0, policy_version 6910 (0.0027) [2024-03-29 12:46:56,685][00126] Fps is (10 sec: 36044.8, 60 sec: 40959.9, 300 sec: 41543.2). Total num frames: 113262592. Throughput: 0: 40658.2. Samples: 113552820. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 12:46:56,686][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 12:46:58,591][00501] Updated weights for policy 0, policy_version 6920 (0.0028) [2024-03-29 12:47:01,685][00126] Fps is (10 sec: 37683.5, 60 sec: 40960.1, 300 sec: 41487.6). Total num frames: 113491968. Throughput: 0: 41593.8. Samples: 113808300. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 12:47:01,686][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 12:47:02,218][00501] Updated weights for policy 0, policy_version 6930 (0.0030) [2024-03-29 12:47:06,280][00501] Updated weights for policy 0, policy_version 6940 (0.0028) [2024-03-29 12:47:06,685][00126] Fps is (10 sec: 45875.1, 60 sec: 40959.9, 300 sec: 41598.7). Total num frames: 113721344. Throughput: 0: 40714.6. Samples: 114032220. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 12:47:06,686][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 12:47:10,214][00501] Updated weights for policy 0, policy_version 6950 (0.0021) [2024-03-29 12:47:11,686][00126] Fps is (10 sec: 40959.3, 60 sec: 40959.9, 300 sec: 41543.1). Total num frames: 113901568. Throughput: 0: 40908.4. Samples: 114169680. Policy #0 lag: (min: 0.0, avg: 22.6, max: 42.0) [2024-03-29 12:47:11,688][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 12:47:14,407][00501] Updated weights for policy 0, policy_version 6960 (0.0032) [2024-03-29 12:47:16,685][00126] Fps is (10 sec: 39321.8, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 114114560. Throughput: 0: 41639.1. Samples: 114424060. Policy #0 lag: (min: 2.0, avg: 18.9, max: 42.0) [2024-03-29 12:47:16,686][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 12:47:17,111][00481] Signal inference workers to stop experience collection... (4250 times) [2024-03-29 12:47:17,184][00501] InferenceWorker_p0-w0: stopping experience collection (4250 times) [2024-03-29 12:47:17,190][00481] Signal inference workers to resume experience collection... (4250 times) [2024-03-29 12:47:17,211][00501] InferenceWorker_p0-w0: resuming experience collection (4250 times) [2024-03-29 12:47:18,195][00501] Updated weights for policy 0, policy_version 6970 (0.0023) [2024-03-29 12:47:21,685][00126] Fps is (10 sec: 42598.8, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 114327552. Throughput: 0: 41352.4. Samples: 114658020. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 12:47:21,686][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 12:47:22,163][00501] Updated weights for policy 0, policy_version 6980 (0.0020) [2024-03-29 12:47:26,210][00501] Updated weights for policy 0, policy_version 6990 (0.0019) [2024-03-29 12:47:26,685][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 114524160. Throughput: 0: 41006.1. Samples: 114781020. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 12:47:26,686][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 12:47:30,414][00501] Updated weights for policy 0, policy_version 7000 (0.0024) [2024-03-29 12:47:31,686][00126] Fps is (10 sec: 40959.5, 60 sec: 40959.9, 300 sec: 41487.6). 
Total num frames: 114737152. Throughput: 0: 40939.0. Samples: 115029560. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 12:47:31,686][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 12:47:34,065][00501] Updated weights for policy 0, policy_version 7010 (0.0024) [2024-03-29 12:47:36,685][00126] Fps is (10 sec: 40960.0, 60 sec: 40686.9, 300 sec: 41432.1). Total num frames: 114933760. Throughput: 0: 41256.8. Samples: 115278060. Policy #0 lag: (min: 1.0, avg: 20.0, max: 42.0) [2024-03-29 12:47:36,686][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 12:47:38,072][00501] Updated weights for policy 0, policy_version 7020 (0.0022) [2024-03-29 12:47:41,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.0, 300 sec: 41543.1). Total num frames: 115146752. Throughput: 0: 40964.0. Samples: 115396200. Policy #0 lag: (min: 0.0, avg: 22.7, max: 43.0) [2024-03-29 12:47:41,686][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 12:47:42,075][00501] Updated weights for policy 0, policy_version 7030 (0.0033) [2024-03-29 12:47:46,130][00481] Signal inference workers to stop experience collection... (4300 times) [2024-03-29 12:47:46,168][00501] InferenceWorker_p0-w0: stopping experience collection (4300 times) [2024-03-29 12:47:46,355][00481] Signal inference workers to resume experience collection... (4300 times) [2024-03-29 12:47:46,356][00501] InferenceWorker_p0-w0: resuming experience collection (4300 times) [2024-03-29 12:47:46,359][00501] Updated weights for policy 0, policy_version 7040 (0.0027) [2024-03-29 12:47:46,685][00126] Fps is (10 sec: 42598.4, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 115359744. Throughput: 0: 41327.0. Samples: 115668020. Policy #0 lag: (min: 2.0, avg: 19.1, max: 43.0) [2024-03-29 12:47:46,686][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 12:47:49,886][00501] Updated weights for policy 0, policy_version 7050 (0.0034) [2024-03-29 12:47:51,685][00126] Fps is (10 sec: 42598.7, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 115572736. Throughput: 0: 41250.7. Samples: 115888500. Policy #0 lag: (min: 2.0, avg: 19.1, max: 43.0) [2024-03-29 12:47:51,686][00126] Avg episode reward: [(0, '0.370')] [2024-03-29 12:47:54,025][00501] Updated weights for policy 0, policy_version 7060 (0.0024) [2024-03-29 12:47:56,685][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 115769344. Throughput: 0: 40888.6. Samples: 116009660. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 12:47:56,686][00126] Avg episode reward: [(0, '0.293')] [2024-03-29 12:47:58,154][00501] Updated weights for policy 0, policy_version 7070 (0.0033) [2024-03-29 12:48:01,685][00126] Fps is (10 sec: 37683.4, 60 sec: 40960.0, 300 sec: 41321.0). Total num frames: 115949568. Throughput: 0: 41446.3. Samples: 116289140. Policy #0 lag: (min: 1.0, avg: 20.0, max: 43.0) [2024-03-29 12:48:01,686][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 12:48:02,298][00501] Updated weights for policy 0, policy_version 7080 (0.0033) [2024-03-29 12:48:05,821][00501] Updated weights for policy 0, policy_version 7090 (0.0026) [2024-03-29 12:48:06,685][00126] Fps is (10 sec: 44236.7, 60 sec: 41506.2, 300 sec: 41543.2). Total num frames: 116211712. Throughput: 0: 41049.8. Samples: 116505260. 
Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 12:48:06,686][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 12:48:09,624][00501] Updated weights for policy 0, policy_version 7100 (0.0027) [2024-03-29 12:48:11,686][00126] Fps is (10 sec: 45874.4, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 116408320. Throughput: 0: 41156.8. Samples: 116633080. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 12:48:11,686][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 12:48:13,899][00501] Updated weights for policy 0, policy_version 7110 (0.0029) [2024-03-29 12:48:16,685][00126] Fps is (10 sec: 34406.5, 60 sec: 40687.0, 300 sec: 41265.5). Total num frames: 116555776. Throughput: 0: 41631.7. Samples: 116902980. Policy #0 lag: (min: 0.0, avg: 22.2, max: 43.0) [2024-03-29 12:48:16,686][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 12:48:18,120][00501] Updated weights for policy 0, policy_version 7120 (0.0031) [2024-03-29 12:48:20,831][00481] Signal inference workers to stop experience collection... (4350 times) [2024-03-29 12:48:20,865][00501] InferenceWorker_p0-w0: stopping experience collection (4350 times) [2024-03-29 12:48:21,016][00481] Signal inference workers to resume experience collection... (4350 times) [2024-03-29 12:48:21,016][00501] InferenceWorker_p0-w0: resuming experience collection (4350 times) [2024-03-29 12:48:21,598][00501] Updated weights for policy 0, policy_version 7130 (0.0025) [2024-03-29 12:48:21,685][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.1, 300 sec: 41432.1). Total num frames: 116817920. Throughput: 0: 41298.2. Samples: 117136480. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 12:48:21,686][00126] Avg episode reward: [(0, '0.286')] [2024-03-29 12:48:25,449][00501] Updated weights for policy 0, policy_version 7140 (0.0019) [2024-03-29 12:48:26,685][00126] Fps is (10 sec: 47513.6, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 117030912. Throughput: 0: 41693.9. Samples: 117272420. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 12:48:26,686][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 12:48:29,561][00501] Updated weights for policy 0, policy_version 7150 (0.0027) [2024-03-29 12:48:31,686][00126] Fps is (10 sec: 39321.3, 60 sec: 41233.1, 300 sec: 41321.0). Total num frames: 117211136. Throughput: 0: 41568.0. Samples: 117538580. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 12:48:31,687][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 12:48:33,560][00501] Updated weights for policy 0, policy_version 7160 (0.0028) [2024-03-29 12:48:36,685][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.3, 300 sec: 41321.0). Total num frames: 117440512. Throughput: 0: 41921.4. Samples: 117774960. Policy #0 lag: (min: 0.0, avg: 17.8, max: 41.0) [2024-03-29 12:48:36,686][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 12:48:37,026][00501] Updated weights for policy 0, policy_version 7170 (0.0024) [2024-03-29 12:48:40,966][00501] Updated weights for policy 0, policy_version 7180 (0.0022) [2024-03-29 12:48:41,685][00126] Fps is (10 sec: 45876.0, 60 sec: 42052.4, 300 sec: 41598.7). Total num frames: 117669888. Throughput: 0: 42159.6. Samples: 117906840. Policy #0 lag: (min: 0.0, avg: 22.5, max: 41.0) [2024-03-29 12:48:41,686][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 12:48:45,347][00501] Updated weights for policy 0, policy_version 7190 (0.0030) [2024-03-29 12:48:46,685][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41376.6). Total num frames: 117850112. 
Throughput: 0: 41612.9. Samples: 118161720. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 12:48:46,686][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 12:48:49,383][00501] Updated weights for policy 0, policy_version 7200 (0.0030) [2024-03-29 12:48:51,478][00481] Signal inference workers to stop experience collection... (4400 times) [2024-03-29 12:48:51,479][00481] Signal inference workers to resume experience collection... (4400 times) [2024-03-29 12:48:51,516][00501] InferenceWorker_p0-w0: stopping experience collection (4400 times) [2024-03-29 12:48:51,516][00501] InferenceWorker_p0-w0: resuming experience collection (4400 times) [2024-03-29 12:48:51,685][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41321.0). Total num frames: 118063104. Throughput: 0: 42212.0. Samples: 118404800. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 12:48:51,686][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 12:48:51,781][00481] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007207_118079488.pth... [2024-03-29 12:48:52,129][00481] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000006600_108134400.pth [2024-03-29 12:48:52,902][00501] Updated weights for policy 0, policy_version 7210 (0.0024) [2024-03-29 12:48:56,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41432.1). Total num frames: 118276096. Throughput: 0: 41720.2. Samples: 118510480. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 12:48:56,688][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 12:48:56,876][00501] Updated weights for policy 0, policy_version 7220 (0.0029) [2024-03-29 12:49:01,274][00501] Updated weights for policy 0, policy_version 7230 (0.0021) [2024-03-29 12:49:01,685][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41376.5). Total num frames: 118456320. Throughput: 0: 41520.9. Samples: 118771420. Policy #0 lag: (min: 0.0, avg: 22.6, max: 42.0) [2024-03-29 12:49:01,686][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 12:49:05,370][00501] Updated weights for policy 0, policy_version 7240 (0.0022) [2024-03-29 12:49:06,685][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41376.6). Total num frames: 118685696. Throughput: 0: 41855.6. Samples: 119019980. Policy #0 lag: (min: 1.0, avg: 19.9, max: 44.0) [2024-03-29 12:49:06,686][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 12:49:08,627][00501] Updated weights for policy 0, policy_version 7250 (0.0019) [2024-03-29 12:49:11,685][00126] Fps is (10 sec: 42598.2, 60 sec: 41233.1, 300 sec: 41321.0). Total num frames: 118882304. Throughput: 0: 41449.7. Samples: 119137660. Policy #0 lag: (min: 1.0, avg: 19.9, max: 44.0) [2024-03-29 12:49:11,686][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 12:49:12,624][00501] Updated weights for policy 0, policy_version 7260 (0.0019) [2024-03-29 12:49:16,685][00126] Fps is (10 sec: 40959.5, 60 sec: 42325.2, 300 sec: 41432.1). Total num frames: 119095296. Throughput: 0: 41324.5. Samples: 119398180. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 12:49:16,686][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 12:49:16,743][00501] Updated weights for policy 0, policy_version 7270 (0.0023) [2024-03-29 12:49:20,771][00501] Updated weights for policy 0, policy_version 7280 (0.0023) [2024-03-29 12:49:21,685][00126] Fps is (10 sec: 44237.0, 60 sec: 41779.2, 300 sec: 41487.6). Total num frames: 119324672. Throughput: 0: 41782.2. Samples: 119655160. 
Policy #0 lag: (min: 0.0, avg: 19.9, max: 43.0) [2024-03-29 12:49:21,686][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 12:49:23,985][00501] Updated weights for policy 0, policy_version 7290 (0.0021) [2024-03-29 12:49:26,685][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41487.6). Total num frames: 119521280. Throughput: 0: 41543.0. Samples: 119776280. Policy #0 lag: (min: 1.0, avg: 23.2, max: 42.0) [2024-03-29 12:49:26,688][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 12:49:26,849][00481] Signal inference workers to stop experience collection... (4450 times) [2024-03-29 12:49:26,874][00501] InferenceWorker_p0-w0: stopping experience collection (4450 times) [2024-03-29 12:49:27,050][00481] Signal inference workers to resume experience collection... (4450 times) [2024-03-29 12:49:27,051][00501] InferenceWorker_p0-w0: resuming experience collection (4450 times) [2024-03-29 12:49:28,062][00501] Updated weights for policy 0, policy_version 7300 (0.0019) [2024-03-29 12:57:08,281][00126] Saving configuration to /workspace/metta/train_dir/b.a20.20x20_40x40.norm/config.json... [2024-03-29 12:57:08,418][00126] Rollout worker 0 uses device cpu [2024-03-29 12:57:08,419][00126] Rollout worker 1 uses device cpu [2024-03-29 12:57:08,419][00126] Rollout worker 2 uses device cpu [2024-03-29 12:57:08,419][00126] Rollout worker 3 uses device cpu [2024-03-29 12:57:08,419][00126] Rollout worker 4 uses device cpu [2024-03-29 12:57:08,419][00126] Rollout worker 5 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 6 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 7 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 8 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 9 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 10 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 11 uses device cpu [2024-03-29 12:57:08,420][00126] Rollout worker 12 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 13 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 14 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 15 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 16 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 17 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 18 uses device cpu [2024-03-29 12:57:08,421][00126] Rollout worker 19 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 20 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 21 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 22 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 23 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 24 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 25 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 26 uses device cpu [2024-03-29 12:57:08,422][00126] Rollout worker 27 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 28 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 29 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 30 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 31 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 32 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 33 uses device cpu [2024-03-29 12:57:08,423][00126] Rollout worker 34 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 35 uses device cpu [2024-03-29 12:57:08,424][00126] 
Rollout worker 36 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 37 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 38 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 39 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 40 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 41 uses device cpu [2024-03-29 12:57:08,424][00126] Rollout worker 42 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 43 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 44 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 45 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 46 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 47 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 48 uses device cpu [2024-03-29 12:57:08,425][00126] Rollout worker 49 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 50 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 51 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 52 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 53 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 54 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 55 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 56 uses device cpu [2024-03-29 12:57:08,426][00126] Rollout worker 57 uses device cpu [2024-03-29 12:57:08,427][00126] Rollout worker 58 uses device cpu [2024-03-29 12:57:08,427][00126] Rollout worker 59 uses device cpu [2024-03-29 12:57:08,427][00126] Rollout worker 60 uses device cpu [2024-03-29 12:57:08,427][00126] Rollout worker 61 uses device cpu [2024-03-29 12:57:08,427][00126] Rollout worker 62 uses device cpu [2024-03-29 12:57:08,427][00126] Rollout worker 63 uses device cpu [2024-03-29 12:57:10,147][00126] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:57:10,148][00126] InferenceWorker_p0-w0: min num requests: 21 [2024-03-29 12:57:10,252][00126] Starting all processes... [2024-03-29 12:57:10,252][00126] Starting process learner_proc0 [2024-03-29 12:57:10,457][00126] Starting all processes... 
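[annotation] As the startup block above shows, the run uses one learner process and one inference worker on the GPU plus 64 rollout workers on CPU, each later reporting "Worker N uses CPU cores [N]". The sketch below illustrates that layout with multiprocessing and Linux CPU pinning; the worker bodies, queue wiring, and entry-point names are simplified placeholders, not the framework's actual code.

```python
import multiprocessing as mp
import os


def rollout_worker(worker_idx: int) -> None:
    # Pin this worker to a single core, matching "Worker N uses CPU cores [N]" (Linux-only call).
    os.sched_setaffinity(0, {worker_idx % (os.cpu_count() or 1)})
    # ... step vectorized envs and send observations to the inference worker ...


def inference_worker() -> None:
    # Batches observations from all rollout workers and runs the policy on the GPU.
    ...


def learner() -> None:
    # Consumes completed trajectories and updates policy weights on the GPU.
    ...


if __name__ == "__main__":
    procs = [mp.Process(target=learner, name="learner_proc0"),
             mp.Process(target=inference_worker, name="inference_proc0-0")]
    procs += [mp.Process(target=rollout_worker, args=(i,), name=f"rollout_proc{i}")
              for i in range(64)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```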
[2024-03-29 12:57:10,462][00126] Starting process inference_proc0-0 [2024-03-29 12:57:10,463][00126] Starting process rollout_proc1 [2024-03-29 12:57:10,463][00126] Starting process rollout_proc3 [2024-03-29 12:57:10,463][00126] Starting process rollout_proc5 [2024-03-29 12:57:10,463][00126] Starting process rollout_proc7 [2024-03-29 12:57:10,464][00126] Starting process rollout_proc9 [2024-03-29 12:57:10,467][00126] Starting process rollout_proc11 [2024-03-29 12:57:10,477][00126] Starting process rollout_proc13 [2024-03-29 12:57:10,477][00126] Starting process rollout_proc0 [2024-03-29 12:57:10,477][00126] Starting process rollout_proc2 [2024-03-29 12:57:10,479][00126] Starting process rollout_proc15 [2024-03-29 12:57:10,479][00126] Starting process rollout_proc17 [2024-03-29 12:57:10,487][00126] Starting process rollout_proc19 [2024-03-29 12:57:10,487][00126] Starting process rollout_proc21 [2024-03-29 12:57:10,489][00126] Starting process rollout_proc23 [2024-03-29 12:57:10,490][00126] Starting process rollout_proc25 [2024-03-29 12:57:10,490][00126] Starting process rollout_proc4 [2024-03-29 12:57:10,490][00126] Starting process rollout_proc6 [2024-03-29 12:57:10,490][00126] Starting process rollout_proc27 [2024-03-29 12:57:10,490][00126] Starting process rollout_proc29 [2024-03-29 12:57:10,504][00126] Starting process rollout_proc8 [2024-03-29 12:57:10,514][00126] Starting process rollout_proc10 [2024-03-29 12:57:10,546][00126] Starting process rollout_proc12 [2024-03-29 12:57:10,560][00126] Starting process rollout_proc14 [2024-03-29 12:57:10,675][00126] Starting process rollout_proc31 [2024-03-29 12:57:10,675][00126] Starting process rollout_proc18 [2024-03-29 12:57:10,681][00126] Starting process rollout_proc16 [2024-03-29 12:57:10,681][00126] Starting process rollout_proc33 [2024-03-29 12:57:10,738][00126] Starting process rollout_proc22 [2024-03-29 12:57:10,738][00126] Starting process rollout_proc20 [2024-03-29 12:57:10,738][00126] Starting process rollout_proc26 [2024-03-29 12:57:10,738][00126] Starting process rollout_proc24 [2024-03-29 12:57:10,738][00126] Starting process rollout_proc35 [2024-03-29 12:57:10,770][00126] Starting process rollout_proc30 [2024-03-29 12:57:10,790][00126] Starting process rollout_proc28 [2024-03-29 12:57:10,802][00126] Starting process rollout_proc37 [2024-03-29 12:57:10,879][00126] Starting process rollout_proc39 [2024-03-29 12:57:10,879][00126] Starting process rollout_proc41 [2024-03-29 12:57:10,879][00126] Starting process rollout_proc43 [2024-03-29 12:57:10,902][00126] Starting process rollout_proc45 [2024-03-29 12:57:10,902][00126] Starting process rollout_proc32 [2024-03-29 12:57:10,902][00126] Starting process rollout_proc47 [2024-03-29 12:57:10,926][00126] Starting process rollout_proc49 [2024-03-29 12:57:10,948][00126] Starting process rollout_proc34 [2024-03-29 12:57:10,967][00126] Starting process rollout_proc51 [2024-03-29 12:57:10,988][00126] Starting process rollout_proc53 [2024-03-29 12:57:11,013][00126] Starting process rollout_proc55 [2024-03-29 12:57:11,030][00126] Starting process rollout_proc57 [2024-03-29 12:57:11,053][00126] Starting process rollout_proc36 [2024-03-29 12:57:11,090][00126] Starting process rollout_proc59 [2024-03-29 12:57:11,112][00126] Starting process rollout_proc61 [2024-03-29 12:57:11,135][00126] Starting process rollout_proc38 [2024-03-29 12:57:11,166][00126] Starting process rollout_proc40 [2024-03-29 12:57:11,193][00126] Starting process rollout_proc42 [2024-03-29 12:57:11,318][00126] Starting process 
rollout_proc44 [2024-03-29 12:57:11,344][00126] Starting process rollout_proc62 [2024-03-29 12:57:11,403][00126] Starting process rollout_proc63 [2024-03-29 12:57:11,480][00126] Starting process rollout_proc50 [2024-03-29 12:57:11,508][00126] Starting process rollout_proc46 [2024-03-29 12:57:11,521][00126] Starting process rollout_proc60 [2024-03-29 12:57:11,558][00126] Starting process rollout_proc56 [2024-03-29 12:57:11,614][00126] Starting process rollout_proc52 [2024-03-29 12:57:11,615][00126] Starting process rollout_proc54 [2024-03-29 12:57:11,615][00126] Starting process rollout_proc58 [2024-03-29 12:57:11,615][00126] Starting process rollout_proc48 [2024-03-29 12:57:15,509][00500] Worker 7 uses CPU cores [7] [2024-03-29 12:57:15,553][00476] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:57:15,553][00476] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-03-29 12:57:15,575][00476] Num visible devices: 1 [2024-03-29 12:57:15,593][00499] Worker 5 uses CPU cores [5] [2024-03-29 12:57:15,638][00476] Starting seed is not provided [2024-03-29 12:57:15,638][00476] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:57:15,638][00476] Initializing actor-critic model on device cuda:0 [2024-03-29 12:57:15,639][00476] RunningMeanStd input shape: (20,) [2024-03-29 12:57:15,639][00476] RunningMeanStd input shape: (23, 11, 11) [2024-03-29 12:57:15,640][00476] RunningMeanStd input shape: (1, 11, 11) [2024-03-29 12:57:15,640][00476] RunningMeanStd input shape: (2,) [2024-03-29 12:57:15,640][00476] RunningMeanStd input shape: (1,) [2024-03-29 12:57:15,641][00476] RunningMeanStd input shape: (1,) [2024-03-29 12:57:15,648][00675] Worker 11 uses CPU cores [11] [2024-03-29 12:57:15,708][01076] Worker 25 uses CPU cores [25] [2024-03-29 12:57:15,709][01350] Worker 12 uses CPU cores [12] [2024-03-29 12:57:15,728][00756] Worker 2 uses CPU cores [2] [2024-03-29 12:57:15,729][00883] Worker 15 uses CPU cores [15] [2024-03-29 12:57:15,729][00496] Worker 1 uses CPU cores [1] [2024-03-29 12:57:15,741][00498] Worker 3 uses CPU cores [3] [2024-03-29 12:57:15,741][01656] Worker 16 uses CPU cores [16] [2024-03-29 12:57:15,741][00565] Worker 13 uses CPU cores [13] [2024-03-29 12:57:15,773][01141] Worker 23 uses CPU cores [23] [2024-03-29 12:57:15,773][01785] Worker 20 uses CPU cores [20] [2024-03-29 12:57:15,773][00564] Worker 9 uses CPU cores [9] [2024-03-29 12:57:15,785][00947] Worker 17 uses CPU cores [17] [2024-03-29 12:57:15,797][01786] Worker 26 uses CPU cores [26] [2024-03-29 12:57:15,816][01431] Worker 14 uses CPU cores [14] [2024-03-29 12:57:15,816][01142] Worker 19 uses CPU cores [19] [2024-03-29 12:57:15,817][00948] Worker 0 uses CPU cores [0] [2024-03-29 12:57:15,825][00949] Worker 21 uses CPU cores [21] [2024-03-29 12:57:15,833][01720] Worker 33 uses CPU cores [33] [2024-03-29 12:57:15,839][01844] Worker 24 uses CPU cores [24] [2024-03-29 12:57:15,841][01182] Worker 29 uses CPU cores [29] [2024-03-29 12:57:15,863][00497] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:57:15,863][00497] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-03-29 12:57:15,877][02105] Worker 28 uses CPU cores [28] [2024-03-29 12:57:15,893][01328] Worker 8 uses CPU cores [8] [2024-03-29 12:57:15,893][00497] Num visible devices: 1 [2024-03-29 12:57:15,897][01207] Worker 27 uses CPU cores [27] [2024-03-29 12:57:15,913][01915] Worker 30 uses CPU cores [30] [2024-03-29 
12:57:15,914][01902] Worker 35 uses CPU cores [35] [2024-03-29 12:57:15,929][02314] Worker 41 uses CPU cores [41] [2024-03-29 12:57:15,958][02169] Worker 37 uses CPU cores [37] [2024-03-29 12:57:15,973][03133] Worker 53 uses CPU cores [53] [2024-03-29 12:57:15,977][01465] Worker 31 uses CPU cores [31] [2024-03-29 12:57:15,992][01077] Worker 4 uses CPU cores [4] [2024-03-29 12:57:15,993][01503] Worker 18 uses CPU cores [18] [2024-03-29 12:57:15,993][03197] Worker 32 uses CPU cores [32] [2024-03-29 12:57:16,001][01721] Worker 22 uses CPU cores [22] [2024-03-29 12:57:16,001][03064] Worker 63 uses CPU cores [63] [2024-03-29 12:57:16,009][02678] Worker 61 uses CPU cores [61] [2024-03-29 12:57:16,017][01336] Worker 10 uses CPU cores [10] [2024-03-29 12:57:16,020][03066] Worker 47 uses CPU cores [47] [2024-03-29 12:57:16,036][03643] Worker 60 uses CPU cores [60] [2024-03-29 12:57:16,037][03068] Worker 51 uses CPU cores [51] [2024-03-29 12:57:16,037][02679] Worker 40 uses CPU cores [40] [2024-03-29 12:57:16,037][02614] Worker 59 uses CPU cores [59] [2024-03-29 12:57:16,037][02747] Worker 62 uses CPU cores [62] [2024-03-29 12:57:16,059][03132] Worker 55 uses CPU cores [55] [2024-03-29 12:57:16,059][03898] Worker 58 uses CPU cores [58] [2024-03-29 12:57:16,059][02724] Worker 36 uses CPU cores [36] [2024-03-29 12:57:16,075][03065] Worker 34 uses CPU cores [34] [2024-03-29 12:57:16,089][03770] Worker 52 uses CPU cores [52] [2024-03-29 12:57:16,100][03962] Worker 48 uses CPU cores [48] [2024-03-29 12:57:16,101][03388] Worker 57 uses CPU cores [57] [2024-03-29 12:57:16,101][03063] Worker 49 uses CPU cores [49] [2024-03-29 12:57:16,126][02288] Worker 39 uses CPU cores [39] [2024-03-29 12:57:16,129][02680] Worker 44 uses CPU cores [44] [2024-03-29 12:57:16,205][02681] Worker 38 uses CPU cores [38] [2024-03-29 12:57:16,242][03452] Worker 46 uses CPU cores [46] [2024-03-29 12:57:16,243][03324] Worker 50 uses CPU cores [50] [2024-03-29 12:57:16,245][02550] Worker 43 uses CPU cores [43] [2024-03-29 12:57:16,289][03067] Worker 45 uses CPU cores [45] [2024-03-29 12:57:16,301][01271] Worker 6 uses CPU cores [6] [2024-03-29 12:57:16,309][02682] Worker 42 uses CPU cores [42] [2024-03-29 12:57:16,344][03769] Worker 56 uses CPU cores [56] [2024-03-29 12:57:16,353][03897] Worker 54 uses CPU cores [54] [2024-03-29 12:57:16,419][00476] Created Actor Critic model with architecture: [2024-03-29 12:57:16,420][00476] PredictingActorCritic( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (global_vars): RunningMeanStdInPlace() (griddly_obs): RunningMeanStdInPlace() (kinship): RunningMeanStdInPlace() (last_action): RunningMeanStdInPlace() (last_reward): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): ObjectEmeddingAgentEncoder( (object_embedding): Sequential( (0): Linear(in_features=52, out_features=64, bias=True) (1): ELU(alpha=1.0) (2): Sequential( (0): Linear(in_features=64, out_features=64, bias=True) (1): ELU(alpha=1.0) ) (3): Sequential( (0): Linear(in_features=64, out_features=64, bias=True) (1): ELU(alpha=1.0) ) (4): Sequential( (0): Linear(in_features=64, out_features=64, bias=True) (1): ELU(alpha=1.0) ) ) (encoder_head): Sequential( (0): Linear(in_features=7767, out_features=512, bias=True) (1): ELU(alpha=1.0) (2): Sequential( (0): Linear(in_features=512, out_features=512, bias=True) (1): ELU(alpha=1.0) ) (3): Sequential( (0): Linear(in_features=512, out_features=512, 
bias=True) (1): ELU(alpha=1.0) ) (4): Sequential( (0): Linear(in_features=512, out_features=512, bias=True) (1): ELU(alpha=1.0) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): ObjectEmeddingAgentDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=17, bias=True) ) ) [2024-03-29 12:57:17,232][00476] Using optimizer [2024-03-29 12:57:17,741][00476] Loading state from checkpoint /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007207_118079488.pth... [2024-03-29 12:57:18,040][00476] Loading model from checkpoint [2024-03-29 12:57:18,042][00476] Loaded experiment state at self.train_step=7207, self.env_steps=118079488 [2024-03-29 12:57:18,042][00476] Initialized policy 0 weights for model version 7207 [2024-03-29 12:57:18,044][00476] LearnerWorker_p0 finished initialization! [2024-03-29 12:57:18,044][00476] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-03-29 12:57:18,227][00497] RunningMeanStd input shape: (20,) [2024-03-29 12:57:18,227][00497] RunningMeanStd input shape: (23, 11, 11) [2024-03-29 12:57:18,227][00497] RunningMeanStd input shape: (1, 11, 11) [2024-03-29 12:57:18,228][00497] RunningMeanStd input shape: (2,) [2024-03-29 12:57:18,228][00497] RunningMeanStd input shape: (1,) [2024-03-29 12:57:18,228][00497] RunningMeanStd input shape: (1,) [2024-03-29 12:57:18,839][00126] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 118079488. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:57:18,871][00126] Inference worker 0-0 is ready! [2024-03-29 12:57:18,872][00126] All inference workers are ready! Signal rollout workers to start! [2024-03-29 12:57:19,816][01141] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,818][03770] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,821][00947] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,831][03197] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,832][01328] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,836][00756] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,841][01465] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,854][00883] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,856][02681] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,856][01431] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,856][01350] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,857][01844] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,857][03068] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,855][01207] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,863][01785] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,864][02614] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,865][03962] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,866][01721] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,866][03066] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,867][02105] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,868][02169] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,874][01077] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,876][00496] Decorrelating experience for 0 frames... 
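[annotation] The module tree printed above pins down the layer sizes: per-object features of width 52 are embedded by a 64-wide ELU MLP, the assembled encoder input of width 7767 is compressed to 512, a GRU(512, 512) core follows, and separate linear heads produce 1 value and 17 action logits. The PyTorch sketch below mirrors those shapes; how the observation dict is flattened into the 52- and 7767-wide inputs is not visible in the log, so the forward pass takes the encoder input directly and is only illustrative.

```python
import torch
from torch import nn


def elu_mlp(sizes):
    """Stack of Linear+ELU blocks, as in the printed object_embedding / encoder_head."""
    layers = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(d_in, d_out), nn.ELU()]
    return nn.Sequential(*layers)


class ActorCriticSketch(nn.Module):
    def __init__(self, obj_features=52, embed_dim=64, encoder_in=7767,
                 hidden=512, num_actions=17):
        super().__init__()
        # Per-object embedding; its outputs are assumed to be concatenated into encoder_in.
        self.object_embedding = elu_mlp([obj_features] + [embed_dim] * 4)
        self.encoder_head = elu_mlp([encoder_in] + [hidden] * 4)
        self.core = nn.GRU(hidden, hidden)                    # ModelCoreRNN in the log
        self.critic_linear = nn.Linear(hidden, 1)             # value head
        self.action_logits = nn.Linear(hidden, num_actions)   # 17-way action parameterization

    def forward(self, encoder_input, rnn_state=None):
        # encoder_input: (seq_len, batch, 7767), already assembled from the observation dict.
        x = self.encoder_head(encoder_input)
        x, rnn_state = self.core(x, rnn_state)
        return self.critic_linear(x), self.action_logits(x), rnn_state


# Shape check against the log: 1 value and 17 logits per timestep.
model = ActorCriticSketch()
value, logits, _ = model(torch.zeros(1, 4, 7767))
print(value.shape, logits.shape)  # torch.Size([1, 4, 1]) torch.Size([1, 4, 17])
```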
[2024-03-29 12:57:19,876][00499] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,878][02288] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,883][02724] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,886][02314] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,889][02550] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,890][01182] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,891][03897] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,900][01271] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,901][01720] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,905][03133] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,911][00949] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,912][03452] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,912][00675] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,914][01142] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,915][00500] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,917][02682] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,921][01336] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,922][01915] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,927][00498] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,927][03388] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,934][01076] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,934][03064] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,940][03065] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,942][03067] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,943][03132] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,945][00948] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,948][03769] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,952][02747] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,953][03643] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,957][00565] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,958][03324] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,965][01656] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,966][01902] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,966][02678] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,968][02680] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,973][01786] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,975][03063] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,980][03898] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,981][00564] Decorrelating experience for 0 frames... [2024-03-29 12:57:19,994][02679] Decorrelating experience for 0 frames... [2024-03-29 12:57:20,000][01503] Decorrelating experience for 0 frames... [2024-03-29 12:57:20,750][01141] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,756][03770] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,758][01328] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,764][01844] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,771][02614] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,772][01077] Decorrelating experience for 256 frames... 
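[annotation] Both the learner and the inference worker above report RunningMeanStd normalizers for each observation key (global_vars (20,), griddly_obs (23, 11, 11), kinship (1, 11, 11), last_action (2,), last_reward (1,)) plus a returns normalizer. Below is a generic sketch of the running mean/variance update such a normalizer typically performs, using the standard parallel-merge formula; the exact update rule and epsilon used by the framework are not visible in the log.

```python
import numpy as np


class RunningMeanStd:
    """Tracks per-element mean and variance of a stream of arrays and whitens inputs."""

    def __init__(self, shape, epsilon: float = 1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon  # avoids division by zero before the first update

    def update(self, batch: np.ndarray) -> None:
        # Merge batch statistics into the running estimate (Chan et al. parallel formula).
        batch_mean, batch_var, n = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean += delta * n / total
        self.var = (self.var * self.count + batch_var * n
                    + delta**2 * self.count * n / total) / total
        self.count = total

    def normalize(self, x: np.ndarray) -> np.ndarray:
        return (x - self.mean) / np.sqrt(self.var + 1e-8)


# One normalizer per observation key, mirroring the shapes printed in the log.
normalizers = {"global_vars": RunningMeanStd((20,)),
               "griddly_obs": RunningMeanStd((23, 11, 11)),
               "last_action": RunningMeanStd((2,))}
batch = np.random.randn(64, 20)
normalizers["global_vars"].update(batch)
whitened = normalizers["global_vars"].normalize(batch)
```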
[2024-03-29 12:57:20,774][01465] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,780][01350] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,789][00947] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,790][03068] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,793][03197] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,801][00883] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,803][01721] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,805][02169] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,810][00756] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,810][02105] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,811][03066] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,813][01271] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,819][00500] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,820][01785] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,823][00949] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,825][03132] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,825][03388] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,826][01207] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,826][01182] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,829][02724] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,836][02681] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,837][01142] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,839][00948] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,839][03067] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,841][03452] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,842][03962] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,846][00496] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,851][03064] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,852][00498] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,857][03897] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,857][01431] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,861][03065] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,862][00499] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,863][02314] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,870][02550] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,872][03643] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,873][00565] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,875][01786] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,878][02682] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,880][01902] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,880][03133] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,885][03769] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,887][01336] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,894][00675] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,894][02678] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,904][02288] Decorrelating experience for 256 frames... 
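[annotation] The "Decorrelating experience for 0/256 frames..." lines here, and the per-worker "sleep for X sec" lines a little further below (2.344 s for worker 1, 4.688 s for worker 2, i.e. roughly worker_idx × 2.34 s), are two halves of the same start-up trick: each rollout worker is offset by a worker-specific warm-up and delay so episodes across workers do not start in lockstep. A hedged sketch of that staggering; the 150-second budget implied by 64 × 2.34 s and the helper names are inferences, not values printed verbatim in the log.

```python
def decorrelation_plan(worker_idx: int, num_workers: int = 64,
                       max_seconds: float = 150.0, frames_per_step: int = 256):
    """Per-worker warm-up frames plus a staggered delay before normal collection."""
    # Frame warm-up: each worker logs "Decorrelating experience for 0 frames...", then 256, ...
    warmup_frames = [i * frames_per_step for i in range(2)]
    # Staggered delay: workers are spaced max_seconds / num_workers apart.
    sleep_seconds = worker_idx * max_seconds / num_workers
    return warmup_frames, sleep_seconds


frames, delay = decorrelation_plan(worker_idx=2)
print(frames, f"{delay:.3f} sec")  # [0, 256] 4.688 sec, matching "Worker 2, sleep for 4.688 sec"
# time.sleep(delay) would follow here in a real worker before it starts collecting.
```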
[2024-03-29 12:57:20,906][01720] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,907][01915] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,915][01656] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,919][03898] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,919][02679] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,921][01076] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,923][02747] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,925][03063] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,926][02680] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,932][00564] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,938][03324] Decorrelating experience for 256 frames... [2024-03-29 12:57:20,992][01503] Decorrelating experience for 256 frames... [2024-03-29 12:57:23,839][00126] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 118079488. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:57:28,839][00126] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 118079488. Throughput: 0: 33488.4. Samples: 334880. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:57:30,144][00126] Heartbeat connected on Batcher_0 [2024-03-29 12:57:30,145][00126] Heartbeat connected on LearnerWorker_p0 [2024-03-29 12:57:30,150][00126] Heartbeat connected on RolloutWorker_w0 [2024-03-29 12:57:30,152][00126] Heartbeat connected on RolloutWorker_w1 [2024-03-29 12:57:30,154][00126] Heartbeat connected on RolloutWorker_w2 [2024-03-29 12:57:30,159][00126] Heartbeat connected on RolloutWorker_w5 [2024-03-29 12:57:30,160][00126] Heartbeat connected on RolloutWorker_w6 [2024-03-29 12:57:30,161][00126] Heartbeat connected on RolloutWorker_w3 [2024-03-29 12:57:30,166][00126] Heartbeat connected on RolloutWorker_w4 [2024-03-29 12:57:30,166][00126] Heartbeat connected on RolloutWorker_w10 [2024-03-29 12:57:30,169][00126] Heartbeat connected on RolloutWorker_w12 [2024-03-29 12:57:30,170][00126] Heartbeat connected on RolloutWorker_w11 [2024-03-29 12:57:30,170][00126] Heartbeat connected on RolloutWorker_w9 [2024-03-29 12:57:30,170][00126] Heartbeat connected on RolloutWorker_w8 [2024-03-29 12:57:30,171][00126] Heartbeat connected on RolloutWorker_w13 [2024-03-29 12:57:30,171][00126] Heartbeat connected on RolloutWorker_w7 [2024-03-29 12:57:30,174][00126] Heartbeat connected on RolloutWorker_w15 [2024-03-29 12:57:30,177][00126] Heartbeat connected on InferenceWorker_p0-w0 [2024-03-29 12:57:30,179][00126] Heartbeat connected on RolloutWorker_w18 [2024-03-29 12:57:30,181][00126] Heartbeat connected on RolloutWorker_w19 [2024-03-29 12:57:30,182][00126] Heartbeat connected on RolloutWorker_w17 [2024-03-29 12:57:30,182][00126] Heartbeat connected on RolloutWorker_w20 [2024-03-29 12:57:30,183][00126] Heartbeat connected on RolloutWorker_w16 [2024-03-29 12:57:30,185][00126] Heartbeat connected on RolloutWorker_w22 [2024-03-29 12:57:30,186][00126] Heartbeat connected on RolloutWorker_w21 [2024-03-29 12:57:30,187][00126] Heartbeat connected on RolloutWorker_w23 [2024-03-29 12:57:30,188][00126] Heartbeat connected on RolloutWorker_w14 [2024-03-29 12:57:30,188][00126] Heartbeat connected on RolloutWorker_w24 [2024-03-29 12:57:30,190][00126] Heartbeat connected on RolloutWorker_w25 [2024-03-29 12:57:30,191][00126] Heartbeat connected on RolloutWorker_w26 [2024-03-29 12:57:30,193][00126] Heartbeat 
connected on RolloutWorker_w27 [2024-03-29 12:57:30,195][00126] Heartbeat connected on RolloutWorker_w28 [2024-03-29 12:57:30,200][00126] Heartbeat connected on RolloutWorker_w31 [2024-03-29 12:57:30,201][00126] Heartbeat connected on RolloutWorker_w32 [2024-03-29 12:57:30,203][00126] Heartbeat connected on RolloutWorker_w29 [2024-03-29 12:57:30,203][00126] Heartbeat connected on RolloutWorker_w33 [2024-03-29 12:57:30,204][00126] Heartbeat connected on RolloutWorker_w34 [2024-03-29 12:57:30,205][00126] Heartbeat connected on RolloutWorker_w30 [2024-03-29 12:57:30,206][00126] Heartbeat connected on RolloutWorker_w35 [2024-03-29 12:57:30,207][00126] Heartbeat connected on RolloutWorker_w36 [2024-03-29 12:57:30,209][00126] Heartbeat connected on RolloutWorker_w37 [2024-03-29 12:57:30,210][00126] Heartbeat connected on RolloutWorker_w38 [2024-03-29 12:57:30,212][00126] Heartbeat connected on RolloutWorker_w39 [2024-03-29 12:57:30,213][00126] Heartbeat connected on RolloutWorker_w40 [2024-03-29 12:57:30,215][00126] Heartbeat connected on RolloutWorker_w41 [2024-03-29 12:57:30,218][00126] Heartbeat connected on RolloutWorker_w43 [2024-03-29 12:57:30,223][00126] Heartbeat connected on RolloutWorker_w42 [2024-03-29 12:57:30,224][00126] Heartbeat connected on RolloutWorker_w46 [2024-03-29 12:57:30,226][00126] Heartbeat connected on RolloutWorker_w48 [2024-03-29 12:57:30,226][00126] Heartbeat connected on RolloutWorker_w45 [2024-03-29 12:57:30,227][00126] Heartbeat connected on RolloutWorker_w49 [2024-03-29 12:57:30,229][00126] Heartbeat connected on RolloutWorker_w50 [2024-03-29 12:57:30,230][00126] Heartbeat connected on RolloutWorker_w51 [2024-03-29 12:57:30,232][00126] Heartbeat connected on RolloutWorker_w52 [2024-03-29 12:57:30,232][00126] Heartbeat connected on RolloutWorker_w44 [2024-03-29 12:57:30,233][00126] Heartbeat connected on RolloutWorker_w53 [2024-03-29 12:57:30,234][00126] Heartbeat connected on RolloutWorker_w47 [2024-03-29 12:57:30,235][00126] Heartbeat connected on RolloutWorker_w54 [2024-03-29 12:57:30,236][00126] Heartbeat connected on RolloutWorker_w55 [2024-03-29 12:57:30,238][00126] Heartbeat connected on RolloutWorker_w56 [2024-03-29 12:57:30,243][00126] Heartbeat connected on RolloutWorker_w58 [2024-03-29 12:57:30,244][00126] Heartbeat connected on RolloutWorker_w57 [2024-03-29 12:57:30,245][00126] Heartbeat connected on RolloutWorker_w59 [2024-03-29 12:57:30,246][00126] Heartbeat connected on RolloutWorker_w61 [2024-03-29 12:57:30,248][00126] Heartbeat connected on RolloutWorker_w62 [2024-03-29 12:57:30,248][00126] Heartbeat connected on RolloutWorker_w63 [2024-03-29 12:57:30,249][00126] Heartbeat connected on RolloutWorker_w60 [2024-03-29 12:57:32,883][03388] Worker 57, sleep for 133.594 sec to decorrelate experience collection [2024-03-29 12:57:32,883][00496] Worker 1, sleep for 2.344 sec to decorrelate experience collection [2024-03-29 12:57:32,897][03066] Worker 47, sleep for 110.156 sec to decorrelate experience collection [2024-03-29 12:57:32,897][01465] Worker 31, sleep for 72.656 sec to decorrelate experience collection [2024-03-29 12:57:32,898][03769] Worker 56, sleep for 131.250 sec to decorrelate experience collection [2024-03-29 12:57:32,898][00565] Worker 13, sleep for 30.469 sec to decorrelate experience collection [2024-03-29 12:57:32,898][01786] Worker 26, sleep for 60.938 sec to decorrelate experience collection [2024-03-29 12:57:32,903][02614] Worker 59, sleep for 138.281 sec to decorrelate experience collection [2024-03-29 12:57:32,917][01844] Worker 
24, sleep for 56.250 sec to decorrelate experience collection [2024-03-29 12:57:32,917][03452] Worker 46, sleep for 107.812 sec to decorrelate experience collection [2024-03-29 12:57:32,920][01271] Worker 6, sleep for 14.062 sec to decorrelate experience collection [2024-03-29 12:57:32,920][00756] Worker 2, sleep for 4.688 sec to decorrelate experience collection [2024-03-29 12:57:32,921][03197] Worker 32, sleep for 75.000 sec to decorrelate experience collection [2024-03-29 12:57:32,921][02169] Worker 37, sleep for 86.719 sec to decorrelate experience collection [2024-03-29 12:57:32,921][03132] Worker 55, sleep for 128.906 sec to decorrelate experience collection [2024-03-29 12:57:32,923][03770] Worker 52, sleep for 121.875 sec to decorrelate experience collection [2024-03-29 12:57:32,926][02550] Worker 43, sleep for 100.781 sec to decorrelate experience collection [2024-03-29 12:57:32,938][03065] Worker 34, sleep for 79.688 sec to decorrelate experience collection [2024-03-29 12:57:32,938][00499] Worker 5, sleep for 11.719 sec to decorrelate experience collection [2024-03-29 12:57:32,938][01431] Worker 14, sleep for 32.812 sec to decorrelate experience collection [2024-03-29 12:57:32,939][00498] Worker 3, sleep for 7.031 sec to decorrelate experience collection [2024-03-29 12:57:32,940][01721] Worker 22, sleep for 51.562 sec to decorrelate experience collection [2024-03-29 12:57:32,940][00947] Worker 17, sleep for 39.844 sec to decorrelate experience collection [2024-03-29 12:57:32,941][01350] Worker 12, sleep for 28.125 sec to decorrelate experience collection [2024-03-29 12:57:32,953][02681] Worker 38, sleep for 89.062 sec to decorrelate experience collection [2024-03-29 12:57:32,958][02724] Worker 36, sleep for 84.375 sec to decorrelate experience collection [2024-03-29 12:57:32,959][01328] Worker 8, sleep for 18.750 sec to decorrelate experience collection [2024-03-29 12:57:32,966][00883] Worker 15, sleep for 35.156 sec to decorrelate experience collection [2024-03-29 12:57:32,968][03063] Worker 49, sleep for 114.844 sec to decorrelate experience collection [2024-03-29 12:57:32,969][03898] Worker 58, sleep for 135.938 sec to decorrelate experience collection [2024-03-29 12:57:32,983][01915] Worker 30, sleep for 70.312 sec to decorrelate experience collection [2024-03-29 12:57:32,983][00500] Worker 7, sleep for 16.406 sec to decorrelate experience collection [2024-03-29 12:57:32,994][03133] Worker 53, sleep for 124.219 sec to decorrelate experience collection [2024-03-29 12:57:33,003][02747] Worker 62, sleep for 145.312 sec to decorrelate experience collection [2024-03-29 12:57:33,007][03643] Worker 60, sleep for 140.625 sec to decorrelate experience collection [2024-03-29 12:57:33,013][02314] Worker 41, sleep for 96.094 sec to decorrelate experience collection [2024-03-29 12:57:33,014][03962] Worker 48, sleep for 112.500 sec to decorrelate experience collection [2024-03-29 12:57:33,022][00675] Worker 11, sleep for 25.781 sec to decorrelate experience collection [2024-03-29 12:57:33,026][02680] Worker 44, sleep for 103.125 sec to decorrelate experience collection [2024-03-29 12:57:33,032][00564] Worker 9, sleep for 21.094 sec to decorrelate experience collection [2024-03-29 12:57:33,033][02682] Worker 42, sleep for 98.438 sec to decorrelate experience collection [2024-03-29 12:57:33,036][01182] Worker 29, sleep for 67.969 sec to decorrelate experience collection [2024-03-29 12:57:33,038][01656] Worker 16, sleep for 37.500 sec to decorrelate experience collection [2024-03-29 
12:57:33,047][01720] Worker 33, sleep for 77.344 sec to decorrelate experience collection [2024-03-29 12:57:33,065][01336] Worker 10, sleep for 23.438 sec to decorrelate experience collection [2024-03-29 12:57:33,070][03897] Worker 54, sleep for 126.562 sec to decorrelate experience collection [2024-03-29 12:57:33,074][00476] Signal inference workers to stop experience collection... [2024-03-29 12:57:33,079][01785] Worker 20, sleep for 46.875 sec to decorrelate experience collection [2024-03-29 12:57:33,079][01077] Worker 4, sleep for 9.375 sec to decorrelate experience collection [2024-03-29 12:57:33,093][01503] Worker 18, sleep for 42.188 sec to decorrelate experience collection [2024-03-29 12:57:33,098][00497] InferenceWorker_p0-w0: stopping experience collection [2024-03-29 12:57:33,103][01902] Worker 35, sleep for 82.031 sec to decorrelate experience collection [2024-03-29 12:57:33,103][02679] Worker 40, sleep for 93.750 sec to decorrelate experience collection [2024-03-29 12:57:33,104][03064] Worker 63, sleep for 147.656 sec to decorrelate experience collection [2024-03-29 12:57:33,110][01207] Worker 27, sleep for 63.281 sec to decorrelate experience collection [2024-03-29 12:57:33,110][01141] Worker 23, sleep for 53.906 sec to decorrelate experience collection [2024-03-29 12:57:33,122][03068] Worker 51, sleep for 119.531 sec to decorrelate experience collection [2024-03-29 12:57:33,122][03324] Worker 50, sleep for 117.188 sec to decorrelate experience collection [2024-03-29 12:57:33,839][00126] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 118079488. Throughput: 0: 34871.1. Samples: 523060. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-03-29 12:57:34,826][00476] Signal inference workers to resume experience collection... [2024-03-29 12:57:34,826][00497] InferenceWorker_p0-w0: resuming experience collection [2024-03-29 12:57:34,871][01076] Worker 25, sleep for 58.594 sec to decorrelate experience collection [2024-03-29 12:57:34,903][02678] Worker 61, sleep for 142.969 sec to decorrelate experience collection [2024-03-29 12:57:34,929][02105] Worker 28, sleep for 65.625 sec to decorrelate experience collection [2024-03-29 12:57:34,930][03067] Worker 45, sleep for 105.469 sec to decorrelate experience collection [2024-03-29 12:57:34,935][02288] Worker 39, sleep for 91.406 sec to decorrelate experience collection [2024-03-29 12:57:35,239][00496] Worker 1 awakens! [2024-03-29 12:57:35,376][00949] Worker 21, sleep for 49.219 sec to decorrelate experience collection [2024-03-29 12:57:35,410][01142] Worker 19, sleep for 44.531 sec to decorrelate experience collection [2024-03-29 12:57:37,190][00497] Updated weights for policy 0, policy_version 7217 (0.0014) [2024-03-29 12:57:37,631][00756] Worker 2 awakens! [2024-03-29 12:57:38,839][00126] Fps is (10 sec: 27853.0, 60 sec: 13926.5, 300 sec: 13926.5). Total num frames: 118358016. Throughput: 0: 32859.3. Samples: 657180. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-03-29 12:57:39,456][00497] Updated weights for policy 0, policy_version 7227 (0.0013) [2024-03-29 12:57:40,006][00498] Worker 3 awakens! [2024-03-29 12:57:42,502][01077] Worker 4 awakens! [2024-03-29 12:57:43,839][00126] Fps is (10 sec: 34406.3, 60 sec: 13762.7, 300 sec: 13762.7). Total num frames: 118423552. Throughput: 0: 27129.8. Samples: 678240. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-03-29 12:57:44,681][00499] Worker 5 awakens! [2024-03-29 12:57:47,053][01271] Worker 6 awakens! 
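The staggered sleeps logged above grow linearly with the worker index: worker 1 waits 2.344 s, worker 2 waits 4.688 s, and worker 63 waits 147.656 s, which for this 64-worker run is consistent with worker_idx * 150 / 64 seconds. The snippet below is only an illustrative reconstruction of that schedule inferred from the numbers in the log (the function and parameter names are not the trainer's actual API); a ~150 s cap is assumed.

    # Illustrative only: reproduces the per-worker decorrelation delays seen in this log.
    def decorrelation_delay(worker_idx: int, num_workers: int = 64,
                            max_delay_sec: float = 150.0) -> float:
        # Worker 0 starts immediately; later workers wait proportionally longer,
        # so their episode boundaries are offset in time ("decorrelated").
        return worker_idx * max_delay_sec / num_workers

    assert abs(decorrelation_delay(1) - 2.344) < 0.01     # "Worker 1, sleep for 2.344 sec"
    assert abs(decorrelation_delay(63) - 147.656) < 0.01  # "Worker 63, sleep for 147.656 sec"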
[2024-03-29 12:57:48,839][00126] Fps is (10 sec: 11468.7, 60 sec: 13107.2, 300 sec: 13107.2). Total num frames: 118472704. Throughput: 0: 23285.4. Samples: 698560. Policy #0 lag: (min: 0.0, avg: 18.7, max: 21.0) [2024-03-29 12:57:49,396][00500] Worker 7 awakens! [2024-03-29 12:57:51,733][01328] Worker 8 awakens! [2024-03-29 12:57:53,839][00126] Fps is (10 sec: 8192.1, 60 sec: 12171.1, 300 sec: 12171.1). Total num frames: 118505472. Throughput: 0: 21428.7. Samples: 750000. Policy #0 lag: (min: 0.0, avg: 9.3, max: 25.0) [2024-03-29 12:57:54,226][00564] Worker 9 awakens! [2024-03-29 12:57:56,603][01336] Worker 10 awakens! [2024-03-29 12:57:57,495][00497] Updated weights for policy 0, policy_version 7237 (0.0012) [2024-03-29 12:57:58,821][00675] Worker 11 awakens! [2024-03-29 12:57:58,839][00126] Fps is (10 sec: 9830.6, 60 sec: 12288.1, 300 sec: 12288.1). Total num frames: 118571008. Throughput: 0: 20585.1. Samples: 823400. Policy #0 lag: (min: 0.0, avg: 9.3, max: 25.0) [2024-03-29 12:57:58,840][00126] Avg episode reward: [(0, '0.233')] [2024-03-29 12:58:01,167][01350] Worker 12 awakens! [2024-03-29 12:58:03,468][00565] Worker 13 awakens! [2024-03-29 12:58:03,839][00126] Fps is (10 sec: 19660.8, 60 sec: 13835.5, 300 sec: 13835.5). Total num frames: 118702080. Throughput: 0: 19660.6. Samples: 884720. Policy #0 lag: (min: 0.0, avg: 3.2, max: 7.0) [2024-03-29 12:58:03,839][00126] Avg episode reward: [(0, '0.291')] [2024-03-29 12:58:05,851][01431] Worker 14 awakens! [2024-03-29 12:58:06,143][00497] Updated weights for policy 0, policy_version 7247 (0.0013) [2024-03-29 12:58:08,223][00883] Worker 15 awakens! [2024-03-29 12:58:08,839][00126] Fps is (10 sec: 26214.2, 60 sec: 15073.3, 300 sec: 15073.3). Total num frames: 118833152. Throughput: 0: 22755.6. Samples: 1024000. Policy #0 lag: (min: 0.0, avg: 4.5, max: 9.0) [2024-03-29 12:58:08,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 12:58:10,638][01656] Worker 16 awakens! [2024-03-29 12:58:12,137][00497] Updated weights for policy 0, policy_version 7257 (0.0012) [2024-03-29 12:58:12,871][00947] Worker 17 awakens! [2024-03-29 12:58:13,839][00126] Fps is (10 sec: 26214.2, 60 sec: 16086.2, 300 sec: 16086.2). Total num frames: 118964224. Throughput: 0: 19095.6. Samples: 1194180. Policy #0 lag: (min: 0.0, avg: 4.5, max: 9.0) [2024-03-29 12:58:13,840][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 12:58:15,294][01503] Worker 18 awakens! [2024-03-29 12:58:17,973][00497] Updated weights for policy 0, policy_version 7267 (0.0011) [2024-03-29 12:58:18,839][00126] Fps is (10 sec: 26214.2, 60 sec: 16930.2, 300 sec: 16930.2). Total num frames: 119095296. Throughput: 0: 16682.2. Samples: 1273760. Policy #0 lag: (min: 1.0, avg: 7.3, max: 15.0) [2024-03-29 12:58:18,840][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 12:58:20,005][01785] Worker 20 awakens! [2024-03-29 12:58:20,042][01142] Worker 19 awakens! [2024-03-29 12:58:22,568][00497] Updated weights for policy 0, policy_version 7277 (0.0014) [2024-03-29 12:58:23,839][00126] Fps is (10 sec: 27852.6, 60 sec: 19387.7, 300 sec: 17896.4). Total num frames: 119242752. Throughput: 0: 17724.0. Samples: 1454760. Policy #0 lag: (min: 0.0, avg: 7.0, max: 15.0) [2024-03-29 12:58:23,840][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 12:58:24,577][01721] Worker 22 awakens! [2024-03-29 12:58:24,695][00949] Worker 21 awakens! [2024-03-29 12:58:27,023][00497] Updated weights for policy 0, policy_version 7287 (0.0013) [2024-03-29 12:58:27,046][01141] Worker 23 awakens! 
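Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entry reports the average frame rate over three trailing windows of the "Total num frames" counter. The sketch below shows one plausible way such windowed rates can be computed; it is an assumption for illustration, not the reporting code actually used by this run.

    import time
    from collections import deque

    class FpsTracker:
        """Windowed frames-per-second over 10/60/300 s, as printed in the log (sketch)."""
        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.history = deque()  # (timestamp, total_num_frames)

        def report(self, total_num_frames: int) -> dict:
            now = time.time()
            self.history.append((now, total_num_frames))
            # Keep only as much history as the largest window needs.
            while now - self.history[0][0] > max(self.windows) + 5:
                self.history.popleft()
            rates = {}
            for w in self.windows:
                # Oldest sample still inside this window.
                t0, f0 = next(((t, f) for t, f in self.history if now - t <= w),
                              self.history[-1])
                dt = now - t0
                rates[w] = (total_num_frames - f0) / dt if dt > 0 else 0.0
            return rates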
[2024-03-29 12:58:28,839][00126] Fps is (10 sec: 36044.7, 60 sec: 22937.6, 300 sec: 19660.8). Total num frames: 119455744. Throughput: 0: 22146.6. Samples: 1674840. Policy #0 lag: (min: 1.0, avg: 31.0, max: 81.0) [2024-03-29 12:58:28,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 12:58:29,267][01844] Worker 24 awakens! [2024-03-29 12:58:30,855][00497] Updated weights for policy 0, policy_version 7297 (0.0013) [2024-03-29 12:58:33,565][01076] Worker 25 awakens! [2024-03-29 12:58:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 26760.5, 300 sec: 21408.4). Total num frames: 119685120. Throughput: 0: 24564.4. Samples: 1803960. Policy #0 lag: (min: 1.0, avg: 31.0, max: 81.0) [2024-03-29 12:58:33,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 12:58:33,932][01786] Worker 26 awakens! [2024-03-29 12:58:35,040][00497] Updated weights for policy 0, policy_version 7307 (0.0015) [2024-03-29 12:58:36,493][01207] Worker 27 awakens! [2024-03-29 12:58:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 24849.0, 300 sec: 22118.4). Total num frames: 119848960. Throughput: 0: 29056.8. Samples: 2057560. Policy #0 lag: (min: 0.0, avg: 37.9, max: 98.0) [2024-03-29 12:58:38,840][00126] Avg episode reward: [(0, '0.266')] [2024-03-29 12:58:39,345][00497] Updated weights for policy 0, policy_version 7317 (0.0013) [2024-03-29 12:58:39,530][00476] Signal inference workers to stop experience collection... (50 times) [2024-03-29 12:58:39,544][00497] InferenceWorker_p0-w0: stopping experience collection (50 times) [2024-03-29 12:58:39,745][00476] Signal inference workers to resume experience collection... (50 times) [2024-03-29 12:58:39,745][00497] InferenceWorker_p0-w0: resuming experience collection (50 times) [2024-03-29 12:58:40,654][02105] Worker 28 awakens! [2024-03-29 12:58:41,105][01182] Worker 29 awakens! [2024-03-29 12:58:43,279][00497] Updated weights for policy 0, policy_version 7327 (0.0017) [2024-03-29 12:58:43,396][01915] Worker 30 awakens! [2024-03-29 12:58:43,839][00126] Fps is (10 sec: 37683.1, 60 sec: 27306.6, 300 sec: 23323.1). Total num frames: 120061952. Throughput: 0: 32639.4. Samples: 2292180. Policy #0 lag: (min: 2.0, avg: 11.2, max: 20.0) [2024-03-29 12:58:43,840][00126] Avg episode reward: [(0, '0.279')] [2024-03-29 12:58:45,654][01465] Worker 31 awakens! [2024-03-29 12:58:47,844][00497] Updated weights for policy 0, policy_version 7337 (0.0021) [2024-03-29 12:58:48,021][03197] Worker 32 awakens! [2024-03-29 12:58:48,839][00126] Fps is (10 sec: 40960.2, 60 sec: 29764.3, 300 sec: 24211.9). Total num frames: 120258560. Throughput: 0: 33926.1. Samples: 2411400. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-03-29 12:58:48,841][00126] Avg episode reward: [(0, '0.273')] [2024-03-29 12:58:50,491][01720] Worker 33 awakens! [2024-03-29 12:58:51,757][00497] Updated weights for policy 0, policy_version 7347 (0.0021) [2024-03-29 12:58:52,726][03065] Worker 34 awakens! [2024-03-29 12:58:53,839][00126] Fps is (10 sec: 39322.0, 60 sec: 32494.9, 300 sec: 25007.2). Total num frames: 120455168. Throughput: 0: 35969.8. Samples: 2642640. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-03-29 12:58:53,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 12:58:55,234][01902] Worker 35 awakens! [2024-03-29 12:58:56,278][00497] Updated weights for policy 0, policy_version 7357 (0.0021) [2024-03-29 12:58:57,433][02724] Worker 36 awakens! [2024-03-29 12:58:58,839][00126] Fps is (10 sec: 42598.6, 60 sec: 35225.5, 300 sec: 26050.6). Total num frames: 120684544. Throughput: 0: 37241.3. 
Samples: 2870040. Policy #0 lag: (min: 0.0, avg: 11.6, max: 22.0) [2024-03-29 12:58:58,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 12:58:59,741][02169] Worker 37 awakens! [2024-03-29 12:58:59,945][00497] Updated weights for policy 0, policy_version 7367 (0.0020) [2024-03-29 12:59:02,021][02681] Worker 38 awakens! [2024-03-29 12:59:03,839][00126] Fps is (10 sec: 39321.6, 60 sec: 35771.7, 300 sec: 26370.5). Total num frames: 120848384. Throughput: 0: 38338.3. Samples: 2998980. Policy #0 lag: (min: 0.0, avg: 11.9, max: 23.0) [2024-03-29 12:59:03,840][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 12:59:03,917][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007377_120864768.pth... [2024-03-29 12:59:03,919][00497] Updated weights for policy 0, policy_version 7377 (0.0023) [2024-03-29 12:59:04,228][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000006906_113147904.pth [2024-03-29 12:59:06,430][02288] Worker 39 awakens! [2024-03-29 12:59:06,957][02679] Worker 40 awakens! [2024-03-29 12:59:07,846][00497] Updated weights for policy 0, policy_version 7387 (0.0026) [2024-03-29 12:59:08,839][00126] Fps is (10 sec: 37683.1, 60 sec: 37137.0, 300 sec: 27108.1). Total num frames: 121061376. Throughput: 0: 39645.8. Samples: 3238820. Policy #0 lag: (min: 2.0, avg: 13.5, max: 26.0) [2024-03-29 12:59:08,840][00126] Avg episode reward: [(0, '0.333')] [2024-03-29 12:59:09,207][02314] Worker 41 awakens! [2024-03-29 12:59:11,571][02682] Worker 42 awakens! [2024-03-29 12:59:12,448][00497] Updated weights for policy 0, policy_version 7397 (0.0021) [2024-03-29 12:59:13,808][02550] Worker 43 awakens! [2024-03-29 12:59:13,839][00126] Fps is (10 sec: 40959.7, 60 sec: 38229.2, 300 sec: 27639.1). Total num frames: 121257984. Throughput: 0: 39994.2. Samples: 3474580. Policy #0 lag: (min: 2.0, avg: 13.5, max: 26.0) [2024-03-29 12:59:13,840][00126] Avg episode reward: [(0, '0.254')] [2024-03-29 12:59:16,251][02680] Worker 44 awakens! [2024-03-29 12:59:17,161][00497] Updated weights for policy 0, policy_version 7407 (0.0019) [2024-03-29 12:59:17,899][00476] Signal inference workers to stop experience collection... (100 times) [2024-03-29 12:59:17,919][00497] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-03-29 12:59:18,109][00476] Signal inference workers to resume experience collection... (100 times) [2024-03-29 12:59:18,109][00497] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-03-29 12:59:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 39321.6, 300 sec: 28125.9). Total num frames: 121454592. Throughput: 0: 40171.6. Samples: 3611680. Policy #0 lag: (min: 1.0, avg: 14.3, max: 26.0) [2024-03-29 12:59:18,840][00126] Avg episode reward: [(0, '0.370')] [2024-03-29 12:59:19,934][00497] Updated weights for policy 0, policy_version 7417 (0.0018) [2024-03-29 12:59:20,499][03067] Worker 45 awakens! [2024-03-29 12:59:20,830][03452] Worker 46 awakens! [2024-03-29 12:59:20,933][00476] self.policy_id=0 batch has 56.25% of invalid samples [2024-03-29 12:59:23,154][03066] Worker 47 awakens! [2024-03-29 12:59:23,839][00126] Fps is (10 sec: 39321.5, 60 sec: 40140.7, 300 sec: 28573.7). Total num frames: 121651200. Throughput: 0: 39690.2. Samples: 3843620. 
Policy #0 lag: (min: 0.0, avg: 125.0, max: 211.0) [2024-03-29 12:59:23,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 12:59:24,259][00497] Updated weights for policy 0, policy_version 7427 (0.0020) [2024-03-29 12:59:25,615][03962] Worker 48 awakens! [2024-03-29 12:59:27,813][03063] Worker 49 awakens! [2024-03-29 12:59:28,839][00126] Fps is (10 sec: 37683.2, 60 sec: 39594.7, 300 sec: 28861.1). Total num frames: 121831424. Throughput: 0: 40418.8. Samples: 4111020. Policy #0 lag: (min: 1.0, avg: 52.8, max: 227.0) [2024-03-29 12:59:28,840][00126] Avg episode reward: [(0, '0.290')] [2024-03-29 12:59:28,847][00497] Updated weights for policy 0, policy_version 7437 (0.0019) [2024-03-29 12:59:30,373][03324] Worker 50 awakens! [2024-03-29 12:59:31,643][00497] Updated weights for policy 0, policy_version 7447 (0.0018) [2024-03-29 12:59:32,741][03068] Worker 51 awakens! [2024-03-29 12:59:33,839][00126] Fps is (10 sec: 42598.9, 60 sec: 39867.8, 300 sec: 29612.6). Total num frames: 122077184. Throughput: 0: 39756.5. Samples: 4200440. Policy #0 lag: (min: 1.0, avg: 52.8, max: 227.0) [2024-03-29 12:59:33,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 12:59:34,898][03770] Worker 52 awakens! [2024-03-29 12:59:35,755][00497] Updated weights for policy 0, policy_version 7457 (0.0021) [2024-03-29 12:59:37,313][03133] Worker 53 awakens! [2024-03-29 12:59:38,839][00126] Fps is (10 sec: 47513.1, 60 sec: 40960.0, 300 sec: 30193.4). Total num frames: 122306560. Throughput: 0: 40563.9. Samples: 4468020. Policy #0 lag: (min: 0.0, avg: 17.3, max: 34.0) [2024-03-29 12:59:38,840][00126] Avg episode reward: [(0, '0.262')] [2024-03-29 12:59:39,713][03897] Worker 54 awakens! [2024-03-29 12:59:40,631][00497] Updated weights for policy 0, policy_version 7467 (0.0020) [2024-03-29 12:59:41,928][03132] Worker 55 awakens! [2024-03-29 12:59:43,441][00497] Updated weights for policy 0, policy_version 7477 (0.0018) [2024-03-29 12:59:43,839][00126] Fps is (10 sec: 44236.9, 60 sec: 40960.1, 300 sec: 30621.2). Total num frames: 122519552. Throughput: 0: 41176.9. Samples: 4723000. Policy #0 lag: (min: 0.0, avg: 14.8, max: 35.0) [2024-03-29 12:59:43,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 12:59:44,249][03769] Worker 56 awakens! [2024-03-29 12:59:46,577][03388] Worker 57 awakens! [2024-03-29 12:59:47,417][00497] Updated weights for policy 0, policy_version 7487 (0.0022) [2024-03-29 12:59:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 40960.0, 300 sec: 30911.2). Total num frames: 122716160. Throughput: 0: 41136.4. Samples: 4850120. Policy #0 lag: (min: 0.0, avg: 18.7, max: 36.0) [2024-03-29 12:59:48,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 12:59:48,949][03898] Worker 58 awakens! [2024-03-29 12:59:51,116][00497] Updated weights for policy 0, policy_version 7497 (0.0019) [2024-03-29 12:59:51,289][02614] Worker 59 awakens! [2024-03-29 12:59:51,736][00476] Signal inference workers to stop experience collection... (150 times) [2024-03-29 12:59:51,779][00497] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-03-29 12:59:51,816][00476] Signal inference workers to resume experience collection... (150 times) [2024-03-29 12:59:51,819][00497] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-03-29 12:59:53,683][03643] Worker 60 awakens! [2024-03-29 12:59:53,839][00126] Fps is (10 sec: 37683.0, 60 sec: 40686.9, 300 sec: 31076.8). Total num frames: 122896384. Throughput: 0: 41400.0. Samples: 5101820. 
Policy #0 lag: (min: 0.0, avg: 18.7, max: 36.0) [2024-03-29 12:59:53,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 12:59:55,868][00497] Updated weights for policy 0, policy_version 7507 (0.0020) [2024-03-29 12:59:57,902][02678] Worker 61 awakens! [2024-03-29 12:59:58,320][02747] Worker 62 awakens! [2024-03-29 12:59:58,508][00497] Updated weights for policy 0, policy_version 7517 (0.0019) [2024-03-29 12:59:58,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41233.1, 300 sec: 31744.0). Total num frames: 123158528. Throughput: 0: 41614.8. Samples: 5347240. Policy #0 lag: (min: 0.0, avg: 15.8, max: 37.0) [2024-03-29 12:59:58,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 13:00:00,861][03064] Worker 63 awakens! [2024-03-29 13:00:02,786][00497] Updated weights for policy 0, policy_version 7527 (0.0020) [2024-03-29 13:00:03,839][00126] Fps is (10 sec: 45875.3, 60 sec: 41779.2, 300 sec: 31973.7). Total num frames: 123355136. Throughput: 0: 41873.8. Samples: 5496000. Policy #0 lag: (min: 2.0, avg: 21.1, max: 41.0) [2024-03-29 13:00:03,840][00126] Avg episode reward: [(0, '0.304')] [2024-03-29 13:00:06,290][00497] Updated weights for policy 0, policy_version 7537 (0.0019) [2024-03-29 13:00:08,839][00126] Fps is (10 sec: 36044.4, 60 sec: 40959.9, 300 sec: 31997.0). Total num frames: 123518976. Throughput: 0: 42320.4. Samples: 5748040. Policy #0 lag: (min: 2.0, avg: 21.1, max: 41.0) [2024-03-29 13:00:08,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 13:00:11,348][00497] Updated weights for policy 0, policy_version 7547 (0.0025) [2024-03-29 13:00:13,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 32674.4). Total num frames: 123797504. Throughput: 0: 41841.8. Samples: 5993900. Policy #0 lag: (min: 0.0, avg: 19.0, max: 39.0) [2024-03-29 13:00:13,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 13:00:13,990][00497] Updated weights for policy 0, policy_version 7557 (0.0028) [2024-03-29 13:00:18,340][00497] Updated weights for policy 0, policy_version 7567 (0.0020) [2024-03-29 13:00:18,839][00126] Fps is (10 sec: 47513.8, 60 sec: 42325.3, 300 sec: 32859.0). Total num frames: 123994112. Throughput: 0: 42745.7. Samples: 6124000. Policy #0 lag: (min: 1.0, avg: 20.5, max: 40.0) [2024-03-29 13:00:18,840][00126] Avg episode reward: [(0, '0.250')] [2024-03-29 13:00:22,084][00497] Updated weights for policy 0, policy_version 7577 (0.0027) [2024-03-29 13:00:23,839][00126] Fps is (10 sec: 37682.8, 60 sec: 42052.3, 300 sec: 32945.1). Total num frames: 124174336. Throughput: 0: 42193.8. Samples: 6366740. Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 13:00:23,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 13:00:26,039][00476] Signal inference workers to stop experience collection... (200 times) [2024-03-29 13:00:26,076][00497] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-03-29 13:00:26,253][00476] Signal inference workers to resume experience collection... (200 times) [2024-03-29 13:00:26,254][00497] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-03-29 13:00:27,140][00497] Updated weights for policy 0, policy_version 7587 (0.0035) [2024-03-29 13:00:28,839][00126] Fps is (10 sec: 39322.2, 60 sec: 42598.5, 300 sec: 33199.2). Total num frames: 124387328. Throughput: 0: 42442.3. Samples: 6632900. 
Policy #0 lag: (min: 0.0, avg: 22.9, max: 42.0) [2024-03-29 13:00:28,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:00:29,912][00497] Updated weights for policy 0, policy_version 7597 (0.0023) [2024-03-29 13:00:33,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 33440.2). Total num frames: 124600320. Throughput: 0: 41895.2. Samples: 6735400. Policy #0 lag: (min: 2.0, avg: 24.2, max: 44.0) [2024-03-29 13:00:33,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:00:34,385][00497] Updated weights for policy 0, policy_version 7607 (0.0026) [2024-03-29 13:00:37,931][00497] Updated weights for policy 0, policy_version 7617 (0.0020) [2024-03-29 13:00:38,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.3, 300 sec: 33669.1). Total num frames: 124813312. Throughput: 0: 41911.1. Samples: 6987820. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 13:00:38,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:00:43,094][00497] Updated weights for policy 0, policy_version 7627 (0.0028) [2024-03-29 13:00:43,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.0, 300 sec: 33727.1). Total num frames: 124993536. Throughput: 0: 42470.2. Samples: 7258400. Policy #0 lag: (min: 2.0, avg: 19.2, max: 42.0) [2024-03-29 13:00:43,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:00:45,887][00497] Updated weights for policy 0, policy_version 7637 (0.0021) [2024-03-29 13:00:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.3, 300 sec: 34016.3). Total num frames: 125222912. Throughput: 0: 41119.1. Samples: 7346360. Policy #0 lag: (min: 2.0, avg: 19.2, max: 42.0) [2024-03-29 13:00:48,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:00:50,144][00497] Updated weights for policy 0, policy_version 7647 (0.0022) [2024-03-29 13:00:53,500][00497] Updated weights for policy 0, policy_version 7657 (0.0022) [2024-03-29 13:00:53,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 34292.1). Total num frames: 125452288. Throughput: 0: 41503.2. Samples: 7615680. Policy #0 lag: (min: 1.0, avg: 22.3, max: 42.0) [2024-03-29 13:00:53,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 13:00:58,015][00476] Signal inference workers to stop experience collection... (250 times) [2024-03-29 13:00:58,016][00476] Signal inference workers to resume experience collection... (250 times) [2024-03-29 13:00:58,057][00497] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-03-29 13:00:58,057][00497] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-03-29 13:00:58,550][00497] Updated weights for policy 0, policy_version 7667 (0.0024) [2024-03-29 13:00:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 34332.0). Total num frames: 125632512. Throughput: 0: 42397.8. Samples: 7901800. Policy #0 lag: (min: 0.0, avg: 17.0, max: 41.0) [2024-03-29 13:00:58,840][00126] Avg episode reward: [(0, '0.318')] [2024-03-29 13:01:01,195][00497] Updated weights for policy 0, policy_version 7677 (0.0027) [2024-03-29 13:01:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.2, 300 sec: 34588.5). Total num frames: 125861888. Throughput: 0: 41239.6. Samples: 7979780. Policy #0 lag: (min: 0.0, avg: 17.0, max: 41.0) [2024-03-29 13:01:03,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:01:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007682_125861888.pth... 
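The checkpoint file saved above, checkpoint_000007682_125861888.pth, encodes two counters: the policy version (the surrounding weight updates are at versions 7677-7687) and the cumulative environment frame count (the nearby Fps entries report "Total num frames: 125861888"). The "Removing ..." entry that follows discards an older checkpoint, so only the most recent files appear to be retained. A rough sketch of that naming and rotation follows; the retention count of two is an assumption, and the actual tensor save is elided.

    import glob
    import os

    def save_and_rotate(ckpt_dir: str, policy_version: int, env_frames: int, keep: int = 2) -> str:
        # Filename encodes both counters, zero-padded as in the log entries.
        path = os.path.join(ckpt_dir, f"checkpoint_{policy_version:09d}_{env_frames}.pth")
        # torch.save(checkpoint_state, path)  # actual save elided in this sketch
        # Zero-padding keeps lexicographic order equal to numeric order.
        for old in sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))[:-keep]:
            os.remove(old)  # mirrors the "Removing .../checkpoint_....pth" entries
        return path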
[2024-03-29 13:01:04,169][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007207_118079488.pth [2024-03-29 13:01:05,638][00497] Updated weights for policy 0, policy_version 7687 (0.0031) [2024-03-29 13:01:08,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42598.4, 300 sec: 34762.6). Total num frames: 126074880. Throughput: 0: 41927.6. Samples: 8253480. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 13:01:08,840][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 13:01:09,247][00497] Updated weights for policy 0, policy_version 7697 (0.0022) [2024-03-29 13:01:13,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40686.9, 300 sec: 34720.2). Total num frames: 126238720. Throughput: 0: 42014.6. Samples: 8523560. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 13:01:13,840][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 13:01:14,327][00497] Updated weights for policy 0, policy_version 7707 (0.0018) [2024-03-29 13:01:16,925][00497] Updated weights for policy 0, policy_version 7717 (0.0024) [2024-03-29 13:01:18,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 35020.8). Total num frames: 126484480. Throughput: 0: 42009.7. Samples: 8625840. Policy #0 lag: (min: 1.0, avg: 22.8, max: 41.0) [2024-03-29 13:01:18,840][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 13:01:21,185][00497] Updated weights for policy 0, policy_version 7727 (0.0024) [2024-03-29 13:01:23,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42052.2, 300 sec: 35175.4). Total num frames: 126697472. Throughput: 0: 41876.3. Samples: 8872260. Policy #0 lag: (min: 1.0, avg: 22.8, max: 41.0) [2024-03-29 13:01:23,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:01:24,895][00497] Updated weights for policy 0, policy_version 7737 (0.0029) [2024-03-29 13:01:28,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41233.0, 300 sec: 35127.3). Total num frames: 126861312. Throughput: 0: 41963.1. Samples: 9146740. Policy #0 lag: (min: 0.0, avg: 22.8, max: 41.0) [2024-03-29 13:01:28,842][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 13:01:29,791][00476] Signal inference workers to stop experience collection... (300 times) [2024-03-29 13:01:29,872][00497] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-03-29 13:01:29,955][00476] Signal inference workers to resume experience collection... (300 times) [2024-03-29 13:01:29,955][00497] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-03-29 13:01:29,959][00497] Updated weights for policy 0, policy_version 7747 (0.0021) [2024-03-29 13:01:32,713][00497] Updated weights for policy 0, policy_version 7757 (0.0034) [2024-03-29 13:01:33,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 35530.8). Total num frames: 127139840. Throughput: 0: 42675.9. Samples: 9266780. Policy #0 lag: (min: 2.0, avg: 22.4, max: 44.0) [2024-03-29 13:01:33,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 13:01:36,667][00497] Updated weights for policy 0, policy_version 7767 (0.0024) [2024-03-29 13:01:38,839][00126] Fps is (10 sec: 45875.5, 60 sec: 41779.2, 300 sec: 35540.7). Total num frames: 127320064. Throughput: 0: 41796.5. Samples: 9496520. Policy #0 lag: (min: 2.0, avg: 22.4, max: 44.0) [2024-03-29 13:01:38,840][00126] Avg episode reward: [(0, '0.280')] [2024-03-29 13:01:40,377][00497] Updated weights for policy 0, policy_version 7777 (0.0020) [2024-03-29 13:01:43,839][00126] Fps is (10 sec: 34406.7, 60 sec: 41506.1, 300 sec: 35488.4). 
Total num frames: 127483904. Throughput: 0: 41545.8. Samples: 9771360. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:01:43,840][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 13:01:45,700][00497] Updated weights for policy 0, policy_version 7787 (0.0023) [2024-03-29 13:01:48,420][00497] Updated weights for policy 0, policy_version 7797 (0.0024) [2024-03-29 13:01:48,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.3, 300 sec: 35862.8). Total num frames: 127762432. Throughput: 0: 42744.4. Samples: 9903280. Policy #0 lag: (min: 2.0, avg: 18.7, max: 42.0) [2024-03-29 13:01:48,841][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 13:01:52,230][00497] Updated weights for policy 0, policy_version 7807 (0.0020) [2024-03-29 13:01:53,839][00126] Fps is (10 sec: 47513.6, 60 sec: 41779.2, 300 sec: 35925.7). Total num frames: 127959040. Throughput: 0: 41632.5. Samples: 10126940. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 13:01:53,840][00126] Avg episode reward: [(0, '0.281')] [2024-03-29 13:01:55,905][00497] Updated weights for policy 0, policy_version 7817 (0.0023) [2024-03-29 13:01:58,839][00126] Fps is (10 sec: 36044.9, 60 sec: 41506.1, 300 sec: 35869.3). Total num frames: 128122880. Throughput: 0: 41820.0. Samples: 10405460. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 13:01:58,841][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 13:02:01,278][00497] Updated weights for policy 0, policy_version 7827 (0.0023) [2024-03-29 13:02:02,082][00476] Signal inference workers to stop experience collection... (350 times) [2024-03-29 13:02:02,122][00497] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-03-29 13:02:02,307][00476] Signal inference workers to resume experience collection... (350 times) [2024-03-29 13:02:02,307][00497] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-03-29 13:02:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 36159.8). Total num frames: 128385024. Throughput: 0: 42613.9. Samples: 10543460. Policy #0 lag: (min: 0.0, avg: 17.8, max: 42.0) [2024-03-29 13:02:03,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 13:02:04,004][00497] Updated weights for policy 0, policy_version 7837 (0.0034) [2024-03-29 13:02:07,808][00497] Updated weights for policy 0, policy_version 7847 (0.0029) [2024-03-29 13:02:08,839][00126] Fps is (10 sec: 47513.2, 60 sec: 42052.2, 300 sec: 36270.8). Total num frames: 128598016. Throughput: 0: 42214.3. Samples: 10771900. Policy #0 lag: (min: 3.0, avg: 23.0, max: 43.0) [2024-03-29 13:02:08,840][00126] Avg episode reward: [(0, '0.268')] [2024-03-29 13:02:11,731][00497] Updated weights for policy 0, policy_version 7857 (0.0020) [2024-03-29 13:02:13,839][00126] Fps is (10 sec: 37683.2, 60 sec: 42052.3, 300 sec: 36211.4). Total num frames: 128761856. Throughput: 0: 41463.2. Samples: 11012580. Policy #0 lag: (min: 3.0, avg: 23.0, max: 43.0) [2024-03-29 13:02:13,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 13:02:17,056][00497] Updated weights for policy 0, policy_version 7867 (0.0026) [2024-03-29 13:02:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 36988.9). Total num frames: 128991232. Throughput: 0: 42090.2. Samples: 11160840. 
Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 13:02:18,840][00126] Avg episode reward: [(0, '0.247')] [2024-03-29 13:02:19,821][00497] Updated weights for policy 0, policy_version 7877 (0.0019) [2024-03-29 13:02:23,621][00497] Updated weights for policy 0, policy_version 7887 (0.0016) [2024-03-29 13:02:23,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42052.4, 300 sec: 37766.5). Total num frames: 129220608. Throughput: 0: 42057.3. Samples: 11389100. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:02:23,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:02:27,660][00497] Updated weights for policy 0, policy_version 7897 (0.0018) [2024-03-29 13:02:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 38377.4). Total num frames: 129400832. Throughput: 0: 41166.1. Samples: 11623840. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:02:28,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 13:02:32,871][00497] Updated weights for policy 0, policy_version 7907 (0.0031) [2024-03-29 13:02:33,839][00126] Fps is (10 sec: 37683.2, 60 sec: 40960.1, 300 sec: 38099.7). Total num frames: 129597440. Throughput: 0: 41730.3. Samples: 11781140. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 13:02:33,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 13:02:34,455][00476] Signal inference workers to stop experience collection... (400 times) [2024-03-29 13:02:34,492][00497] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-03-29 13:02:34,669][00476] Signal inference workers to resume experience collection... (400 times) [2024-03-29 13:02:34,669][00497] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-03-29 13:02:35,733][00497] Updated weights for policy 0, policy_version 7917 (0.0019) [2024-03-29 13:02:38,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42052.2, 300 sec: 38710.7). Total num frames: 129843200. Throughput: 0: 41751.9. Samples: 12005780. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 13:02:38,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 13:02:39,274][00497] Updated weights for policy 0, policy_version 7927 (0.0033) [2024-03-29 13:02:43,091][00497] Updated weights for policy 0, policy_version 7937 (0.0024) [2024-03-29 13:02:43,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42871.4, 300 sec: 39266.1). Total num frames: 130056192. Throughput: 0: 41156.9. Samples: 12257520. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 13:02:43,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 13:02:48,447][00497] Updated weights for policy 0, policy_version 7947 (0.0019) [2024-03-29 13:02:48,839][00126] Fps is (10 sec: 37683.9, 60 sec: 40960.1, 300 sec: 39710.4). Total num frames: 130220032. Throughput: 0: 41565.9. Samples: 12413920. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 13:02:48,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 13:02:51,268][00497] Updated weights for policy 0, policy_version 7957 (0.0021) [2024-03-29 13:02:53,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 40321.3). Total num frames: 130465792. Throughput: 0: 41487.7. Samples: 12638840. 
Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 13:02:53,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:02:54,707][00497] Updated weights for policy 0, policy_version 7967 (0.0026) [2024-03-29 13:02:58,586][00497] Updated weights for policy 0, policy_version 7977 (0.0030) [2024-03-29 13:02:58,839][00126] Fps is (10 sec: 47513.2, 60 sec: 42871.5, 300 sec: 40654.5). Total num frames: 130695168. Throughput: 0: 41953.8. Samples: 12900500. Policy #0 lag: (min: 1.0, avg: 22.6, max: 43.0) [2024-03-29 13:02:58,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:03:03,609][00497] Updated weights for policy 0, policy_version 7987 (0.0025) [2024-03-29 13:03:03,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.1, 300 sec: 40765.6). Total num frames: 130859008. Throughput: 0: 41968.1. Samples: 13049400. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 13:03:03,840][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 13:03:04,128][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007989_130891776.pth... [2024-03-29 13:03:04,440][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007377_120864768.pth [2024-03-29 13:03:06,035][00476] Signal inference workers to stop experience collection... (450 times) [2024-03-29 13:03:06,035][00476] Signal inference workers to resume experience collection... (450 times) [2024-03-29 13:03:06,057][00497] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-03-29 13:03:06,058][00497] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-03-29 13:03:06,588][00497] Updated weights for policy 0, policy_version 7997 (0.0019) [2024-03-29 13:03:08,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 41209.9). Total num frames: 131121152. Throughput: 0: 41961.7. Samples: 13277380. Policy #0 lag: (min: 1.0, avg: 20.8, max: 43.0) [2024-03-29 13:03:08,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 13:03:10,120][00497] Updated weights for policy 0, policy_version 8007 (0.0026) [2024-03-29 13:03:13,839][00126] Fps is (10 sec: 47513.7, 60 sec: 42871.5, 300 sec: 41487.6). Total num frames: 131334144. Throughput: 0: 42676.6. Samples: 13544280. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 13:03:13,840][00126] Avg episode reward: [(0, '0.277')] [2024-03-29 13:03:13,953][00497] Updated weights for policy 0, policy_version 8017 (0.0026) [2024-03-29 13:03:18,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41779.3, 300 sec: 41543.2). Total num frames: 131497984. Throughput: 0: 42215.2. Samples: 13680820. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 13:03:18,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 13:03:19,158][00497] Updated weights for policy 0, policy_version 8027 (0.0025) [2024-03-29 13:03:21,892][00497] Updated weights for policy 0, policy_version 8037 (0.0025) [2024-03-29 13:03:23,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 41654.3). Total num frames: 131743744. Throughput: 0: 42412.5. Samples: 13914340. Policy #0 lag: (min: 1.0, avg: 18.8, max: 42.0) [2024-03-29 13:03:23,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 13:03:25,540][00497] Updated weights for policy 0, policy_version 8047 (0.0031) [2024-03-29 13:03:28,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.5, 300 sec: 41598.7). Total num frames: 131956736. Throughput: 0: 42648.1. Samples: 14176680. 
Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:03:28,840][00126] Avg episode reward: [(0, '0.269')] [2024-03-29 13:03:29,536][00497] Updated weights for policy 0, policy_version 8057 (0.0034) [2024-03-29 13:03:33,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42325.3, 300 sec: 41654.2). Total num frames: 132136960. Throughput: 0: 42304.8. Samples: 14317640. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:03:33,841][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 13:03:34,651][00497] Updated weights for policy 0, policy_version 8067 (0.0019) [2024-03-29 13:03:37,484][00497] Updated weights for policy 0, policy_version 8077 (0.0028) [2024-03-29 13:03:37,648][00476] Signal inference workers to stop experience collection... (500 times) [2024-03-29 13:03:37,683][00497] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-03-29 13:03:37,828][00476] Signal inference workers to resume experience collection... (500 times) [2024-03-29 13:03:37,829][00497] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-03-29 13:03:38,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 132382720. Throughput: 0: 42498.6. Samples: 14551280. Policy #0 lag: (min: 0.0, avg: 17.0, max: 41.0) [2024-03-29 13:03:38,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 13:03:41,223][00497] Updated weights for policy 0, policy_version 8087 (0.0020) [2024-03-29 13:03:43,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 132595712. Throughput: 0: 42286.1. Samples: 14803380. Policy #0 lag: (min: 1.0, avg: 22.9, max: 42.0) [2024-03-29 13:03:43,840][00126] Avg episode reward: [(0, '0.305')] [2024-03-29 13:03:45,044][00497] Updated weights for policy 0, policy_version 8097 (0.0017) [2024-03-29 13:03:48,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42598.2, 300 sec: 41765.3). Total num frames: 132775936. Throughput: 0: 42030.6. Samples: 14940780. Policy #0 lag: (min: 1.0, avg: 18.8, max: 42.0) [2024-03-29 13:03:48,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:03:50,145][00497] Updated weights for policy 0, policy_version 8107 (0.0026) [2024-03-29 13:03:52,980][00497] Updated weights for policy 0, policy_version 8117 (0.0029) [2024-03-29 13:03:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.3, 300 sec: 41820.8). Total num frames: 133021696. Throughput: 0: 42661.3. Samples: 15197140. Policy #0 lag: (min: 1.0, avg: 18.8, max: 42.0) [2024-03-29 13:03:53,841][00126] Avg episode reward: [(0, '0.273')] [2024-03-29 13:03:56,476][00497] Updated weights for policy 0, policy_version 8127 (0.0018) [2024-03-29 13:03:58,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 133234688. Throughput: 0: 42264.4. Samples: 15446180. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 13:03:58,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 13:04:00,330][00497] Updated weights for policy 0, policy_version 8137 (0.0024) [2024-03-29 13:04:03,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 133414912. Throughput: 0: 42105.3. Samples: 15575560. 
Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:04:03,841][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 13:04:05,414][00497] Updated weights for policy 0, policy_version 8147 (0.0022) [2024-03-29 13:04:08,203][00497] Updated weights for policy 0, policy_version 8157 (0.0019) [2024-03-29 13:04:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 133660672. Throughput: 0: 42858.2. Samples: 15842960. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:04:08,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:04:11,992][00497] Updated weights for policy 0, policy_version 8167 (0.0020) [2024-03-29 13:04:13,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 133873664. Throughput: 0: 42369.3. Samples: 16083300. Policy #0 lag: (min: 1.0, avg: 21.8, max: 41.0) [2024-03-29 13:04:13,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 13:04:15,848][00497] Updated weights for policy 0, policy_version 8177 (0.0029) [2024-03-29 13:04:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 134053888. Throughput: 0: 42138.3. Samples: 16213860. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 13:04:18,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:04:19,177][00476] Signal inference workers to stop experience collection... (550 times) [2024-03-29 13:04:19,253][00497] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-03-29 13:04:19,263][00476] Signal inference workers to resume experience collection... (550 times) [2024-03-29 13:04:19,285][00497] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-03-29 13:04:20,984][00497] Updated weights for policy 0, policy_version 8187 (0.0023) [2024-03-29 13:04:23,754][00497] Updated weights for policy 0, policy_version 8197 (0.0021) [2024-03-29 13:04:23,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 42265.1). Total num frames: 134299648. Throughput: 0: 42874.1. Samples: 16480620. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 13:04:23,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 13:04:27,561][00497] Updated weights for policy 0, policy_version 8207 (0.0025) [2024-03-29 13:04:28,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 134512640. Throughput: 0: 42443.2. Samples: 16713320. Policy #0 lag: (min: 0.0, avg: 22.5, max: 41.0) [2024-03-29 13:04:28,840][00126] Avg episode reward: [(0, '0.277')] [2024-03-29 13:04:31,448][00497] Updated weights for policy 0, policy_version 8217 (0.0023) [2024-03-29 13:04:33,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 134692864. Throughput: 0: 42362.7. Samples: 16847100. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 13:04:33,842][00126] Avg episode reward: [(0, '0.230')] [2024-03-29 13:04:36,513][00497] Updated weights for policy 0, policy_version 8227 (0.0017) [2024-03-29 13:04:38,839][00126] Fps is (10 sec: 40959.4, 60 sec: 42325.2, 300 sec: 42043.0). Total num frames: 134922240. Throughput: 0: 42703.1. Samples: 17118780. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 13:04:38,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 13:04:39,030][00476] Saving new best policy, reward=0.455! 
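The "Saving new best policy, reward=0.455!" entry comes immediately after an "Avg episode reward: [(0, '0.455')]" report, which suggests a separate best-policy checkpoint is written whenever the averaged episode reward exceeds the best value seen so far. A minimal sketch of that bookkeeping, with illustrative names only:

    # Illustrative best-policy tracking, mirroring the "Saving new best policy" entries.
    best_reward = float("-inf")

    def maybe_save_best(avg_episode_reward: float, save_fn) -> bool:
        global best_reward
        if avg_episode_reward > best_reward:
            best_reward = avg_episode_reward
            save_fn()  # e.g. write a best-policy checkpoint alongside the periodic ones
            return True
        return False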
[2024-03-29 13:04:39,604][00497] Updated weights for policy 0, policy_version 8237 (0.0020) [2024-03-29 13:04:43,374][00497] Updated weights for policy 0, policy_version 8247 (0.0029) [2024-03-29 13:04:43,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 135135232. Throughput: 0: 42155.4. Samples: 17343180. Policy #0 lag: (min: 1.0, avg: 22.5, max: 44.0) [2024-03-29 13:04:43,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 13:04:47,049][00497] Updated weights for policy 0, policy_version 8257 (0.0028) [2024-03-29 13:04:48,841][00126] Fps is (10 sec: 39314.0, 60 sec: 42324.0, 300 sec: 42098.3). Total num frames: 135315456. Throughput: 0: 42242.1. Samples: 17476540. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 13:04:48,842][00126] Avg episode reward: [(0, '0.305')] [2024-03-29 13:04:51,638][00476] Signal inference workers to stop experience collection... (600 times) [2024-03-29 13:04:51,751][00497] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-03-29 13:04:51,837][00476] Signal inference workers to resume experience collection... (600 times) [2024-03-29 13:04:51,837][00497] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-03-29 13:04:52,141][00497] Updated weights for policy 0, policy_version 8267 (0.0022) [2024-03-29 13:04:53,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 135544832. Throughput: 0: 42478.6. Samples: 17754500. Policy #0 lag: (min: 0.0, avg: 22.1, max: 43.0) [2024-03-29 13:04:53,840][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 13:04:54,884][00497] Updated weights for policy 0, policy_version 8277 (0.0019) [2024-03-29 13:04:58,550][00497] Updated weights for policy 0, policy_version 8287 (0.0022) [2024-03-29 13:04:58,839][00126] Fps is (10 sec: 45884.5, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 135774208. Throughput: 0: 42228.4. Samples: 17983580. Policy #0 lag: (min: 2.0, avg: 21.2, max: 43.0) [2024-03-29 13:04:58,840][00126] Avg episode reward: [(0, '0.260')] [2024-03-29 13:05:02,360][00497] Updated weights for policy 0, policy_version 8297 (0.0018) [2024-03-29 13:05:03,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 135970816. Throughput: 0: 42303.8. Samples: 18117540. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:05:03,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:05:03,858][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000008299_135970816.pth... [2024-03-29 13:05:04,192][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007682_125861888.pth [2024-03-29 13:05:07,383][00497] Updated weights for policy 0, policy_version 8307 (0.0021) [2024-03-29 13:05:08,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 136183808. Throughput: 0: 42717.9. Samples: 18402920. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:05:08,840][00126] Avg episode reward: [(0, '0.265')] [2024-03-29 13:05:10,101][00497] Updated weights for policy 0, policy_version 8317 (0.0023) [2024-03-29 13:05:13,832][00497] Updated weights for policy 0, policy_version 8327 (0.0019) [2024-03-29 13:05:13,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 136429568. Throughput: 0: 42502.6. Samples: 18625940. 
Policy #0 lag: (min: 2.0, avg: 19.8, max: 41.0) [2024-03-29 13:05:13,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:05:17,710][00497] Updated weights for policy 0, policy_version 8337 (0.0020) [2024-03-29 13:05:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42598.5, 300 sec: 42154.1). Total num frames: 136609792. Throughput: 0: 42370.0. Samples: 18753740. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 13:05:18,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 13:05:22,907][00497] Updated weights for policy 0, policy_version 8347 (0.0022) [2024-03-29 13:05:23,179][00476] Signal inference workers to stop experience collection... (650 times) [2024-03-29 13:05:23,218][00497] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-03-29 13:05:23,409][00476] Signal inference workers to resume experience collection... (650 times) [2024-03-29 13:05:23,409][00497] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-03-29 13:05:23,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 136806400. Throughput: 0: 42568.0. Samples: 19034340. Policy #0 lag: (min: 2.0, avg: 18.1, max: 42.0) [2024-03-29 13:05:23,840][00126] Avg episode reward: [(0, '0.329')] [2024-03-29 13:05:25,686][00497] Updated weights for policy 0, policy_version 8357 (0.0027) [2024-03-29 13:05:28,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 137035776. Throughput: 0: 42390.3. Samples: 19250740. Policy #0 lag: (min: 2.0, avg: 18.1, max: 42.0) [2024-03-29 13:05:28,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 13:05:29,847][00497] Updated weights for policy 0, policy_version 8368 (0.0027) [2024-03-29 13:05:33,690][00497] Updated weights for policy 0, policy_version 8378 (0.0023) [2024-03-29 13:05:33,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42871.5, 300 sec: 42209.6). Total num frames: 137265152. Throughput: 0: 42376.1. Samples: 19383380. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 13:05:33,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 13:05:38,792][00497] Updated weights for policy 0, policy_version 8388 (0.0026) [2024-03-29 13:05:38,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 137428992. Throughput: 0: 42312.8. Samples: 19658580. Policy #0 lag: (min: 0.0, avg: 16.7, max: 43.0) [2024-03-29 13:05:38,840][00126] Avg episode reward: [(0, '0.307')] [2024-03-29 13:05:41,709][00497] Updated weights for policy 0, policy_version 8398 (0.0029) [2024-03-29 13:05:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 137674752. Throughput: 0: 42190.3. Samples: 19882140. Policy #0 lag: (min: 0.0, avg: 16.7, max: 43.0) [2024-03-29 13:05:43,840][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 13:05:45,643][00497] Updated weights for policy 0, policy_version 8408 (0.0019) [2024-03-29 13:05:48,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42872.9, 300 sec: 42154.1). Total num frames: 137887744. Throughput: 0: 42037.4. Samples: 20009220. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 13:05:48,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:05:49,349][00497] Updated weights for policy 0, policy_version 8418 (0.0026) [2024-03-29 13:05:53,839][00126] Fps is (10 sec: 36044.8, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 138035200. Throughput: 0: 41754.6. Samples: 20281880. 
Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 13:05:53,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 13:05:54,503][00497] Updated weights for policy 0, policy_version 8428 (0.0031) [2024-03-29 13:05:55,034][00476] Signal inference workers to stop experience collection... (700 times) [2024-03-29 13:05:55,054][00497] InferenceWorker_p0-w0: stopping experience collection (700 times) [2024-03-29 13:05:55,244][00476] Signal inference workers to resume experience collection... (700 times) [2024-03-29 13:05:55,244][00497] InferenceWorker_p0-w0: resuming experience collection (700 times) [2024-03-29 13:05:57,337][00497] Updated weights for policy 0, policy_version 8438 (0.0029) [2024-03-29 13:05:58,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 138313728. Throughput: 0: 41983.6. Samples: 20515200. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 13:05:58,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:06:00,958][00497] Updated weights for policy 0, policy_version 8448 (0.0020) [2024-03-29 13:06:03,839][00126] Fps is (10 sec: 49151.4, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 138526720. Throughput: 0: 42011.4. Samples: 20644260. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 13:06:03,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 13:06:04,764][00497] Updated weights for policy 0, policy_version 8458 (0.0020) [2024-03-29 13:06:08,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 138690560. Throughput: 0: 42006.8. Samples: 20924640. Policy #0 lag: (min: 0.0, avg: 18.2, max: 40.0) [2024-03-29 13:06:08,841][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 13:06:09,654][00497] Updated weights for policy 0, policy_version 8468 (0.0031) [2024-03-29 13:06:12,581][00497] Updated weights for policy 0, policy_version 8478 (0.0029) [2024-03-29 13:06:13,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 138952704. Throughput: 0: 42353.8. Samples: 21156660. Policy #0 lag: (min: 0.0, avg: 18.2, max: 40.0) [2024-03-29 13:06:13,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 13:06:16,126][00497] Updated weights for policy 0, policy_version 8488 (0.0019) [2024-03-29 13:06:18,839][00126] Fps is (10 sec: 47513.0, 60 sec: 42598.3, 300 sec: 42265.2). Total num frames: 139165696. Throughput: 0: 42232.8. Samples: 21283860. Policy #0 lag: (min: 1.0, avg: 22.4, max: 41.0) [2024-03-29 13:06:18,840][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 13:06:20,230][00497] Updated weights for policy 0, policy_version 8498 (0.0029) [2024-03-29 13:06:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 139345920. Throughput: 0: 42227.2. Samples: 21558800. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 13:06:23,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 13:06:25,190][00497] Updated weights for policy 0, policy_version 8508 (0.0024) [2024-03-29 13:06:27,113][00476] Signal inference workers to stop experience collection... (750 times) [2024-03-29 13:06:27,113][00476] Signal inference workers to resume experience collection... 
(750 times) [2024-03-29 13:06:27,155][00497] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-03-29 13:06:27,155][00497] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-03-29 13:06:28,082][00497] Updated weights for policy 0, policy_version 8518 (0.0025) [2024-03-29 13:06:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 139591680. Throughput: 0: 42586.6. Samples: 21798540. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 13:06:28,841][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:06:31,897][00497] Updated weights for policy 0, policy_version 8528 (0.0024) [2024-03-29 13:06:33,839][00126] Fps is (10 sec: 45874.4, 60 sec: 42325.2, 300 sec: 42320.7). Total num frames: 139804672. Throughput: 0: 42564.4. Samples: 21924620. Policy #0 lag: (min: 1.0, avg: 23.1, max: 42.0) [2024-03-29 13:06:33,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:06:35,633][00497] Updated weights for policy 0, policy_version 8538 (0.0023) [2024-03-29 13:06:38,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42598.5, 300 sec: 42376.2). Total num frames: 139984896. Throughput: 0: 42530.2. Samples: 22195740. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 13:06:38,841][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 13:06:40,671][00497] Updated weights for policy 0, policy_version 8548 (0.0027) [2024-03-29 13:06:43,514][00497] Updated weights for policy 0, policy_version 8558 (0.0027) [2024-03-29 13:06:43,839][00126] Fps is (10 sec: 40960.8, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 140214272. Throughput: 0: 42685.3. Samples: 22436040. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 13:06:43,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:06:47,848][00497] Updated weights for policy 0, policy_version 8568 (0.0021) [2024-03-29 13:06:48,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.5, 300 sec: 42265.2). Total num frames: 140427264. Throughput: 0: 42506.0. Samples: 22557020. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:06:48,840][00126] Avg episode reward: [(0, '0.315')] [2024-03-29 13:06:51,370][00497] Updated weights for policy 0, policy_version 8578 (0.0023) [2024-03-29 13:06:53,839][00126] Fps is (10 sec: 39321.0, 60 sec: 42871.4, 300 sec: 42320.7). Total num frames: 140607488. Throughput: 0: 41876.7. Samples: 22809100. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 13:06:53,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 13:06:56,318][00497] Updated weights for policy 0, policy_version 8588 (0.0022) [2024-03-29 13:06:58,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 140820480. Throughput: 0: 42350.6. Samples: 23062440. Policy #0 lag: (min: 0.0, avg: 19.6, max: 42.0) [2024-03-29 13:06:58,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 13:06:58,851][00476] Signal inference workers to stop experience collection... (800 times) [2024-03-29 13:06:58,906][00497] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-03-29 13:06:58,941][00476] Signal inference workers to resume experience collection... 
(800 times) [2024-03-29 13:06:58,944][00497] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-03-29 13:06:59,484][00497] Updated weights for policy 0, policy_version 8598 (0.0035) [2024-03-29 13:07:03,705][00497] Updated weights for policy 0, policy_version 8608 (0.0027) [2024-03-29 13:07:03,840][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 141033472. Throughput: 0: 42006.6. Samples: 23174160. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 13:07:03,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 13:07:04,214][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000008610_141066240.pth... [2024-03-29 13:07:04,537][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000007989_130891776.pth [2024-03-29 13:07:07,529][00497] Updated weights for policy 0, policy_version 8618 (0.0024) [2024-03-29 13:07:08,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 141230080. Throughput: 0: 41305.4. Samples: 23417540. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 13:07:08,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 13:07:12,268][00497] Updated weights for policy 0, policy_version 8628 (0.0025) [2024-03-29 13:07:13,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 141443072. Throughput: 0: 41765.3. Samples: 23677980. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 13:07:13,842][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 13:07:15,422][00497] Updated weights for policy 0, policy_version 8638 (0.0028) [2024-03-29 13:07:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.3, 300 sec: 42209.6). Total num frames: 141672448. Throughput: 0: 41623.7. Samples: 23797680. Policy #0 lag: (min: 1.0, avg: 20.4, max: 42.0) [2024-03-29 13:07:18,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 13:07:19,209][00497] Updated weights for policy 0, policy_version 8648 (0.0026) [2024-03-29 13:07:23,036][00497] Updated weights for policy 0, policy_version 8658 (0.0024) [2024-03-29 13:07:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 141852672. Throughput: 0: 41345.8. Samples: 24056300. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 13:07:23,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 13:07:27,630][00497] Updated weights for policy 0, policy_version 8668 (0.0018) [2024-03-29 13:07:28,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.1, 300 sec: 42320.7). Total num frames: 142082048. Throughput: 0: 41903.5. Samples: 24321700. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 13:07:28,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:07:29,994][00476] Signal inference workers to stop experience collection... (850 times) [2024-03-29 13:07:30,014][00497] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-03-29 13:07:30,207][00476] Signal inference workers to resume experience collection... (850 times) [2024-03-29 13:07:30,208][00497] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-03-29 13:07:30,811][00497] Updated weights for policy 0, policy_version 8678 (0.0029) [2024-03-29 13:07:33,839][00126] Fps is (10 sec: 45875.4, 60 sec: 41779.3, 300 sec: 42265.2). Total num frames: 142311424. Throughput: 0: 41688.8. Samples: 24433020. 
Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 13:07:33,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:07:34,894][00497] Updated weights for policy 0, policy_version 8688 (0.0022) [2024-03-29 13:07:38,582][00497] Updated weights for policy 0, policy_version 8698 (0.0022) [2024-03-29 13:07:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 142508032. Throughput: 0: 41617.8. Samples: 24681900. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 13:07:38,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 13:07:43,219][00497] Updated weights for policy 0, policy_version 8708 (0.0026) [2024-03-29 13:07:43,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 42320.7). Total num frames: 142704640. Throughput: 0: 42261.9. Samples: 24964220. Policy #0 lag: (min: 0.0, avg: 22.0, max: 42.0) [2024-03-29 13:07:43,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:07:46,435][00497] Updated weights for policy 0, policy_version 8718 (0.0021) [2024-03-29 13:07:48,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 142950400. Throughput: 0: 42008.6. Samples: 25064540. Policy #0 lag: (min: 0.0, avg: 20.6, max: 45.0) [2024-03-29 13:07:48,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 13:07:50,594][00497] Updated weights for policy 0, policy_version 8728 (0.0023) [2024-03-29 13:07:53,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 143147008. Throughput: 0: 42073.8. Samples: 25310860. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 13:07:53,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:07:54,118][00497] Updated weights for policy 0, policy_version 8738 (0.0025) [2024-03-29 13:07:58,722][00497] Updated weights for policy 0, policy_version 8748 (0.0028) [2024-03-29 13:07:58,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41779.3, 300 sec: 42265.2). Total num frames: 143327232. Throughput: 0: 42600.6. Samples: 25595000. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 13:07:58,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 13:08:01,968][00497] Updated weights for policy 0, policy_version 8758 (0.0021) [2024-03-29 13:08:02,593][00476] Signal inference workers to stop experience collection... (900 times) [2024-03-29 13:08:02,671][00476] Signal inference workers to resume experience collection... (900 times) [2024-03-29 13:08:02,668][00497] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-03-29 13:08:02,705][00497] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-03-29 13:08:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.5, 300 sec: 42209.6). Total num frames: 143572992. Throughput: 0: 42335.5. Samples: 25702780. Policy #0 lag: (min: 0.0, avg: 21.1, max: 43.0) [2024-03-29 13:08:03,840][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 13:08:06,541][00497] Updated weights for policy 0, policy_version 8769 (0.0022) [2024-03-29 13:08:08,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 143769600. Throughput: 0: 42060.5. Samples: 25949020. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 13:08:08,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 13:08:10,263][00497] Updated weights for policy 0, policy_version 8779 (0.0028) [2024-03-29 13:08:13,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41779.3, 300 sec: 42209.6). Total num frames: 143949824. 
Throughput: 0: 42362.8. Samples: 26228020. Policy #0 lag: (min: 1.0, avg: 22.2, max: 41.0) [2024-03-29 13:08:13,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 13:08:14,488][00497] Updated weights for policy 0, policy_version 8789 (0.0028) [2024-03-29 13:08:17,979][00497] Updated weights for policy 0, policy_version 8799 (0.0021) [2024-03-29 13:08:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 144195584. Throughput: 0: 42292.5. Samples: 26336180. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:08:18,840][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 13:08:22,035][00497] Updated weights for policy 0, policy_version 8809 (0.0022) [2024-03-29 13:08:23,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 144408576. Throughput: 0: 42345.5. Samples: 26587440. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 13:08:23,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:08:25,729][00497] Updated weights for policy 0, policy_version 8819 (0.0025) [2024-03-29 13:08:28,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.3, 300 sec: 42209.6). Total num frames: 144588800. Throughput: 0: 42217.0. Samples: 26863980. Policy #0 lag: (min: 1.0, avg: 21.3, max: 42.0) [2024-03-29 13:08:28,840][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 13:08:29,948][00497] Updated weights for policy 0, policy_version 8829 (0.0031) [2024-03-29 13:08:33,533][00497] Updated weights for policy 0, policy_version 8839 (0.0020) [2024-03-29 13:08:33,544][00476] Signal inference workers to stop experience collection... (950 times) [2024-03-29 13:08:33,545][00476] Signal inference workers to resume experience collection... (950 times) [2024-03-29 13:08:33,583][00497] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-03-29 13:08:33,583][00497] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-03-29 13:08:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 144834560. Throughput: 0: 42348.9. Samples: 26970240. Policy #0 lag: (min: 2.0, avg: 20.9, max: 44.0) [2024-03-29 13:08:33,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 13:08:37,307][00497] Updated weights for policy 0, policy_version 8849 (0.0033) [2024-03-29 13:08:38,839][00126] Fps is (10 sec: 45874.3, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 145047552. Throughput: 0: 42477.7. Samples: 27222360. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 13:08:38,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:08:41,473][00497] Updated weights for policy 0, policy_version 8859 (0.0020) [2024-03-29 13:08:43,839][00126] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 42098.6). Total num frames: 145195008. Throughput: 0: 41838.1. Samples: 27477720. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 13:08:43,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 13:08:45,617][00497] Updated weights for policy 0, policy_version 8869 (0.0032) [2024-03-29 13:08:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 145457152. Throughput: 0: 42049.7. Samples: 27595020. 
Policy #0 lag: (min: 2.0, avg: 20.0, max: 43.0) [2024-03-29 13:08:48,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:08:49,445][00497] Updated weights for policy 0, policy_version 8879 (0.0022) [2024-03-29 13:08:53,131][00497] Updated weights for policy 0, policy_version 8889 (0.0026) [2024-03-29 13:08:53,839][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 145653760. Throughput: 0: 42024.8. Samples: 27840140. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:08:53,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 13:08:56,982][00497] Updated weights for policy 0, policy_version 8899 (0.0018) [2024-03-29 13:08:58,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 145850368. Throughput: 0: 41672.4. Samples: 28103280. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:08:58,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:09:01,382][00497] Updated weights for policy 0, policy_version 8909 (0.0028) [2024-03-29 13:09:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 146079744. Throughput: 0: 42047.0. Samples: 28228300. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 13:09:03,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 13:09:03,987][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000008917_146096128.pth... [2024-03-29 13:09:04,305][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000008299_135970816.pth [2024-03-29 13:09:04,982][00497] Updated weights for policy 0, policy_version 8919 (0.0019) [2024-03-29 13:09:08,456][00476] Signal inference workers to stop experience collection... (1000 times) [2024-03-29 13:09:08,532][00497] InferenceWorker_p0-w0: stopping experience collection (1000 times) [2024-03-29 13:09:08,543][00476] Signal inference workers to resume experience collection... (1000 times) [2024-03-29 13:09:08,560][00497] InferenceWorker_p0-w0: resuming experience collection (1000 times) [2024-03-29 13:09:08,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 146276352. Throughput: 0: 41658.1. Samples: 28462060. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:09:08,840][00126] Avg episode reward: [(0, '0.298')] [2024-03-29 13:09:08,852][00497] Updated weights for policy 0, policy_version 8929 (0.0027) [2024-03-29 13:09:12,723][00497] Updated weights for policy 0, policy_version 8939 (0.0022) [2024-03-29 13:09:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 146489344. Throughput: 0: 41542.5. Samples: 28733400. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:09:13,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:09:16,824][00497] Updated weights for policy 0, policy_version 8949 (0.0019) [2024-03-29 13:09:18,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 146702336. Throughput: 0: 42164.9. Samples: 28867660. Policy #0 lag: (min: 2.0, avg: 18.9, max: 42.0) [2024-03-29 13:09:18,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:09:20,274][00497] Updated weights for policy 0, policy_version 8959 (0.0018) [2024-03-29 13:09:23,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.1, 300 sec: 42098.5). Total num frames: 146931712. Throughput: 0: 41804.0. Samples: 29103540. 
Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:09:23,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:09:24,103][00497] Updated weights for policy 0, policy_version 8969 (0.0029) [2024-03-29 13:09:28,227][00497] Updated weights for policy 0, policy_version 8979 (0.0018) [2024-03-29 13:09:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 147128320. Throughput: 0: 42104.5. Samples: 29372420. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:09:28,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 13:09:32,133][00497] Updated weights for policy 0, policy_version 8989 (0.0025) [2024-03-29 13:09:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 42098.6). Total num frames: 147341312. Throughput: 0: 42701.8. Samples: 29516600. Policy #0 lag: (min: 1.0, avg: 18.5, max: 42.0) [2024-03-29 13:09:33,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:09:35,614][00497] Updated weights for policy 0, policy_version 8999 (0.0021) [2024-03-29 13:09:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 147570688. Throughput: 0: 42322.7. Samples: 29744660. Policy #0 lag: (min: 1.0, avg: 18.5, max: 42.0) [2024-03-29 13:09:38,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 13:09:39,273][00497] Updated weights for policy 0, policy_version 9009 (0.0019) [2024-03-29 13:09:40,344][00476] Signal inference workers to stop experience collection... (1050 times) [2024-03-29 13:09:40,351][00476] Signal inference workers to resume experience collection... (1050 times) [2024-03-29 13:09:40,369][00497] InferenceWorker_p0-w0: stopping experience collection (1050 times) [2024-03-29 13:09:40,391][00497] InferenceWorker_p0-w0: resuming experience collection (1050 times) [2024-03-29 13:09:43,584][00497] Updated weights for policy 0, policy_version 9019 (0.0023) [2024-03-29 13:09:43,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42871.5, 300 sec: 42209.9). Total num frames: 147767296. Throughput: 0: 42308.0. Samples: 30007140. Policy #0 lag: (min: 1.0, avg: 21.6, max: 40.0) [2024-03-29 13:09:43,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 13:09:47,579][00497] Updated weights for policy 0, policy_version 9029 (0.0029) [2024-03-29 13:09:48,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 147996672. Throughput: 0: 42825.7. Samples: 30155460. Policy #0 lag: (min: 1.0, avg: 17.8, max: 42.0) [2024-03-29 13:09:48,841][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 13:09:51,208][00497] Updated weights for policy 0, policy_version 9039 (0.0025) [2024-03-29 13:09:53,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 148193280. Throughput: 0: 42528.5. Samples: 30375840. Policy #0 lag: (min: 1.0, avg: 17.8, max: 42.0) [2024-03-29 13:09:53,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:09:54,858][00497] Updated weights for policy 0, policy_version 9049 (0.0029) [2024-03-29 13:09:58,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 148406272. Throughput: 0: 42219.1. Samples: 30633260. 
Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 13:09:58,841][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 13:09:59,084][00497] Updated weights for policy 0, policy_version 9059 (0.0026) [2024-03-29 13:10:03,020][00497] Updated weights for policy 0, policy_version 9069 (0.0020) [2024-03-29 13:10:03,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 148635648. Throughput: 0: 42554.7. Samples: 30782620. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 13:10:03,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:10:06,463][00497] Updated weights for policy 0, policy_version 9079 (0.0019) [2024-03-29 13:10:08,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42871.5, 300 sec: 42098.5). Total num frames: 148848640. Throughput: 0: 42680.5. Samples: 31024160. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 13:10:08,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 13:10:10,603][00497] Updated weights for policy 0, policy_version 9089 (0.0023) [2024-03-29 13:10:12,465][00476] Signal inference workers to stop experience collection... (1100 times) [2024-03-29 13:10:12,506][00497] InferenceWorker_p0-w0: stopping experience collection (1100 times) [2024-03-29 13:10:12,687][00476] Signal inference workers to resume experience collection... (1100 times) [2024-03-29 13:10:12,688][00497] InferenceWorker_p0-w0: resuming experience collection (1100 times) [2024-03-29 13:10:13,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42871.5, 300 sec: 42209.6). Total num frames: 149061632. Throughput: 0: 42195.1. Samples: 31271200. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 13:10:13,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:10:14,751][00497] Updated weights for policy 0, policy_version 9099 (0.0030) [2024-03-29 13:10:18,420][00497] Updated weights for policy 0, policy_version 9109 (0.0024) [2024-03-29 13:10:18,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 149258240. Throughput: 0: 42190.2. Samples: 31415160. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 13:10:18,841][00126] Avg episode reward: [(0, '0.220')] [2024-03-29 13:10:22,094][00497] Updated weights for policy 0, policy_version 9119 (0.0020) [2024-03-29 13:10:23,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42871.5, 300 sec: 42265.2). Total num frames: 149504000. Throughput: 0: 42459.0. Samples: 31655320. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 13:10:23,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:10:26,326][00497] Updated weights for policy 0, policy_version 9129 (0.0025) [2024-03-29 13:10:28,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 149684224. Throughput: 0: 42170.2. Samples: 31904800. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 13:10:28,841][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 13:10:30,580][00497] Updated weights for policy 0, policy_version 9139 (0.0020) [2024-03-29 13:10:33,839][00126] Fps is (10 sec: 36044.7, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 149864448. Throughput: 0: 41783.1. Samples: 32035700. 
Policy #0 lag: (min: 0.0, avg: 17.5, max: 41.0) [2024-03-29 13:10:33,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:10:34,167][00497] Updated weights for policy 0, policy_version 9149 (0.0024) [2024-03-29 13:10:37,615][00497] Updated weights for policy 0, policy_version 9159 (0.0024) [2024-03-29 13:10:38,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 150110208. Throughput: 0: 42391.6. Samples: 32283460. Policy #0 lag: (min: 0.0, avg: 17.5, max: 41.0) [2024-03-29 13:10:38,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:10:42,057][00497] Updated weights for policy 0, policy_version 9169 (0.0023) [2024-03-29 13:10:42,741][00476] Signal inference workers to stop experience collection... (1150 times) [2024-03-29 13:10:42,775][00497] InferenceWorker_p0-w0: stopping experience collection (1150 times) [2024-03-29 13:10:42,929][00476] Signal inference workers to resume experience collection... (1150 times) [2024-03-29 13:10:42,930][00497] InferenceWorker_p0-w0: resuming experience collection (1150 times) [2024-03-29 13:10:43,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 150306816. Throughput: 0: 42151.0. Samples: 32530060. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:10:43,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 13:10:46,005][00497] Updated weights for policy 0, policy_version 9179 (0.0034) [2024-03-29 13:10:48,839][00126] Fps is (10 sec: 36044.6, 60 sec: 41233.1, 300 sec: 42154.1). Total num frames: 150470656. Throughput: 0: 41692.3. Samples: 32658780. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 13:10:48,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 13:10:50,066][00497] Updated weights for policy 0, policy_version 9189 (0.0019) [2024-03-29 13:10:53,338][00497] Updated weights for policy 0, policy_version 9199 (0.0022) [2024-03-29 13:10:53,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 150749184. Throughput: 0: 41898.7. Samples: 32909600. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 13:10:53,840][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 13:10:57,681][00497] Updated weights for policy 0, policy_version 9209 (0.0019) [2024-03-29 13:10:58,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 150913024. Throughput: 0: 41831.0. Samples: 33153600. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 13:10:58,840][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 13:11:01,383][00497] Updated weights for policy 0, policy_version 9219 (0.0021) [2024-03-29 13:11:03,839][00126] Fps is (10 sec: 36044.8, 60 sec: 41233.0, 300 sec: 42098.5). Total num frames: 151109632. Throughput: 0: 41387.7. Samples: 33277600. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 13:11:03,840][00126] Avg episode reward: [(0, '0.316')] [2024-03-29 13:11:04,158][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000009224_151126016.pth... [2024-03-29 13:11:04,469][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000008610_141066240.pth [2024-03-29 13:11:05,757][00497] Updated weights for policy 0, policy_version 9229 (0.0024) [2024-03-29 13:11:08,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 151339008. Throughput: 0: 41828.5. Samples: 33537600. 
Policy #0 lag: (min: 0.0, avg: 17.8, max: 42.0) [2024-03-29 13:11:08,840][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 13:11:09,197][00497] Updated weights for policy 0, policy_version 9239 (0.0033) [2024-03-29 13:11:13,321][00476] Signal inference workers to stop experience collection... (1200 times) [2024-03-29 13:11:13,353][00497] InferenceWorker_p0-w0: stopping experience collection (1200 times) [2024-03-29 13:11:13,501][00476] Signal inference workers to resume experience collection... (1200 times) [2024-03-29 13:11:13,502][00497] InferenceWorker_p0-w0: resuming experience collection (1200 times) [2024-03-29 13:11:13,505][00497] Updated weights for policy 0, policy_version 9249 (0.0028) [2024-03-29 13:11:13,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 151552000. Throughput: 0: 41534.6. Samples: 33773860. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 13:11:13,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:11:17,351][00497] Updated weights for policy 0, policy_version 9259 (0.0018) [2024-03-29 13:11:18,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.2, 300 sec: 41987.5). Total num frames: 151732224. Throughput: 0: 41225.0. Samples: 33890820. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 13:11:18,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:11:21,621][00497] Updated weights for policy 0, policy_version 9269 (0.0022) [2024-03-29 13:11:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 151977984. Throughput: 0: 41750.2. Samples: 34162220. Policy #0 lag: (min: 1.0, avg: 18.0, max: 41.0) [2024-03-29 13:11:23,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 13:11:24,783][00497] Updated weights for policy 0, policy_version 9279 (0.0019) [2024-03-29 13:11:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.1, 300 sec: 41932.0). Total num frames: 152174592. Throughput: 0: 41551.7. Samples: 34399880. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:11:28,840][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 13:11:29,145][00497] Updated weights for policy 0, policy_version 9289 (0.0027) [2024-03-29 13:11:32,957][00497] Updated weights for policy 0, policy_version 9299 (0.0019) [2024-03-29 13:11:33,839][00126] Fps is (10 sec: 37683.8, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 152354816. Throughput: 0: 41385.0. Samples: 34521100. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:11:33,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:11:37,265][00497] Updated weights for policy 0, policy_version 9309 (0.0025) [2024-03-29 13:11:38,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 152600576. Throughput: 0: 41954.6. Samples: 34797560. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 13:11:38,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 13:11:40,378][00497] Updated weights for policy 0, policy_version 9319 (0.0026) [2024-03-29 13:11:43,839][00126] Fps is (10 sec: 47513.4, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 152829952. Throughput: 0: 41730.3. Samples: 35031460. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:11:43,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:11:44,776][00497] Updated weights for policy 0, policy_version 9329 (0.0023) [2024-03-29 13:11:44,777][00476] Signal inference workers to stop experience collection... 
(1250 times) [2024-03-29 13:11:44,778][00476] Signal inference workers to resume experience collection... (1250 times) [2024-03-29 13:11:44,820][00497] InferenceWorker_p0-w0: stopping experience collection (1250 times) [2024-03-29 13:11:44,821][00497] InferenceWorker_p0-w0: resuming experience collection (1250 times) [2024-03-29 13:11:48,565][00497] Updated weights for policy 0, policy_version 9339 (0.0029) [2024-03-29 13:11:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 153010176. Throughput: 0: 41859.1. Samples: 35161260. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:11:48,840][00126] Avg episode reward: [(0, '0.255')] [2024-03-29 13:11:52,919][00497] Updated weights for policy 0, policy_version 9349 (0.0028) [2024-03-29 13:11:53,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40960.0, 300 sec: 41987.5). Total num frames: 153206784. Throughput: 0: 42081.8. Samples: 35431280. Policy #0 lag: (min: 1.0, avg: 18.3, max: 42.0) [2024-03-29 13:11:53,840][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 13:11:56,036][00497] Updated weights for policy 0, policy_version 9359 (0.0028) [2024-03-29 13:11:58,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 153452544. Throughput: 0: 41798.3. Samples: 35654780. Policy #0 lag: (min: 1.0, avg: 18.3, max: 42.0) [2024-03-29 13:11:58,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 13:12:00,409][00497] Updated weights for policy 0, policy_version 9369 (0.0019) [2024-03-29 13:12:03,839][00126] Fps is (10 sec: 44236.0, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 153649152. Throughput: 0: 42356.3. Samples: 35796860. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:12:03,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:12:03,959][00497] Updated weights for policy 0, policy_version 9379 (0.0017) [2024-03-29 13:12:08,362][00497] Updated weights for policy 0, policy_version 9389 (0.0024) [2024-03-29 13:12:08,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 153845760. Throughput: 0: 42140.0. Samples: 36058520. Policy #0 lag: (min: 0.0, avg: 19.1, max: 42.0) [2024-03-29 13:12:08,840][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 13:12:11,603][00497] Updated weights for policy 0, policy_version 9399 (0.0027) [2024-03-29 13:12:13,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.4, 300 sec: 42098.5). Total num frames: 154091520. Throughput: 0: 41839.5. Samples: 36282660. Policy #0 lag: (min: 0.0, avg: 19.1, max: 42.0) [2024-03-29 13:12:13,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:12:16,143][00497] Updated weights for policy 0, policy_version 9409 (0.0019) [2024-03-29 13:12:16,730][00476] Signal inference workers to stop experience collection... (1300 times) [2024-03-29 13:12:16,749][00497] InferenceWorker_p0-w0: stopping experience collection (1300 times) [2024-03-29 13:12:16,939][00476] Signal inference workers to resume experience collection... (1300 times) [2024-03-29 13:12:16,940][00497] InferenceWorker_p0-w0: resuming experience collection (1300 times) [2024-03-29 13:12:18,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 154288128. Throughput: 0: 42477.7. Samples: 36432600. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 13:12:18,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 13:12:19,662][00497] Updated weights for policy 0, policy_version 9419 (0.0023) [2024-03-29 13:12:23,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 154468352. Throughput: 0: 41990.7. Samples: 36687140. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 13:12:23,840][00126] Avg episode reward: [(0, '0.318')] [2024-03-29 13:12:24,048][00497] Updated weights for policy 0, policy_version 9429 (0.0018) [2024-03-29 13:12:27,022][00497] Updated weights for policy 0, policy_version 9439 (0.0023) [2024-03-29 13:12:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 154730496. Throughput: 0: 41896.0. Samples: 36916780. Policy #0 lag: (min: 1.0, avg: 19.4, max: 42.0) [2024-03-29 13:12:28,840][00126] Avg episode reward: [(0, '0.270')] [2024-03-29 13:12:31,544][00497] Updated weights for policy 0, policy_version 9449 (0.0017) [2024-03-29 13:12:33,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42871.3, 300 sec: 42098.5). Total num frames: 154927104. Throughput: 0: 42132.4. Samples: 37057220. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 13:12:33,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 13:12:35,491][00497] Updated weights for policy 0, policy_version 9459 (0.0024) [2024-03-29 13:12:38,839][00126] Fps is (10 sec: 37682.6, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 155107328. Throughput: 0: 42123.8. Samples: 37326860. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 13:12:38,842][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 13:12:39,643][00497] Updated weights for policy 0, policy_version 9469 (0.0023) [2024-03-29 13:12:42,557][00497] Updated weights for policy 0, policy_version 9479 (0.0022) [2024-03-29 13:12:43,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 155353088. Throughput: 0: 42089.3. Samples: 37548800. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 13:12:43,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 13:12:47,020][00497] Updated weights for policy 0, policy_version 9489 (0.0022) [2024-03-29 13:12:47,588][00476] Signal inference workers to stop experience collection... (1350 times) [2024-03-29 13:12:47,697][00497] InferenceWorker_p0-w0: stopping experience collection (1350 times) [2024-03-29 13:12:47,835][00476] Signal inference workers to resume experience collection... (1350 times) [2024-03-29 13:12:47,835][00497] InferenceWorker_p0-w0: resuming experience collection (1350 times) [2024-03-29 13:12:48,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 155549696. Throughput: 0: 41907.3. Samples: 37682680. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 13:12:48,840][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 13:12:51,223][00497] Updated weights for policy 0, policy_version 9499 (0.0023) [2024-03-29 13:12:53,839][00126] Fps is (10 sec: 37682.8, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 155729920. Throughput: 0: 41924.4. Samples: 37945120. 
Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 13:12:53,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:12:55,433][00497] Updated weights for policy 0, policy_version 9509 (0.0022) [2024-03-29 13:12:58,406][00497] Updated weights for policy 0, policy_version 9519 (0.0031) [2024-03-29 13:12:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 155975680. Throughput: 0: 42248.9. Samples: 38183860. Policy #0 lag: (min: 1.0, avg: 18.7, max: 41.0) [2024-03-29 13:12:58,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 13:13:02,774][00497] Updated weights for policy 0, policy_version 9529 (0.0022) [2024-03-29 13:13:03,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 156172288. Throughput: 0: 41731.5. Samples: 38310520. Policy #0 lag: (min: 1.0, avg: 20.5, max: 40.0) [2024-03-29 13:13:03,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:13:04,307][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000009534_156205056.pth... [2024-03-29 13:13:04,641][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000008917_146096128.pth [2024-03-29 13:13:06,826][00497] Updated weights for policy 0, policy_version 9539 (0.0027) [2024-03-29 13:13:08,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 156352512. Throughput: 0: 41741.8. Samples: 38565520. Policy #0 lag: (min: 1.0, avg: 20.5, max: 40.0) [2024-03-29 13:13:08,840][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 13:13:11,044][00497] Updated weights for policy 0, policy_version 9549 (0.0035) [2024-03-29 13:13:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 156598272. Throughput: 0: 42219.1. Samples: 38816640. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 13:13:13,841][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 13:13:14,076][00497] Updated weights for policy 0, policy_version 9559 (0.0029) [2024-03-29 13:13:18,577][00497] Updated weights for policy 0, policy_version 9569 (0.0024) [2024-03-29 13:13:18,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 156778496. Throughput: 0: 41599.1. Samples: 38929180. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 13:13:18,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:13:22,418][00497] Updated weights for policy 0, policy_version 9579 (0.0022) [2024-03-29 13:13:23,325][00476] Signal inference workers to stop experience collection... (1400 times) [2024-03-29 13:13:23,363][00497] InferenceWorker_p0-w0: stopping experience collection (1400 times) [2024-03-29 13:13:23,518][00476] Signal inference workers to resume experience collection... (1400 times) [2024-03-29 13:13:23,518][00497] InferenceWorker_p0-w0: resuming experience collection (1400 times) [2024-03-29 13:13:23,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 156975104. Throughput: 0: 41420.6. Samples: 39190780. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 13:13:23,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:13:26,624][00497] Updated weights for policy 0, policy_version 9589 (0.0026) [2024-03-29 13:13:28,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 157220864. Throughput: 0: 42320.0. Samples: 39453200. 
Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 13:13:28,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 13:13:29,811][00497] Updated weights for policy 0, policy_version 9599 (0.0030) [2024-03-29 13:13:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 157417472. Throughput: 0: 41643.5. Samples: 39556640. Policy #0 lag: (min: 0.0, avg: 19.3, max: 42.0) [2024-03-29 13:13:33,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 13:13:33,858][00476] Saving new best policy, reward=0.486! [2024-03-29 13:13:34,458][00497] Updated weights for policy 0, policy_version 9609 (0.0024) [2024-03-29 13:13:38,334][00497] Updated weights for policy 0, policy_version 9619 (0.0034) [2024-03-29 13:13:38,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 157597696. Throughput: 0: 41441.0. Samples: 39809960. Policy #0 lag: (min: 2.0, avg: 20.4, max: 41.0) [2024-03-29 13:13:38,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:13:42,478][00497] Updated weights for policy 0, policy_version 9629 (0.0031) [2024-03-29 13:13:43,839][00126] Fps is (10 sec: 39322.0, 60 sec: 40960.1, 300 sec: 41876.4). Total num frames: 157810688. Throughput: 0: 42030.7. Samples: 40075240. Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 13:13:43,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 13:13:45,726][00497] Updated weights for policy 0, policy_version 9639 (0.0027) [2024-03-29 13:13:48,839][00126] Fps is (10 sec: 45875.2, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 158056448. Throughput: 0: 41466.7. Samples: 40176520. Policy #0 lag: (min: 0.0, avg: 18.4, max: 41.0) [2024-03-29 13:13:48,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 13:13:50,040][00497] Updated weights for policy 0, policy_version 9649 (0.0028) [2024-03-29 13:13:53,839][00126] Fps is (10 sec: 42597.6, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 158236672. Throughput: 0: 41228.4. Samples: 40420800. Policy #0 lag: (min: 1.0, avg: 21.0, max: 43.0) [2024-03-29 13:13:53,840][00126] Avg episode reward: [(0, '0.269')] [2024-03-29 13:13:54,297][00497] Updated weights for policy 0, policy_version 9659 (0.0022) [2024-03-29 13:13:55,168][00476] Signal inference workers to stop experience collection... (1450 times) [2024-03-29 13:13:55,211][00497] InferenceWorker_p0-w0: stopping experience collection (1450 times) [2024-03-29 13:13:55,247][00476] Signal inference workers to resume experience collection... (1450 times) [2024-03-29 13:13:55,254][00497] InferenceWorker_p0-w0: resuming experience collection (1450 times) [2024-03-29 13:13:58,355][00497] Updated weights for policy 0, policy_version 9669 (0.0030) [2024-03-29 13:13:58,839][00126] Fps is (10 sec: 36044.9, 60 sec: 40686.9, 300 sec: 41820.9). Total num frames: 158416896. Throughput: 0: 41772.5. Samples: 40696400. Policy #0 lag: (min: 1.0, avg: 21.0, max: 43.0) [2024-03-29 13:13:58,840][00126] Avg episode reward: [(0, '0.319')] [2024-03-29 13:14:01,311][00497] Updated weights for policy 0, policy_version 9679 (0.0024) [2024-03-29 13:14:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 158679040. Throughput: 0: 41756.4. Samples: 40808220. 
Policy #0 lag: (min: 1.0, avg: 18.3, max: 41.0) [2024-03-29 13:14:03,840][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 13:14:05,450][00497] Updated weights for policy 0, policy_version 9689 (0.0023) [2024-03-29 13:14:08,839][00126] Fps is (10 sec: 47513.0, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 158892032. Throughput: 0: 41691.0. Samples: 41066880. Policy #0 lag: (min: 2.0, avg: 22.5, max: 43.0) [2024-03-29 13:14:08,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 13:14:09,672][00497] Updated weights for policy 0, policy_version 9699 (0.0024) [2024-03-29 13:14:13,660][00497] Updated weights for policy 0, policy_version 9709 (0.0034) [2024-03-29 13:14:13,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 159072256. Throughput: 0: 41689.8. Samples: 41329240. Policy #0 lag: (min: 2.0, avg: 22.5, max: 43.0) [2024-03-29 13:14:13,840][00126] Avg episode reward: [(0, '0.416')] [2024-03-29 13:14:16,836][00497] Updated weights for policy 0, policy_version 9719 (0.0022) [2024-03-29 13:14:18,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 159301632. Throughput: 0: 42028.0. Samples: 41447900. Policy #0 lag: (min: 1.0, avg: 19.8, max: 44.0) [2024-03-29 13:14:18,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 13:14:21,106][00497] Updated weights for policy 0, policy_version 9729 (0.0025) [2024-03-29 13:14:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 159514624. Throughput: 0: 42024.0. Samples: 41701040. Policy #0 lag: (min: 1.0, avg: 22.9, max: 43.0) [2024-03-29 13:14:23,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 13:14:25,219][00497] Updated weights for policy 0, policy_version 9739 (0.0031) [2024-03-29 13:14:28,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 159711232. Throughput: 0: 41850.1. Samples: 41958500. Policy #0 lag: (min: 1.0, avg: 22.9, max: 43.0) [2024-03-29 13:14:28,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:14:28,996][00497] Updated weights for policy 0, policy_version 9749 (0.0024) [2024-03-29 13:14:30,058][00476] Signal inference workers to stop experience collection... (1500 times) [2024-03-29 13:14:30,092][00497] InferenceWorker_p0-w0: stopping experience collection (1500 times) [2024-03-29 13:14:30,265][00476] Signal inference workers to resume experience collection... (1500 times) [2024-03-29 13:14:30,266][00497] InferenceWorker_p0-w0: resuming experience collection (1500 times) [2024-03-29 13:14:32,213][00497] Updated weights for policy 0, policy_version 9759 (0.0024) [2024-03-29 13:14:33,839][00126] Fps is (10 sec: 44236.0, 60 sec: 42325.2, 300 sec: 41987.4). Total num frames: 159956992. Throughput: 0: 42364.3. Samples: 42082920. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:14:33,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:14:36,670][00497] Updated weights for policy 0, policy_version 9769 (0.0017) [2024-03-29 13:14:38,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42871.4, 300 sec: 42043.0). Total num frames: 160169984. Throughput: 0: 42528.9. Samples: 42334600. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:14:38,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 13:14:41,020][00497] Updated weights for policy 0, policy_version 9779 (0.0020) [2024-03-29 13:14:43,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42325.2, 300 sec: 41876.4). Total num frames: 160350208. 
Throughput: 0: 41927.0. Samples: 42583120. Policy #0 lag: (min: 0.0, avg: 22.2, max: 44.0) [2024-03-29 13:14:43,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:14:44,689][00497] Updated weights for policy 0, policy_version 9789 (0.0021) [2024-03-29 13:14:47,976][00497] Updated weights for policy 0, policy_version 9799 (0.0026) [2024-03-29 13:14:48,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 160579584. Throughput: 0: 42462.4. Samples: 42719020. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 13:14:48,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 13:14:52,129][00497] Updated weights for policy 0, policy_version 9809 (0.0019) [2024-03-29 13:14:53,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 160792576. Throughput: 0: 42185.4. Samples: 42965220. Policy #0 lag: (min: 2.0, avg: 21.0, max: 41.0) [2024-03-29 13:14:53,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:14:56,707][00497] Updated weights for policy 0, policy_version 9819 (0.0019) [2024-03-29 13:14:58,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42598.3, 300 sec: 41820.8). Total num frames: 160972800. Throughput: 0: 42019.9. Samples: 43220140. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 13:14:58,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:15:00,288][00497] Updated weights for policy 0, policy_version 9829 (0.0022) [2024-03-29 13:15:02,365][00476] Signal inference workers to stop experience collection... (1550 times) [2024-03-29 13:15:02,408][00497] InferenceWorker_p0-w0: stopping experience collection (1550 times) [2024-03-29 13:15:02,579][00476] Signal inference workers to resume experience collection... (1550 times) [2024-03-29 13:15:02,580][00497] InferenceWorker_p0-w0: resuming experience collection (1550 times) [2024-03-29 13:15:03,717][00497] Updated weights for policy 0, policy_version 9839 (0.0022) [2024-03-29 13:15:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.4, 300 sec: 41876.4). Total num frames: 161202176. Throughput: 0: 42381.8. Samples: 43355080. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0) [2024-03-29 13:15:03,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 13:15:04,012][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000009840_161218560.pth... [2024-03-29 13:15:04,340][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000009224_151126016.pth [2024-03-29 13:15:08,058][00497] Updated weights for policy 0, policy_version 9849 (0.0023) [2024-03-29 13:15:08,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 161398784. Throughput: 0: 41712.4. Samples: 43578100. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0) [2024-03-29 13:15:08,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 13:15:12,370][00497] Updated weights for policy 0, policy_version 9859 (0.0023) [2024-03-29 13:15:13,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 161579008. Throughput: 0: 41828.1. Samples: 43840760. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 13:15:13,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:15:16,197][00497] Updated weights for policy 0, policy_version 9869 (0.0022) [2024-03-29 13:15:18,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 161808384. Throughput: 0: 41803.2. Samples: 43964060. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 13:15:18,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:15:19,533][00497] Updated weights for policy 0, policy_version 9879 (0.0034) [2024-03-29 13:15:23,588][00497] Updated weights for policy 0, policy_version 9889 (0.0022) [2024-03-29 13:15:23,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 162021376. Throughput: 0: 41667.7. Samples: 44209640. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 13:15:23,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:15:27,837][00497] Updated weights for policy 0, policy_version 9899 (0.0023) [2024-03-29 13:15:28,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 162217984. Throughput: 0: 41834.0. Samples: 44465640. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 13:15:28,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 13:15:31,800][00497] Updated weights for policy 0, policy_version 9909 (0.0020) [2024-03-29 13:15:33,652][00476] Signal inference workers to stop experience collection... (1600 times) [2024-03-29 13:15:33,725][00497] InferenceWorker_p0-w0: stopping experience collection (1600 times) [2024-03-29 13:15:33,727][00476] Signal inference workers to resume experience collection... (1600 times) [2024-03-29 13:15:33,754][00497] InferenceWorker_p0-w0: resuming experience collection (1600 times) [2024-03-29 13:15:33,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.2, 300 sec: 41820.8). Total num frames: 162447360. Throughput: 0: 41625.7. Samples: 44592180. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 13:15:33,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 13:15:34,966][00497] Updated weights for policy 0, policy_version 9919 (0.0024) [2024-03-29 13:15:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 162660352. Throughput: 0: 41793.8. Samples: 44845940. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 13:15:38,841][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 13:15:38,972][00497] Updated weights for policy 0, policy_version 9929 (0.0024) [2024-03-29 13:15:43,281][00497] Updated weights for policy 0, policy_version 9939 (0.0019) [2024-03-29 13:15:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 162856960. Throughput: 0: 41920.0. Samples: 45106540. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 13:15:43,840][00126] Avg episode reward: [(0, '0.305')] [2024-03-29 13:15:47,196][00497] Updated weights for policy 0, policy_version 9949 (0.0029) [2024-03-29 13:15:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 163069952. Throughput: 0: 41835.6. Samples: 45237680. Policy #0 lag: (min: 1.0, avg: 19.1, max: 41.0) [2024-03-29 13:15:48,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 13:15:50,431][00497] Updated weights for policy 0, policy_version 9959 (0.0019) [2024-03-29 13:15:53,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 163299328. Throughput: 0: 42204.8. Samples: 45477320. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 13:15:53,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 13:15:54,608][00497] Updated weights for policy 0, policy_version 9969 (0.0018) [2024-03-29 13:15:58,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 163479552. 
Throughput: 0: 41950.7. Samples: 45728540. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 13:15:58,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:15:59,135][00497] Updated weights for policy 0, policy_version 9979 (0.0019) [2024-03-29 13:16:02,782][00497] Updated weights for policy 0, policy_version 9989 (0.0021) [2024-03-29 13:16:03,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 163692544. Throughput: 0: 42056.0. Samples: 45856580. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 13:16:03,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:16:04,262][00476] Signal inference workers to stop experience collection... (1650 times) [2024-03-29 13:16:04,263][00476] Signal inference workers to resume experience collection... (1650 times) [2024-03-29 13:16:04,303][00497] InferenceWorker_p0-w0: stopping experience collection (1650 times) [2024-03-29 13:16:04,303][00497] InferenceWorker_p0-w0: resuming experience collection (1650 times) [2024-03-29 13:16:06,094][00497] Updated weights for policy 0, policy_version 9999 (0.0036) [2024-03-29 13:16:08,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 163938304. Throughput: 0: 42159.2. Samples: 46106800. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 13:16:08,842][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:16:10,161][00497] Updated weights for policy 0, policy_version 10009 (0.0027) [2024-03-29 13:16:13,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 164134912. Throughput: 0: 42137.1. Samples: 46361820. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:16:13,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:16:14,352][00497] Updated weights for policy 0, policy_version 10019 (0.0021) [2024-03-29 13:16:18,179][00497] Updated weights for policy 0, policy_version 10029 (0.0019) [2024-03-29 13:16:18,839][00126] Fps is (10 sec: 39320.9, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 164331520. Throughput: 0: 42456.8. Samples: 46502740. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 13:16:18,840][00126] Avg episode reward: [(0, '0.266')] [2024-03-29 13:16:21,615][00497] Updated weights for policy 0, policy_version 10039 (0.0028) [2024-03-29 13:16:23,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 164577280. Throughput: 0: 42247.0. Samples: 46747060. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 13:16:23,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 13:16:25,700][00497] Updated weights for policy 0, policy_version 10049 (0.0026) [2024-03-29 13:16:28,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 164773888. Throughput: 0: 41995.1. Samples: 46996320. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 13:16:28,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 13:16:29,779][00497] Updated weights for policy 0, policy_version 10059 (0.0024) [2024-03-29 13:16:33,442][00497] Updated weights for policy 0, policy_version 10069 (0.0028) [2024-03-29 13:16:33,839][00126] Fps is (10 sec: 40960.7, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 164986880. Throughput: 0: 42162.7. Samples: 47135000. 
Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 13:16:33,840][00126] Avg episode reward: [(0, '0.318')] [2024-03-29 13:16:35,270][00476] Signal inference workers to stop experience collection... (1700 times) [2024-03-29 13:16:35,271][00476] Signal inference workers to resume experience collection... (1700 times) [2024-03-29 13:16:35,318][00497] InferenceWorker_p0-w0: stopping experience collection (1700 times) [2024-03-29 13:16:35,318][00497] InferenceWorker_p0-w0: resuming experience collection (1700 times) [2024-03-29 13:16:36,823][00497] Updated weights for policy 0, policy_version 10079 (0.0024) [2024-03-29 13:16:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 165216256. Throughput: 0: 42372.5. Samples: 47384080. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 13:16:38,840][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 13:16:41,051][00497] Updated weights for policy 0, policy_version 10089 (0.0024) [2024-03-29 13:16:43,842][00126] Fps is (10 sec: 44225.3, 60 sec: 42869.7, 300 sec: 42098.2). Total num frames: 165429248. Throughput: 0: 42304.7. Samples: 47632360. Policy #0 lag: (min: 2.0, avg: 22.9, max: 43.0) [2024-03-29 13:16:43,844][00126] Avg episode reward: [(0, '0.293')] [2024-03-29 13:16:45,284][00497] Updated weights for policy 0, policy_version 10099 (0.0018) [2024-03-29 13:16:48,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 165609472. Throughput: 0: 42448.9. Samples: 47766780. Policy #0 lag: (min: 2.0, avg: 22.9, max: 43.0) [2024-03-29 13:16:48,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 13:16:48,954][00497] Updated weights for policy 0, policy_version 10109 (0.0020) [2024-03-29 13:16:52,152][00497] Updated weights for policy 0, policy_version 10119 (0.0022) [2024-03-29 13:16:53,839][00126] Fps is (10 sec: 42609.5, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 165855232. Throughput: 0: 42551.6. Samples: 48021620. Policy #0 lag: (min: 1.0, avg: 21.8, max: 43.0) [2024-03-29 13:16:53,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:16:56,430][00497] Updated weights for policy 0, policy_version 10129 (0.0022) [2024-03-29 13:16:58,839][00126] Fps is (10 sec: 47514.1, 60 sec: 43417.6, 300 sec: 42154.1). Total num frames: 166084608. Throughput: 0: 42675.3. Samples: 48282200. Policy #0 lag: (min: 1.0, avg: 21.8, max: 43.0) [2024-03-29 13:16:58,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:17:00,584][00497] Updated weights for policy 0, policy_version 10139 (0.0027) [2024-03-29 13:17:03,839][00126] Fps is (10 sec: 39320.8, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 166248448. Throughput: 0: 42429.8. Samples: 48412080. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 13:17:03,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 13:17:04,137][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000010148_166264832.pth... [2024-03-29 13:17:04,436][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000009534_156205056.pth [2024-03-29 13:17:04,708][00497] Updated weights for policy 0, policy_version 10149 (0.0018) [2024-03-29 13:17:06,653][00476] Signal inference workers to stop experience collection... (1750 times) [2024-03-29 13:17:06,689][00497] InferenceWorker_p0-w0: stopping experience collection (1750 times) [2024-03-29 13:17:06,833][00476] Signal inference workers to resume experience collection... 
(1750 times) [2024-03-29 13:17:06,834][00497] InferenceWorker_p0-w0: resuming experience collection (1750 times) [2024-03-29 13:17:07,769][00497] Updated weights for policy 0, policy_version 10159 (0.0025) [2024-03-29 13:17:08,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 166477824. Throughput: 0: 42480.1. Samples: 48658660. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 13:17:08,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:17:12,214][00497] Updated weights for policy 0, policy_version 10169 (0.0023) [2024-03-29 13:17:13,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 166690816. Throughput: 0: 42399.5. Samples: 48904300. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 13:17:13,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 13:17:16,200][00497] Updated weights for policy 0, policy_version 10179 (0.0020) [2024-03-29 13:17:18,839][00126] Fps is (10 sec: 39322.1, 60 sec: 42325.5, 300 sec: 42043.0). Total num frames: 166871040. Throughput: 0: 42148.5. Samples: 49031680. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:17:18,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 13:17:20,347][00497] Updated weights for policy 0, policy_version 10189 (0.0022) [2024-03-29 13:17:23,762][00497] Updated weights for policy 0, policy_version 10199 (0.0022) [2024-03-29 13:17:23,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 167100416. Throughput: 0: 42303.6. Samples: 49287740. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:17:23,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:17:27,745][00497] Updated weights for policy 0, policy_version 10209 (0.0021) [2024-03-29 13:17:28,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 167313408. Throughput: 0: 42330.8. Samples: 49537140. Policy #0 lag: (min: 1.0, avg: 22.0, max: 43.0) [2024-03-29 13:17:28,840][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 13:17:31,609][00497] Updated weights for policy 0, policy_version 10219 (0.0024) [2024-03-29 13:17:33,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 167493632. Throughput: 0: 42168.4. Samples: 49664360. Policy #0 lag: (min: 1.0, avg: 22.0, max: 43.0) [2024-03-29 13:17:33,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 13:17:35,762][00497] Updated weights for policy 0, policy_version 10229 (0.0024) [2024-03-29 13:17:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 167739392. Throughput: 0: 42150.2. Samples: 49918380. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 13:17:38,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:17:39,072][00497] Updated weights for policy 0, policy_version 10239 (0.0021) [2024-03-29 13:17:43,316][00476] Signal inference workers to stop experience collection... (1800 times) [2024-03-29 13:17:43,376][00497] InferenceWorker_p0-w0: stopping experience collection (1800 times) [2024-03-29 13:17:43,408][00476] Signal inference workers to resume experience collection... (1800 times) [2024-03-29 13:17:43,412][00497] InferenceWorker_p0-w0: resuming experience collection (1800 times) [2024-03-29 13:17:43,415][00497] Updated weights for policy 0, policy_version 10249 (0.0022) [2024-03-29 13:17:43,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41780.9, 300 sec: 41987.5). Total num frames: 167936000. 
Throughput: 0: 41889.2. Samples: 50167220. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 13:17:43,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:17:47,351][00497] Updated weights for policy 0, policy_version 10259 (0.0029) [2024-03-29 13:17:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 168132608. Throughput: 0: 41534.4. Samples: 50281120. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 13:17:48,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:17:51,355][00497] Updated weights for policy 0, policy_version 10269 (0.0023) [2024-03-29 13:17:53,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 168361984. Throughput: 0: 41937.4. Samples: 50545840. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 13:17:53,840][00126] Avg episode reward: [(0, '0.285')] [2024-03-29 13:17:54,650][00497] Updated weights for policy 0, policy_version 10279 (0.0023) [2024-03-29 13:17:58,839][00126] Fps is (10 sec: 40960.2, 60 sec: 40960.0, 300 sec: 41931.9). Total num frames: 168542208. Throughput: 0: 41932.6. Samples: 50791260. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 13:17:58,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 13:17:59,108][00497] Updated weights for policy 0, policy_version 10289 (0.0021) [2024-03-29 13:18:02,710][00497] Updated weights for policy 0, policy_version 10299 (0.0028) [2024-03-29 13:18:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 168771584. Throughput: 0: 41824.8. Samples: 50913800. Policy #0 lag: (min: 2.0, avg: 20.5, max: 42.0) [2024-03-29 13:18:03,840][00126] Avg episode reward: [(0, '0.304')] [2024-03-29 13:18:07,226][00497] Updated weights for policy 0, policy_version 10309 (0.0025) [2024-03-29 13:18:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 168968192. Throughput: 0: 41964.5. Samples: 51176140. Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 13:18:08,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:18:10,425][00497] Updated weights for policy 0, policy_version 10319 (0.0024) [2024-03-29 13:18:11,863][00476] Signal inference workers to stop experience collection... (1850 times) [2024-03-29 13:18:11,883][00497] InferenceWorker_p0-w0: stopping experience collection (1850 times) [2024-03-29 13:18:12,073][00476] Signal inference workers to resume experience collection... (1850 times) [2024-03-29 13:18:12,074][00497] InferenceWorker_p0-w0: resuming experience collection (1850 times) [2024-03-29 13:18:13,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 169181184. Throughput: 0: 41539.9. Samples: 51406440. Policy #0 lag: (min: 1.0, avg: 19.7, max: 43.0) [2024-03-29 13:18:13,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 13:18:14,961][00497] Updated weights for policy 0, policy_version 10329 (0.0021) [2024-03-29 13:18:18,374][00497] Updated weights for policy 0, policy_version 10339 (0.0021) [2024-03-29 13:18:18,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 169410560. Throughput: 0: 41543.7. Samples: 51533820. 
Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:18:18,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 13:18:23,002][00497] Updated weights for policy 0, policy_version 10349 (0.0023) [2024-03-29 13:18:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 169590784. Throughput: 0: 41626.2. Samples: 51791560. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:18:23,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:18:26,424][00497] Updated weights for policy 0, policy_version 10359 (0.0029) [2024-03-29 13:18:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 169820160. Throughput: 0: 41526.4. Samples: 52035900. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 13:18:28,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:18:30,591][00497] Updated weights for policy 0, policy_version 10369 (0.0019) [2024-03-29 13:18:33,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 170033152. Throughput: 0: 41997.8. Samples: 52171020. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:18:33,840][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 13:18:33,999][00497] Updated weights for policy 0, policy_version 10379 (0.0024) [2024-03-29 13:18:38,295][00497] Updated weights for policy 0, policy_version 10389 (0.0019) [2024-03-29 13:18:38,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 42098.5). Total num frames: 170229760. Throughput: 0: 42046.7. Samples: 52437940. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:18:38,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 13:18:41,820][00497] Updated weights for policy 0, policy_version 10399 (0.0018) [2024-03-29 13:18:43,840][00126] Fps is (10 sec: 44232.3, 60 sec: 42324.7, 300 sec: 42098.4). Total num frames: 170475520. Throughput: 0: 41941.7. Samples: 52678680. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:18:43,841][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 13:18:45,721][00476] Signal inference workers to stop experience collection... (1900 times) [2024-03-29 13:18:45,743][00497] InferenceWorker_p0-w0: stopping experience collection (1900 times) [2024-03-29 13:18:45,943][00476] Signal inference workers to resume experience collection... (1900 times) [2024-03-29 13:18:45,943][00497] InferenceWorker_p0-w0: resuming experience collection (1900 times) [2024-03-29 13:18:45,947][00497] Updated weights for policy 0, policy_version 10409 (0.0023) [2024-03-29 13:18:48,839][00126] Fps is (10 sec: 44236.1, 60 sec: 42325.2, 300 sec: 42154.1). Total num frames: 170672128. Throughput: 0: 42123.4. Samples: 52809360. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:18:48,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:18:49,275][00497] Updated weights for policy 0, policy_version 10419 (0.0021) [2024-03-29 13:18:53,839][00126] Fps is (10 sec: 37687.2, 60 sec: 41506.2, 300 sec: 42154.1). Total num frames: 170852352. Throughput: 0: 42275.5. Samples: 53078540. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 13:18:53,841][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:18:53,907][00497] Updated weights for policy 0, policy_version 10429 (0.0037) [2024-03-29 13:18:57,150][00497] Updated weights for policy 0, policy_version 10439 (0.0029) [2024-03-29 13:18:58,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 171098112. 
Throughput: 0: 42319.6. Samples: 53310820. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 13:18:58,840][00126] Avg episode reward: [(0, '0.289')] [2024-03-29 13:19:01,459][00497] Updated weights for policy 0, policy_version 10449 (0.0026) [2024-03-29 13:19:03,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 171311104. Throughput: 0: 42607.9. Samples: 53451180. Policy #0 lag: (min: 2.0, avg: 22.3, max: 42.0) [2024-03-29 13:19:03,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 13:19:04,005][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000010457_171327488.pth... [2024-03-29 13:19:04,335][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000009840_161218560.pth [2024-03-29 13:19:04,839][00497] Updated weights for policy 0, policy_version 10459 (0.0020) [2024-03-29 13:19:08,840][00126] Fps is (10 sec: 39320.7, 60 sec: 42052.1, 300 sec: 42098.5). Total num frames: 171491328. Throughput: 0: 42546.1. Samples: 53706140. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 13:19:08,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 13:19:09,483][00497] Updated weights for policy 0, policy_version 10469 (0.0024) [2024-03-29 13:19:12,990][00497] Updated weights for policy 0, policy_version 10479 (0.0024) [2024-03-29 13:19:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.4, 300 sec: 42098.5). Total num frames: 171720704. Throughput: 0: 42271.1. Samples: 53938100. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 13:19:13,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 13:19:17,053][00476] Signal inference workers to stop experience collection... (1950 times) [2024-03-29 13:19:17,126][00497] InferenceWorker_p0-w0: stopping experience collection (1950 times) [2024-03-29 13:19:17,144][00476] Signal inference workers to resume experience collection... (1950 times) [2024-03-29 13:19:17,157][00497] InferenceWorker_p0-w0: resuming experience collection (1950 times) [2024-03-29 13:19:17,160][00497] Updated weights for policy 0, policy_version 10489 (0.0027) [2024-03-29 13:19:18,839][00126] Fps is (10 sec: 44237.6, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 171933696. Throughput: 0: 42298.6. Samples: 54074460. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 13:19:18,840][00126] Avg episode reward: [(0, '0.269')] [2024-03-29 13:19:20,313][00497] Updated weights for policy 0, policy_version 10499 (0.0027) [2024-03-29 13:19:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 172130304. Throughput: 0: 42024.4. Samples: 54329040. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 13:19:23,841][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 13:19:25,069][00497] Updated weights for policy 0, policy_version 10509 (0.0021) [2024-03-29 13:19:28,403][00497] Updated weights for policy 0, policy_version 10519 (0.0029) [2024-03-29 13:19:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 172359680. Throughput: 0: 42128.9. Samples: 54574440. Policy #0 lag: (min: 0.0, avg: 20.2, max: 44.0) [2024-03-29 13:19:28,840][00126] Avg episode reward: [(0, '0.336')] [2024-03-29 13:19:32,635][00497] Updated weights for policy 0, policy_version 10529 (0.0021) [2024-03-29 13:19:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 172556288. Throughput: 0: 42081.9. Samples: 54703040. 
Policy #0 lag: (min: 0.0, avg: 19.6, max: 43.0) [2024-03-29 13:19:33,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:19:36,060][00497] Updated weights for policy 0, policy_version 10539 (0.0019) [2024-03-29 13:19:38,839][00126] Fps is (10 sec: 37683.8, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 172736512. Throughput: 0: 41692.9. Samples: 54954720. Policy #0 lag: (min: 0.0, avg: 19.6, max: 43.0) [2024-03-29 13:19:38,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 13:19:40,816][00497] Updated weights for policy 0, policy_version 10549 (0.0018) [2024-03-29 13:19:43,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.8, 300 sec: 42043.0). Total num frames: 172982272. Throughput: 0: 41987.0. Samples: 55200240. Policy #0 lag: (min: 1.0, avg: 21.0, max: 43.0) [2024-03-29 13:19:43,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:19:44,021][00497] Updated weights for policy 0, policy_version 10559 (0.0021) [2024-03-29 13:19:48,035][00476] Signal inference workers to stop experience collection... (2000 times) [2024-03-29 13:19:48,108][00497] InferenceWorker_p0-w0: stopping experience collection (2000 times) [2024-03-29 13:19:48,195][00476] Signal inference workers to resume experience collection... (2000 times) [2024-03-29 13:19:48,195][00497] InferenceWorker_p0-w0: resuming experience collection (2000 times) [2024-03-29 13:19:48,199][00497] Updated weights for policy 0, policy_version 10569 (0.0021) [2024-03-29 13:19:48,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 173178880. Throughput: 0: 41723.1. Samples: 55328720. Policy #0 lag: (min: 1.0, avg: 21.0, max: 43.0) [2024-03-29 13:19:48,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 13:19:51,521][00497] Updated weights for policy 0, policy_version 10579 (0.0026) [2024-03-29 13:19:53,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42052.1, 300 sec: 42043.0). Total num frames: 173375488. Throughput: 0: 41569.0. Samples: 55576740. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:19:53,842][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 13:19:56,584][00497] Updated weights for policy 0, policy_version 10589 (0.0023) [2024-03-29 13:19:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 42098.6). Total num frames: 173621248. Throughput: 0: 42147.5. Samples: 55834740. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:19:58,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:19:59,619][00497] Updated weights for policy 0, policy_version 10599 (0.0019) [2024-03-29 13:20:03,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.0, 300 sec: 42043.0). Total num frames: 173801472. Throughput: 0: 41861.6. Samples: 55958240. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:20:03,841][00126] Avg episode reward: [(0, '0.289')] [2024-03-29 13:20:04,029][00497] Updated weights for policy 0, policy_version 10609 (0.0022) [2024-03-29 13:20:07,144][00497] Updated weights for policy 0, policy_version 10619 (0.0027) [2024-03-29 13:20:08,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41779.4, 300 sec: 42098.6). Total num frames: 173998080. Throughput: 0: 41501.8. Samples: 56196620. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:20:08,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 13:20:12,431][00497] Updated weights for policy 0, policy_version 10629 (0.0033) [2024-03-29 13:20:13,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 174211072. 
Throughput: 0: 41786.7. Samples: 56454840. Policy #0 lag: (min: 0.0, avg: 20.5, max: 43.0) [2024-03-29 13:20:13,840][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 13:20:15,719][00497] Updated weights for policy 0, policy_version 10639 (0.0025) [2024-03-29 13:20:16,644][00476] Signal inference workers to stop experience collection... (2050 times) [2024-03-29 13:20:16,678][00497] InferenceWorker_p0-w0: stopping experience collection (2050 times) [2024-03-29 13:20:16,826][00476] Signal inference workers to resume experience collection... (2050 times) [2024-03-29 13:20:16,826][00497] InferenceWorker_p0-w0: resuming experience collection (2050 times) [2024-03-29 13:20:18,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 174424064. Throughput: 0: 41547.9. Samples: 56572700. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 13:20:18,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 13:20:20,026][00497] Updated weights for policy 0, policy_version 10649 (0.0018) [2024-03-29 13:20:23,194][00497] Updated weights for policy 0, policy_version 10659 (0.0017) [2024-03-29 13:20:23,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 174653440. Throughput: 0: 41555.4. Samples: 56824720. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 13:20:23,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:20:28,144][00497] Updated weights for policy 0, policy_version 10669 (0.0024) [2024-03-29 13:20:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 174833664. Throughput: 0: 41805.0. Samples: 57081460. Policy #0 lag: (min: 0.0, avg: 17.8, max: 41.0) [2024-03-29 13:20:28,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 13:20:31,430][00497] Updated weights for policy 0, policy_version 10679 (0.0023) [2024-03-29 13:20:33,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 175063040. Throughput: 0: 41351.1. Samples: 57189520. Policy #0 lag: (min: 0.0, avg: 17.8, max: 41.0) [2024-03-29 13:20:33,840][00126] Avg episode reward: [(0, '0.329')] [2024-03-29 13:20:35,652][00497] Updated weights for policy 0, policy_version 10689 (0.0019) [2024-03-29 13:20:38,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 175276032. Throughput: 0: 41740.5. Samples: 57455060. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 13:20:38,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:20:39,080][00497] Updated weights for policy 0, policy_version 10699 (0.0025) [2024-03-29 13:20:43,839][00126] Fps is (10 sec: 37682.9, 60 sec: 40960.0, 300 sec: 41931.9). Total num frames: 175439872. Throughput: 0: 41524.4. Samples: 57703340. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 13:20:43,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:20:43,871][00497] Updated weights for policy 0, policy_version 10709 (0.0024) [2024-03-29 13:20:47,195][00497] Updated weights for policy 0, policy_version 10719 (0.0020) [2024-03-29 13:20:48,391][00476] Signal inference workers to stop experience collection... (2100 times) [2024-03-29 13:20:48,416][00497] InferenceWorker_p0-w0: stopping experience collection (2100 times) [2024-03-29 13:20:48,611][00476] Signal inference workers to resume experience collection... 
(2100 times) [2024-03-29 13:20:48,612][00497] InferenceWorker_p0-w0: resuming experience collection (2100 times) [2024-03-29 13:20:48,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 175685632. Throughput: 0: 41250.3. Samples: 57814500. Policy #0 lag: (min: 0.0, avg: 20.1, max: 43.0) [2024-03-29 13:20:48,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 13:20:51,459][00497] Updated weights for policy 0, policy_version 10729 (0.0031) [2024-03-29 13:20:53,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42052.4, 300 sec: 42098.5). Total num frames: 175898624. Throughput: 0: 41924.0. Samples: 58083200. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 13:20:53,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:20:54,611][00497] Updated weights for policy 0, policy_version 10739 (0.0022) [2024-03-29 13:20:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.0, 300 sec: 42043.0). Total num frames: 176095232. Throughput: 0: 41726.2. Samples: 58332520. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 13:20:58,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 13:20:59,546][00497] Updated weights for policy 0, policy_version 10749 (0.0026) [2024-03-29 13:21:02,759][00497] Updated weights for policy 0, policy_version 10759 (0.0025) [2024-03-29 13:21:03,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 176308224. Throughput: 0: 41852.5. Samples: 58456060. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:21:03,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:21:03,925][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000010762_176324608.pth... [2024-03-29 13:21:04,250][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000010148_166264832.pth [2024-03-29 13:21:07,085][00497] Updated weights for policy 0, policy_version 10769 (0.0025) [2024-03-29 13:21:08,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 176521216. Throughput: 0: 42033.4. Samples: 58716220. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:21:08,840][00126] Avg episode reward: [(0, '0.478')] [2024-03-29 13:21:10,315][00497] Updated weights for policy 0, policy_version 10779 (0.0026) [2024-03-29 13:21:13,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 176717824. Throughput: 0: 41772.1. Samples: 58961200. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:21:13,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 13:21:15,318][00497] Updated weights for policy 0, policy_version 10789 (0.0020) [2024-03-29 13:21:18,750][00497] Updated weights for policy 0, policy_version 10799 (0.0033) [2024-03-29 13:21:18,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 176930816. Throughput: 0: 42064.3. Samples: 59082420. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 13:21:18,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 13:21:19,786][00476] Signal inference workers to stop experience collection... (2150 times) [2024-03-29 13:21:19,807][00497] InferenceWorker_p0-w0: stopping experience collection (2150 times) [2024-03-29 13:21:19,959][00476] Signal inference workers to resume experience collection... 
(2150 times) [2024-03-29 13:21:19,960][00497] InferenceWorker_p0-w0: resuming experience collection (2150 times) [2024-03-29 13:21:23,375][00497] Updated weights for policy 0, policy_version 10809 (0.0023) [2024-03-29 13:21:23,839][00126] Fps is (10 sec: 39321.0, 60 sec: 40960.0, 300 sec: 41820.8). Total num frames: 177111040. Throughput: 0: 41432.4. Samples: 59319520. Policy #0 lag: (min: 0.0, avg: 21.4, max: 41.0) [2024-03-29 13:21:23,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 13:21:26,497][00497] Updated weights for policy 0, policy_version 10819 (0.0027) [2024-03-29 13:21:28,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 177324032. Throughput: 0: 41455.6. Samples: 59568840. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:21:28,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:21:30,931][00497] Updated weights for policy 0, policy_version 10829 (0.0027) [2024-03-29 13:21:33,839][00126] Fps is (10 sec: 45875.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 177569792. Throughput: 0: 42041.9. Samples: 59706380. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:21:33,841][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:21:34,147][00497] Updated weights for policy 0, policy_version 10839 (0.0024) [2024-03-29 13:21:38,784][00497] Updated weights for policy 0, policy_version 10849 (0.0026) [2024-03-29 13:21:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41765.7). Total num frames: 177750016. Throughput: 0: 41512.8. Samples: 59951280. Policy #0 lag: (min: 2.0, avg: 22.6, max: 43.0) [2024-03-29 13:21:38,840][00126] Avg episode reward: [(0, '0.416')] [2024-03-29 13:21:42,163][00497] Updated weights for policy 0, policy_version 10859 (0.0029) [2024-03-29 13:21:43,839][00126] Fps is (10 sec: 39321.0, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 177963008. Throughput: 0: 41502.6. Samples: 60200140. Policy #0 lag: (min: 2.0, avg: 22.6, max: 43.0) [2024-03-29 13:21:43,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 13:21:46,622][00497] Updated weights for policy 0, policy_version 10869 (0.0019) [2024-03-29 13:21:48,839][00126] Fps is (10 sec: 44237.2, 60 sec: 41779.3, 300 sec: 41820.8). Total num frames: 178192384. Throughput: 0: 41791.6. Samples: 60336680. Policy #0 lag: (min: 2.0, avg: 19.9, max: 42.0) [2024-03-29 13:21:48,840][00126] Avg episode reward: [(0, '0.282')] [2024-03-29 13:21:49,962][00497] Updated weights for policy 0, policy_version 10879 (0.0025) [2024-03-29 13:21:50,587][00476] Signal inference workers to stop experience collection... (2200 times) [2024-03-29 13:21:50,625][00497] InferenceWorker_p0-w0: stopping experience collection (2200 times) [2024-03-29 13:21:50,807][00476] Signal inference workers to resume experience collection... (2200 times) [2024-03-29 13:21:50,807][00497] InferenceWorker_p0-w0: resuming experience collection (2200 times) [2024-03-29 13:21:53,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 178372608. Throughput: 0: 41249.2. Samples: 60572440. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 13:21:53,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:21:54,547][00497] Updated weights for policy 0, policy_version 10889 (0.0017) [2024-03-29 13:21:57,552][00497] Updated weights for policy 0, policy_version 10899 (0.0025) [2024-03-29 13:21:58,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 178601984. 
Throughput: 0: 41369.2. Samples: 60822820. Policy #0 lag: (min: 0.0, avg: 19.6, max: 41.0) [2024-03-29 13:21:58,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 13:22:02,116][00497] Updated weights for policy 0, policy_version 10909 (0.0025) [2024-03-29 13:22:03,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 178814976. Throughput: 0: 41949.9. Samples: 60970160. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 13:22:03,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 13:22:05,201][00497] Updated weights for policy 0, policy_version 10919 (0.0021) [2024-03-29 13:22:08,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 179011584. Throughput: 0: 41921.0. Samples: 61205960. Policy #0 lag: (min: 1.0, avg: 19.3, max: 41.0) [2024-03-29 13:22:08,841][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 13:22:09,859][00497] Updated weights for policy 0, policy_version 10929 (0.0028) [2024-03-29 13:22:12,928][00497] Updated weights for policy 0, policy_version 10939 (0.0021) [2024-03-29 13:22:13,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.1, 300 sec: 41931.9). Total num frames: 179240960. Throughput: 0: 41991.5. Samples: 61458460. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:22:13,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 13:22:17,484][00497] Updated weights for policy 0, policy_version 10949 (0.0017) [2024-03-29 13:22:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.4, 300 sec: 41876.4). Total num frames: 179453952. Throughput: 0: 42253.8. Samples: 61607800. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:22:18,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:22:20,708][00497] Updated weights for policy 0, policy_version 10959 (0.0017) [2024-03-29 13:22:23,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 179650560. Throughput: 0: 42301.8. Samples: 61854860. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 13:22:23,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 13:22:25,277][00476] Signal inference workers to stop experience collection... (2250 times) [2024-03-29 13:22:25,322][00497] InferenceWorker_p0-w0: stopping experience collection (2250 times) [2024-03-29 13:22:25,352][00476] Signal inference workers to resume experience collection... (2250 times) [2024-03-29 13:22:25,356][00497] InferenceWorker_p0-w0: resuming experience collection (2250 times) [2024-03-29 13:22:25,363][00497] Updated weights for policy 0, policy_version 10969 (0.0024) [2024-03-29 13:22:28,676][00497] Updated weights for policy 0, policy_version 10979 (0.0022) [2024-03-29 13:22:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 179879936. Throughput: 0: 42129.5. Samples: 62095960. Policy #0 lag: (min: 1.0, avg: 21.4, max: 41.0) [2024-03-29 13:22:28,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:22:33,031][00497] Updated weights for policy 0, policy_version 10989 (0.0018) [2024-03-29 13:22:33,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 180076544. Throughput: 0: 42311.2. Samples: 62240680. 
Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 13:22:33,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 13:22:36,323][00497] Updated weights for policy 0, policy_version 10999 (0.0022) [2024-03-29 13:22:38,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 41931.9). Total num frames: 180305920. Throughput: 0: 42471.6. Samples: 62483660. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:22:38,841][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 13:22:41,001][00497] Updated weights for policy 0, policy_version 11009 (0.0024) [2024-03-29 13:22:43,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 180502528. Throughput: 0: 42270.7. Samples: 62725000. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:22:43,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:22:44,494][00497] Updated weights for policy 0, policy_version 11019 (0.0024) [2024-03-29 13:22:48,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 180682752. Throughput: 0: 41932.0. Samples: 62857100. Policy #0 lag: (min: 0.0, avg: 18.7, max: 40.0) [2024-03-29 13:22:48,840][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 13:22:48,918][00497] Updated weights for policy 0, policy_version 11029 (0.0023) [2024-03-29 13:22:51,976][00497] Updated weights for policy 0, policy_version 11039 (0.0020) [2024-03-29 13:22:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42871.5, 300 sec: 42043.0). Total num frames: 180944896. Throughput: 0: 42232.0. Samples: 63106400. Policy #0 lag: (min: 0.0, avg: 18.7, max: 40.0) [2024-03-29 13:22:53,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:22:56,348][00497] Updated weights for policy 0, policy_version 11049 (0.0019) [2024-03-29 13:22:56,881][00476] Signal inference workers to stop experience collection... (2300 times) [2024-03-29 13:22:56,939][00497] InferenceWorker_p0-w0: stopping experience collection (2300 times) [2024-03-29 13:22:56,974][00476] Signal inference workers to resume experience collection... (2300 times) [2024-03-29 13:22:56,976][00497] InferenceWorker_p0-w0: resuming experience collection (2300 times) [2024-03-29 13:22:58,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 181141504. Throughput: 0: 42344.6. Samples: 63363960. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 13:22:58,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:22:59,751][00497] Updated weights for policy 0, policy_version 11059 (0.0035) [2024-03-29 13:23:03,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 181338112. Throughput: 0: 42138.1. Samples: 63504020. Policy #0 lag: (min: 0.0, avg: 21.5, max: 43.0) [2024-03-29 13:23:03,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 13:23:03,863][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011068_181338112.pth... [2024-03-29 13:23:04,169][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000010457_171327488.pth [2024-03-29 13:23:04,440][00497] Updated weights for policy 0, policy_version 11069 (0.0019) [2024-03-29 13:23:07,451][00497] Updated weights for policy 0, policy_version 11079 (0.0030) [2024-03-29 13:23:08,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42871.4, 300 sec: 42043.0). Total num frames: 181583872. Throughput: 0: 42052.8. Samples: 63747240. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 43.0) [2024-03-29 13:23:08,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 13:23:11,688][00497] Updated weights for policy 0, policy_version 11089 (0.0024) [2024-03-29 13:23:13,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 181780480. Throughput: 0: 42432.8. Samples: 64005440. Policy #0 lag: (min: 2.0, avg: 21.6, max: 43.0) [2024-03-29 13:23:13,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 13:23:15,131][00497] Updated weights for policy 0, policy_version 11099 (0.0019) [2024-03-29 13:23:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 181977088. Throughput: 0: 41979.4. Samples: 64129760. Policy #0 lag: (min: 2.0, avg: 21.6, max: 43.0) [2024-03-29 13:23:18,841][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 13:23:19,806][00497] Updated weights for policy 0, policy_version 11109 (0.0028) [2024-03-29 13:23:23,023][00497] Updated weights for policy 0, policy_version 11119 (0.0026) [2024-03-29 13:23:23,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 182206464. Throughput: 0: 42123.0. Samples: 64379200. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:23:23,840][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 13:23:27,367][00497] Updated weights for policy 0, policy_version 11129 (0.0032) [2024-03-29 13:23:28,240][00476] Signal inference workers to stop experience collection... (2350 times) [2024-03-29 13:23:28,322][00497] InferenceWorker_p0-w0: stopping experience collection (2350 times) [2024-03-29 13:23:28,326][00476] Signal inference workers to resume experience collection... (2350 times) [2024-03-29 13:23:28,348][00497] InferenceWorker_p0-w0: resuming experience collection (2350 times) [2024-03-29 13:23:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 182403072. Throughput: 0: 42555.9. Samples: 64640020. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:23:28,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 13:23:30,886][00497] Updated weights for policy 0, policy_version 11139 (0.0022) [2024-03-29 13:23:33,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 182583296. Throughput: 0: 42199.6. Samples: 64756080. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:23:33,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:23:35,256][00497] Updated weights for policy 0, policy_version 11149 (0.0022) [2024-03-29 13:23:38,629][00497] Updated weights for policy 0, policy_version 11159 (0.0044) [2024-03-29 13:23:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.2, 300 sec: 41876.5). Total num frames: 182829056. Throughput: 0: 42362.6. Samples: 65012720. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:23:38,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:23:43,107][00497] Updated weights for policy 0, policy_version 11169 (0.0024) [2024-03-29 13:23:43,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 183025664. Throughput: 0: 42227.5. Samples: 65264200. Policy #0 lag: (min: 1.0, avg: 23.2, max: 42.0) [2024-03-29 13:23:43,840][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 13:23:46,762][00497] Updated weights for policy 0, policy_version 11179 (0.0019) [2024-03-29 13:23:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 183222272. 
Throughput: 0: 41680.9. Samples: 65379660. Policy #0 lag: (min: 1.0, avg: 23.2, max: 42.0) [2024-03-29 13:23:48,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:23:51,119][00497] Updated weights for policy 0, policy_version 11189 (0.0027) [2024-03-29 13:23:53,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 183468032. Throughput: 0: 42180.5. Samples: 65645360. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 13:23:53,841][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 13:23:54,222][00497] Updated weights for policy 0, policy_version 11199 (0.0026) [2024-03-29 13:23:58,552][00497] Updated weights for policy 0, policy_version 11209 (0.0018) [2024-03-29 13:23:58,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 183664640. Throughput: 0: 42172.5. Samples: 65903200. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:23:58,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:23:59,493][00476] Signal inference workers to stop experience collection... (2400 times) [2024-03-29 13:23:59,494][00476] Signal inference workers to resume experience collection... (2400 times) [2024-03-29 13:23:59,530][00497] InferenceWorker_p0-w0: stopping experience collection (2400 times) [2024-03-29 13:23:59,535][00497] InferenceWorker_p0-w0: resuming experience collection (2400 times) [2024-03-29 13:24:02,083][00497] Updated weights for policy 0, policy_version 11219 (0.0034) [2024-03-29 13:24:03,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 183861248. Throughput: 0: 41998.2. Samples: 66019680. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:24:03,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 13:24:06,711][00497] Updated weights for policy 0, policy_version 11229 (0.0028) [2024-03-29 13:24:08,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 184090624. Throughput: 0: 42462.3. Samples: 66290000. Policy #0 lag: (min: 1.0, avg: 18.8, max: 41.0) [2024-03-29 13:24:08,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:24:09,608][00497] Updated weights for policy 0, policy_version 11239 (0.0028) [2024-03-29 13:24:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 184287232. Throughput: 0: 42143.1. Samples: 66536460. Policy #0 lag: (min: 1.0, avg: 18.8, max: 41.0) [2024-03-29 13:24:13,840][00126] Avg episode reward: [(0, '0.305')] [2024-03-29 13:24:13,922][00497] Updated weights for policy 0, policy_version 11249 (0.0022) [2024-03-29 13:24:17,328][00497] Updated weights for policy 0, policy_version 11259 (0.0022) [2024-03-29 13:24:18,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 184516608. Throughput: 0: 42389.3. Samples: 66663600. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 13:24:18,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 13:24:21,939][00497] Updated weights for policy 0, policy_version 11269 (0.0026) [2024-03-29 13:24:23,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42052.4, 300 sec: 41932.0). Total num frames: 184729600. Throughput: 0: 42667.7. Samples: 66932760. 
Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 13:24:23,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:24:25,099][00497] Updated weights for policy 0, policy_version 11279 (0.0025) [2024-03-29 13:24:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 184926208. Throughput: 0: 42418.7. Samples: 67173040. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:24:28,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 13:24:29,085][00476] Signal inference workers to stop experience collection... (2450 times) [2024-03-29 13:24:29,110][00497] InferenceWorker_p0-w0: stopping experience collection (2450 times) [2024-03-29 13:24:29,309][00476] Signal inference workers to resume experience collection... (2450 times) [2024-03-29 13:24:29,310][00497] InferenceWorker_p0-w0: resuming experience collection (2450 times) [2024-03-29 13:24:29,313][00497] Updated weights for policy 0, policy_version 11289 (0.0019) [2024-03-29 13:24:32,808][00497] Updated weights for policy 0, policy_version 11299 (0.0020) [2024-03-29 13:24:33,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42871.4, 300 sec: 42098.5). Total num frames: 185155584. Throughput: 0: 42549.8. Samples: 67294400. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:24:33,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:24:37,442][00497] Updated weights for policy 0, policy_version 11309 (0.0026) [2024-03-29 13:24:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 185352192. Throughput: 0: 42551.2. Samples: 67560160. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 13:24:38,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 13:24:40,739][00497] Updated weights for policy 0, policy_version 11319 (0.0022) [2024-03-29 13:24:43,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 185548800. Throughput: 0: 41867.0. Samples: 67787220. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 13:24:43,841][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:24:44,975][00497] Updated weights for policy 0, policy_version 11329 (0.0028) [2024-03-29 13:24:48,605][00497] Updated weights for policy 0, policy_version 11339 (0.0019) [2024-03-29 13:24:48,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 185778176. Throughput: 0: 42202.7. Samples: 67918800. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:24:48,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:24:53,072][00497] Updated weights for policy 0, policy_version 11349 (0.0017) [2024-03-29 13:24:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 185974784. Throughput: 0: 42205.3. Samples: 68189240. Policy #0 lag: (min: 0.0, avg: 19.5, max: 43.0) [2024-03-29 13:24:53,840][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 13:24:56,157][00497] Updated weights for policy 0, policy_version 11359 (0.0023) [2024-03-29 13:24:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.2, 300 sec: 42043.0). Total num frames: 186204160. Throughput: 0: 41819.9. Samples: 68418360. Policy #0 lag: (min: 0.0, avg: 19.5, max: 43.0) [2024-03-29 13:24:58,842][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 13:25:00,404][00497] Updated weights for policy 0, policy_version 11369 (0.0027) [2024-03-29 13:25:03,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 186417152. 
Throughput: 0: 42059.4. Samples: 68556280. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 13:25:03,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 13:25:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011378_186417152.pth... [2024-03-29 13:25:04,195][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000010762_176324608.pth [2024-03-29 13:25:04,460][00497] Updated weights for policy 0, policy_version 11379 (0.0023) [2024-03-29 13:25:06,393][00476] Signal inference workers to stop experience collection... (2500 times) [2024-03-29 13:25:06,473][00476] Signal inference workers to resume experience collection... (2500 times) [2024-03-29 13:25:06,474][00497] InferenceWorker_p0-w0: stopping experience collection (2500 times) [2024-03-29 13:25:06,501][00497] InferenceWorker_p0-w0: resuming experience collection (2500 times) [2024-03-29 13:25:08,839][00126] Fps is (10 sec: 37683.9, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 186580992. Throughput: 0: 41691.5. Samples: 68808880. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 13:25:08,840][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 13:25:08,991][00497] Updated weights for policy 0, policy_version 11389 (0.0029) [2024-03-29 13:25:11,963][00497] Updated weights for policy 0, policy_version 11399 (0.0033) [2024-03-29 13:25:13,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 186826752. Throughput: 0: 41206.2. Samples: 69027320. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 13:25:13,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 13:25:16,670][00497] Updated weights for policy 0, policy_version 11409 (0.0022) [2024-03-29 13:25:18,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 187006976. Throughput: 0: 41565.4. Samples: 69164840. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 13:25:18,840][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 13:25:20,224][00497] Updated weights for policy 0, policy_version 11419 (0.0029) [2024-03-29 13:25:23,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 187203584. Throughput: 0: 41415.2. Samples: 69423840. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 13:25:23,840][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 13:25:25,034][00497] Updated weights for policy 0, policy_version 11429 (0.0019) [2024-03-29 13:25:28,096][00497] Updated weights for policy 0, policy_version 11439 (0.0041) [2024-03-29 13:25:28,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 187449344. Throughput: 0: 41525.0. Samples: 69655840. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 13:25:28,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:25:32,598][00497] Updated weights for policy 0, policy_version 11449 (0.0020) [2024-03-29 13:25:33,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 187629568. Throughput: 0: 41355.2. Samples: 69779780. Policy #0 lag: (min: 0.0, avg: 21.6, max: 40.0) [2024-03-29 13:25:33,840][00126] Avg episode reward: [(0, '0.249')] [2024-03-29 13:25:36,126][00497] Updated weights for policy 0, policy_version 11459 (0.0027) [2024-03-29 13:25:38,839][00126] Fps is (10 sec: 37682.7, 60 sec: 41233.0, 300 sec: 41987.5). Total num frames: 187826176. Throughput: 0: 41033.7. Samples: 70035760. 
Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:25:38,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 13:25:40,658][00476] Signal inference workers to stop experience collection... (2550 times) [2024-03-29 13:25:40,701][00497] InferenceWorker_p0-w0: stopping experience collection (2550 times) [2024-03-29 13:25:40,739][00476] Signal inference workers to resume experience collection... (2550 times) [2024-03-29 13:25:40,741][00497] InferenceWorker_p0-w0: resuming experience collection (2550 times) [2024-03-29 13:25:40,744][00497] Updated weights for policy 0, policy_version 11469 (0.0024) [2024-03-29 13:25:43,834][00497] Updated weights for policy 0, policy_version 11479 (0.0018) [2024-03-29 13:25:43,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 188071936. Throughput: 0: 41313.9. Samples: 70277480. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:25:43,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:25:48,243][00497] Updated weights for policy 0, policy_version 11489 (0.0024) [2024-03-29 13:25:48,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 188252160. Throughput: 0: 40940.6. Samples: 70398600. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 13:25:48,840][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 13:25:51,997][00497] Updated weights for policy 0, policy_version 11499 (0.0029) [2024-03-29 13:25:53,839][00126] Fps is (10 sec: 37683.9, 60 sec: 41233.2, 300 sec: 41876.4). Total num frames: 188448768. Throughput: 0: 41037.9. Samples: 70655580. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 13:25:53,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:25:56,593][00497] Updated weights for policy 0, policy_version 11509 (0.0029) [2024-03-29 13:25:58,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 188694528. Throughput: 0: 41596.4. Samples: 70899160. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 13:25:58,842][00126] Avg episode reward: [(0, '0.239')] [2024-03-29 13:25:59,768][00497] Updated weights for policy 0, policy_version 11519 (0.0027) [2024-03-29 13:26:03,839][00126] Fps is (10 sec: 42598.0, 60 sec: 40960.1, 300 sec: 41876.4). Total num frames: 188874752. Throughput: 0: 41232.9. Samples: 71020320. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 13:26:03,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 13:26:03,964][00497] Updated weights for policy 0, policy_version 11529 (0.0026) [2024-03-29 13:26:07,782][00497] Updated weights for policy 0, policy_version 11539 (0.0022) [2024-03-29 13:26:08,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 189087744. Throughput: 0: 41198.6. Samples: 71277780. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 13:26:08,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 13:26:12,463][00497] Updated weights for policy 0, policy_version 11549 (0.0027) [2024-03-29 13:26:12,667][00476] Signal inference workers to stop experience collection... (2600 times) [2024-03-29 13:26:12,716][00497] InferenceWorker_p0-w0: stopping experience collection (2600 times) [2024-03-29 13:26:12,757][00476] Signal inference workers to resume experience collection... 
(2600 times) [2024-03-29 13:26:12,757][00497] InferenceWorker_p0-w0: resuming experience collection (2600 times) [2024-03-29 13:26:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41932.0). Total num frames: 189300736. Throughput: 0: 41859.6. Samples: 71539520. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 13:26:13,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:26:15,634][00497] Updated weights for policy 0, policy_version 11559 (0.0025) [2024-03-29 13:26:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 189480960. Throughput: 0: 41089.0. Samples: 71628780. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 13:26:18,840][00126] Avg episode reward: [(0, '0.290')] [2024-03-29 13:26:19,988][00497] Updated weights for policy 0, policy_version 11569 (0.0024) [2024-03-29 13:26:23,724][00497] Updated weights for policy 0, policy_version 11579 (0.0023) [2024-03-29 13:26:23,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 189710336. Throughput: 0: 41295.1. Samples: 71894040. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 13:26:23,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:26:28,201][00497] Updated weights for policy 0, policy_version 11589 (0.0022) [2024-03-29 13:26:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 40960.1, 300 sec: 41820.9). Total num frames: 189906944. Throughput: 0: 41980.2. Samples: 72166580. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 13:26:28,840][00126] Avg episode reward: [(0, '0.254')] [2024-03-29 13:26:31,245][00497] Updated weights for policy 0, policy_version 11599 (0.0022) [2024-03-29 13:26:33,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 190136320. Throughput: 0: 41327.0. Samples: 72258320. Policy #0 lag: (min: 0.0, avg: 19.9, max: 41.0) [2024-03-29 13:26:33,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 13:26:35,779][00497] Updated weights for policy 0, policy_version 11609 (0.0028) [2024-03-29 13:26:38,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 190332928. Throughput: 0: 41486.5. Samples: 72522480. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:26:38,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 13:26:39,312][00497] Updated weights for policy 0, policy_version 11619 (0.0024) [2024-03-29 13:26:43,839][00126] Fps is (10 sec: 36045.7, 60 sec: 40414.0, 300 sec: 41709.8). Total num frames: 190496768. Throughput: 0: 41476.1. Samples: 72765580. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:26:43,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:26:44,231][00497] Updated weights for policy 0, policy_version 11629 (0.0030) [2024-03-29 13:26:45,180][00476] Signal inference workers to stop experience collection... (2650 times) [2024-03-29 13:26:45,232][00497] InferenceWorker_p0-w0: stopping experience collection (2650 times) [2024-03-29 13:26:45,267][00476] Signal inference workers to resume experience collection... (2650 times) [2024-03-29 13:26:45,273][00497] InferenceWorker_p0-w0: resuming experience collection (2650 times) [2024-03-29 13:26:47,195][00497] Updated weights for policy 0, policy_version 11639 (0.0024) [2024-03-29 13:26:48,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 190742528. Throughput: 0: 41304.8. Samples: 72879040. 
Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 13:26:48,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 13:26:51,662][00497] Updated weights for policy 0, policy_version 11649 (0.0020) [2024-03-29 13:26:53,839][00126] Fps is (10 sec: 45874.8, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 190955520. Throughput: 0: 41198.2. Samples: 73131700. Policy #0 lag: (min: 1.0, avg: 20.4, max: 40.0) [2024-03-29 13:26:53,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 13:26:55,595][00497] Updated weights for policy 0, policy_version 11659 (0.0017) [2024-03-29 13:26:58,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40686.9, 300 sec: 41765.3). Total num frames: 191135744. Throughput: 0: 41060.4. Samples: 73387240. Policy #0 lag: (min: 1.0, avg: 20.4, max: 40.0) [2024-03-29 13:26:58,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 13:27:00,283][00497] Updated weights for policy 0, policy_version 11669 (0.0024) [2024-03-29 13:27:03,172][00497] Updated weights for policy 0, policy_version 11679 (0.0019) [2024-03-29 13:27:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 191365120. Throughput: 0: 41531.5. Samples: 73497700. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:27:03,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 13:27:04,151][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011681_191381504.pth... [2024-03-29 13:27:04,462][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011068_181338112.pth [2024-03-29 13:27:07,674][00497] Updated weights for policy 0, policy_version 11689 (0.0035) [2024-03-29 13:27:08,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 191561728. Throughput: 0: 41108.6. Samples: 73743920. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:27:08,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:27:11,323][00497] Updated weights for policy 0, policy_version 11699 (0.0023) [2024-03-29 13:27:13,839][00126] Fps is (10 sec: 37683.5, 60 sec: 40687.0, 300 sec: 41654.2). Total num frames: 191741952. Throughput: 0: 40875.1. Samples: 74005960. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 13:27:13,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:27:15,911][00497] Updated weights for policy 0, policy_version 11709 (0.0022) [2024-03-29 13:27:17,313][00476] Signal inference workers to stop experience collection... (2700 times) [2024-03-29 13:27:17,313][00476] Signal inference workers to resume experience collection... (2700 times) [2024-03-29 13:27:17,353][00497] InferenceWorker_p0-w0: stopping experience collection (2700 times) [2024-03-29 13:27:17,353][00497] InferenceWorker_p0-w0: resuming experience collection (2700 times) [2024-03-29 13:27:18,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 191987712. Throughput: 0: 41723.6. Samples: 74135880. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 13:27:18,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 13:27:18,863][00497] Updated weights for policy 0, policy_version 11719 (0.0023) [2024-03-29 13:27:23,055][00497] Updated weights for policy 0, policy_version 11729 (0.0017) [2024-03-29 13:27:23,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 192184320. Throughput: 0: 41125.3. Samples: 74373120. 
Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 13:27:23,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:27:26,913][00497] Updated weights for policy 0, policy_version 11739 (0.0023) [2024-03-29 13:27:28,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41232.9, 300 sec: 41709.7). Total num frames: 192380928. Throughput: 0: 41469.6. Samples: 74631720. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 13:27:28,840][00126] Avg episode reward: [(0, '0.283')] [2024-03-29 13:27:31,499][00497] Updated weights for policy 0, policy_version 11749 (0.0021) [2024-03-29 13:27:33,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 192610304. Throughput: 0: 41967.9. Samples: 74767600. Policy #0 lag: (min: 2.0, avg: 19.6, max: 42.0) [2024-03-29 13:27:33,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:27:34,452][00497] Updated weights for policy 0, policy_version 11759 (0.0020) [2024-03-29 13:27:38,632][00497] Updated weights for policy 0, policy_version 11769 (0.0026) [2024-03-29 13:27:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 192823296. Throughput: 0: 41760.8. Samples: 75010940. Policy #0 lag: (min: 2.0, avg: 19.6, max: 42.0) [2024-03-29 13:27:38,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 13:27:42,416][00497] Updated weights for policy 0, policy_version 11779 (0.0022) [2024-03-29 13:27:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 193019904. Throughput: 0: 41521.3. Samples: 75255700. Policy #0 lag: (min: 0.0, avg: 20.8, max: 43.0) [2024-03-29 13:27:43,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 13:27:47,081][00497] Updated weights for policy 0, policy_version 11789 (0.0024) [2024-03-29 13:27:48,277][00476] Signal inference workers to stop experience collection... (2750 times) [2024-03-29 13:27:48,322][00497] InferenceWorker_p0-w0: stopping experience collection (2750 times) [2024-03-29 13:27:48,488][00476] Signal inference workers to resume experience collection... (2750 times) [2024-03-29 13:27:48,489][00497] InferenceWorker_p0-w0: resuming experience collection (2750 times) [2024-03-29 13:27:48,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 193249280. Throughput: 0: 42128.9. Samples: 75393500. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 13:27:48,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:27:49,959][00497] Updated weights for policy 0, policy_version 11799 (0.0024) [2024-03-29 13:27:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 193462272. Throughput: 0: 42120.3. Samples: 75639340. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 13:27:53,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 13:27:54,029][00497] Updated weights for policy 0, policy_version 11809 (0.0023) [2024-03-29 13:27:58,025][00497] Updated weights for policy 0, policy_version 11819 (0.0024) [2024-03-29 13:27:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 193675264. Throughput: 0: 42046.1. Samples: 75898040. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 13:27:58,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:28:02,506][00497] Updated weights for policy 0, policy_version 11829 (0.0018) [2024-03-29 13:28:03,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41654.3). Total num frames: 193871872. 
Throughput: 0: 42170.3. Samples: 76033540. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 13:28:03,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 13:28:05,352][00497] Updated weights for policy 0, policy_version 11839 (0.0036) [2024-03-29 13:28:08,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 194101248. Throughput: 0: 42291.6. Samples: 76276240. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 13:28:08,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 13:28:09,516][00497] Updated weights for policy 0, policy_version 11849 (0.0020) [2024-03-29 13:28:13,571][00497] Updated weights for policy 0, policy_version 11859 (0.0028) [2024-03-29 13:28:13,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42871.4, 300 sec: 41820.9). Total num frames: 194314240. Throughput: 0: 42234.3. Samples: 76532260. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 13:28:13,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 13:28:18,163][00497] Updated weights for policy 0, policy_version 11869 (0.0028) [2024-03-29 13:28:18,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 194494464. Throughput: 0: 42115.2. Samples: 76662780. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 13:28:18,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 13:28:21,151][00497] Updated weights for policy 0, policy_version 11879 (0.0019) [2024-03-29 13:28:23,141][00476] Signal inference workers to stop experience collection... (2800 times) [2024-03-29 13:28:23,142][00476] Signal inference workers to resume experience collection... (2800 times) [2024-03-29 13:28:23,181][00497] InferenceWorker_p0-w0: stopping experience collection (2800 times) [2024-03-29 13:28:23,182][00497] InferenceWorker_p0-w0: resuming experience collection (2800 times) [2024-03-29 13:28:23,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 194723840. Throughput: 0: 41806.7. Samples: 76892240. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 13:28:23,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 13:28:25,420][00497] Updated weights for policy 0, policy_version 11889 (0.0019) [2024-03-29 13:28:28,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.4, 300 sec: 41820.8). Total num frames: 194920448. Throughput: 0: 42244.5. Samples: 77156700. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 13:28:28,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 13:28:29,107][00497] Updated weights for policy 0, policy_version 11899 (0.0018) [2024-03-29 13:28:33,552][00497] Updated weights for policy 0, policy_version 11909 (0.0022) [2024-03-29 13:28:33,839][00126] Fps is (10 sec: 39320.9, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 195117056. Throughput: 0: 42148.8. Samples: 77290200. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 13:28:33,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:28:36,559][00497] Updated weights for policy 0, policy_version 11919 (0.0027) [2024-03-29 13:28:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 195362816. Throughput: 0: 42038.7. Samples: 77531080. 
Policy #0 lag: (min: 1.0, avg: 20.4, max: 43.0) [2024-03-29 13:28:38,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:28:40,843][00497] Updated weights for policy 0, policy_version 11929 (0.0023) [2024-03-29 13:28:43,839][00126] Fps is (10 sec: 45876.3, 60 sec: 42598.5, 300 sec: 41876.4). Total num frames: 195575808. Throughput: 0: 42132.6. Samples: 77794000. Policy #0 lag: (min: 1.0, avg: 20.4, max: 43.0) [2024-03-29 13:28:43,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:28:44,654][00497] Updated weights for policy 0, policy_version 11939 (0.0039) [2024-03-29 13:28:48,839][00126] Fps is (10 sec: 37682.7, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 195739648. Throughput: 0: 41901.2. Samples: 77919100. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 13:28:48,843][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 13:28:49,241][00497] Updated weights for policy 0, policy_version 11949 (0.0025) [2024-03-29 13:28:52,237][00497] Updated weights for policy 0, policy_version 11959 (0.0026) [2024-03-29 13:28:53,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 195985408. Throughput: 0: 41691.6. Samples: 78152360. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:28:53,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 13:28:56,392][00497] Updated weights for policy 0, policy_version 11969 (0.0022) [2024-03-29 13:28:58,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 196198400. Throughput: 0: 42104.0. Samples: 78426940. Policy #0 lag: (min: 0.0, avg: 21.7, max: 41.0) [2024-03-29 13:28:58,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 13:29:00,138][00476] Signal inference workers to stop experience collection... (2850 times) [2024-03-29 13:29:00,202][00497] InferenceWorker_p0-w0: stopping experience collection (2850 times) [2024-03-29 13:29:00,301][00476] Signal inference workers to resume experience collection... (2850 times) [2024-03-29 13:29:00,301][00497] InferenceWorker_p0-w0: resuming experience collection (2850 times) [2024-03-29 13:29:00,305][00497] Updated weights for policy 0, policy_version 11979 (0.0019) [2024-03-29 13:29:03,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 196395008. Throughput: 0: 41913.7. Samples: 78548900. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 13:29:03,840][00126] Avg episode reward: [(0, '0.305')] [2024-03-29 13:29:03,865][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011987_196395008.pth... [2024-03-29 13:29:04,160][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011378_186417152.pth [2024-03-29 13:29:04,706][00497] Updated weights for policy 0, policy_version 11989 (0.0023) [2024-03-29 13:29:07,697][00497] Updated weights for policy 0, policy_version 11999 (0.0027) [2024-03-29 13:29:08,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 196624384. Throughput: 0: 42240.5. Samples: 78793060. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 13:29:08,840][00126] Avg episode reward: [(0, '0.292')] [2024-03-29 13:29:12,000][00497] Updated weights for policy 0, policy_version 12009 (0.0021) [2024-03-29 13:29:13,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 196837376. Throughput: 0: 42290.6. Samples: 79059780. 
Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:29:13,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 13:29:15,676][00497] Updated weights for policy 0, policy_version 12019 (0.0024) [2024-03-29 13:29:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.4, 300 sec: 41654.2). Total num frames: 197017600. Throughput: 0: 42152.2. Samples: 79187040. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:29:18,841][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 13:29:20,248][00497] Updated weights for policy 0, policy_version 12029 (0.0021) [2024-03-29 13:29:23,351][00497] Updated weights for policy 0, policy_version 12039 (0.0031) [2024-03-29 13:29:23,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 197263360. Throughput: 0: 42286.2. Samples: 79433960. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:29:23,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:29:27,466][00497] Updated weights for policy 0, policy_version 12049 (0.0018) [2024-03-29 13:29:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 197459968. Throughput: 0: 42217.3. Samples: 79693780. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:29:28,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 13:29:31,154][00497] Updated weights for policy 0, policy_version 12059 (0.0018) [2024-03-29 13:29:31,462][00476] Signal inference workers to stop experience collection... (2900 times) [2024-03-29 13:29:31,466][00476] Signal inference workers to resume experience collection... (2900 times) [2024-03-29 13:29:31,509][00497] InferenceWorker_p0-w0: stopping experience collection (2900 times) [2024-03-29 13:29:31,513][00497] InferenceWorker_p0-w0: resuming experience collection (2900 times) [2024-03-29 13:29:33,839][00126] Fps is (10 sec: 37683.3, 60 sec: 42052.4, 300 sec: 41654.2). Total num frames: 197640192. Throughput: 0: 42332.6. Samples: 79824060. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:29:33,840][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 13:29:35,743][00497] Updated weights for policy 0, policy_version 12069 (0.0027) [2024-03-29 13:29:38,760][00497] Updated weights for policy 0, policy_version 12079 (0.0024) [2024-03-29 13:29:38,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 197902336. Throughput: 0: 42737.3. Samples: 80075540. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:29:38,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 13:29:43,047][00497] Updated weights for policy 0, policy_version 12089 (0.0019) [2024-03-29 13:29:43,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 198098944. Throughput: 0: 42202.3. Samples: 80326040. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 13:29:43,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 13:29:46,888][00497] Updated weights for policy 0, policy_version 12099 (0.0026) [2024-03-29 13:29:48,839][00126] Fps is (10 sec: 39322.0, 60 sec: 42598.6, 300 sec: 41765.3). Total num frames: 198295552. Throughput: 0: 42096.7. Samples: 80443240. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 13:29:48,841][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 13:29:51,405][00497] Updated weights for policy 0, policy_version 12109 (0.0028) [2024-03-29 13:29:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 198524928. 
Throughput: 0: 42311.6. Samples: 80697080. Policy #0 lag: (min: 1.0, avg: 18.3, max: 42.0) [2024-03-29 13:29:53,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:29:54,428][00497] Updated weights for policy 0, policy_version 12119 (0.0025) [2024-03-29 13:29:58,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.3, 300 sec: 41654.3). Total num frames: 198705152. Throughput: 0: 41761.9. Samples: 80939060. Policy #0 lag: (min: 1.0, avg: 18.3, max: 42.0) [2024-03-29 13:29:58,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:29:58,969][00497] Updated weights for policy 0, policy_version 12129 (0.0040) [2024-03-29 13:30:02,829][00497] Updated weights for policy 0, policy_version 12139 (0.0024) [2024-03-29 13:30:03,839][00126] Fps is (10 sec: 39320.8, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 198918144. Throughput: 0: 41765.6. Samples: 81066500. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 13:30:03,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:30:05,166][00476] Signal inference workers to stop experience collection... (2950 times) [2024-03-29 13:30:05,200][00497] InferenceWorker_p0-w0: stopping experience collection (2950 times) [2024-03-29 13:30:05,348][00476] Signal inference workers to resume experience collection... (2950 times) [2024-03-29 13:30:05,348][00497] InferenceWorker_p0-w0: resuming experience collection (2950 times) [2024-03-29 13:30:07,139][00497] Updated weights for policy 0, policy_version 12149 (0.0018) [2024-03-29 13:30:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 199131136. Throughput: 0: 42181.4. Samples: 81332120. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 13:30:08,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 13:30:10,231][00497] Updated weights for policy 0, policy_version 12159 (0.0032) [2024-03-29 13:30:13,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 199360512. Throughput: 0: 41768.3. Samples: 81573360. Policy #0 lag: (min: 1.0, avg: 22.7, max: 42.0) [2024-03-29 13:30:13,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 13:30:14,561][00497] Updated weights for policy 0, policy_version 12169 (0.0024) [2024-03-29 13:30:18,434][00497] Updated weights for policy 0, policy_version 12179 (0.0027) [2024-03-29 13:30:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 199557120. Throughput: 0: 41647.1. Samples: 81698180. Policy #0 lag: (min: 1.0, avg: 22.7, max: 42.0) [2024-03-29 13:30:18,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:30:22,389][00497] Updated weights for policy 0, policy_version 12189 (0.0032) [2024-03-29 13:30:23,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 199770112. Throughput: 0: 42179.6. Samples: 81973620. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 13:30:23,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:30:25,603][00497] Updated weights for policy 0, policy_version 12199 (0.0028) [2024-03-29 13:30:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 199983104. Throughput: 0: 41805.3. Samples: 82207280. 
Policy #0 lag: (min: 0.0, avg: 21.9, max: 43.0) [2024-03-29 13:30:28,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:30:29,832][00497] Updated weights for policy 0, policy_version 12209 (0.0021) [2024-03-29 13:30:33,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 200179712. Throughput: 0: 42018.2. Samples: 82334060. Policy #0 lag: (min: 0.0, avg: 21.9, max: 43.0) [2024-03-29 13:30:33,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 13:30:33,855][00497] Updated weights for policy 0, policy_version 12219 (0.0023) [2024-03-29 13:30:38,034][00497] Updated weights for policy 0, policy_version 12229 (0.0023) [2024-03-29 13:30:38,101][00476] Signal inference workers to stop experience collection... (3000 times) [2024-03-29 13:30:38,130][00497] InferenceWorker_p0-w0: stopping experience collection (3000 times) [2024-03-29 13:30:38,283][00476] Signal inference workers to resume experience collection... (3000 times) [2024-03-29 13:30:38,283][00497] InferenceWorker_p0-w0: resuming experience collection (3000 times) [2024-03-29 13:30:38,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 200409088. Throughput: 0: 42454.2. Samples: 82607520. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 13:30:38,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 13:30:41,097][00497] Updated weights for policy 0, policy_version 12239 (0.0024) [2024-03-29 13:30:43,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 200622080. Throughput: 0: 42328.8. Samples: 82843860. Policy #0 lag: (min: 0.0, avg: 19.2, max: 41.0) [2024-03-29 13:30:43,840][00126] Avg episode reward: [(0, '0.329')] [2024-03-29 13:30:45,533][00497] Updated weights for policy 0, policy_version 12249 (0.0022) [2024-03-29 13:30:48,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.1, 300 sec: 41931.9). Total num frames: 200818688. Throughput: 0: 42350.3. Samples: 82972260. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:30:48,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:30:49,415][00497] Updated weights for policy 0, policy_version 12259 (0.0020) [2024-03-29 13:30:53,534][00497] Updated weights for policy 0, policy_version 12269 (0.0024) [2024-03-29 13:30:53,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 201031680. Throughput: 0: 42397.7. Samples: 83240020. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:30:53,840][00126] Avg episode reward: [(0, '0.267')] [2024-03-29 13:30:56,644][00497] Updated weights for policy 0, policy_version 12279 (0.0027) [2024-03-29 13:30:58,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 201244672. Throughput: 0: 42151.6. Samples: 83470180. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 13:30:58,840][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 13:31:01,173][00497] Updated weights for policy 0, policy_version 12289 (0.0025) [2024-03-29 13:31:03,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.5, 300 sec: 41931.9). Total num frames: 201457664. Throughput: 0: 42322.7. Samples: 83602700. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 13:31:03,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 13:31:04,058][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000012297_201474048.pth... 
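The checkpoint filenames in the entries above encode the policy version and the cumulative environment-frame count (checkpoint_000012297_201474048.pth is policy version 12297 at 201,474,048 frames), and each Saving entry is paired with a Removing entry for an older checkpoint, so only a small rolling window stays on disk. Below is a minimal, stand-alone sketch for listing such checkpoints offline; it assumes only the checkpoint_<version>_<frames>.pth naming pattern visible in this log, and the list_checkpoints helper is an illustrative name, not part of the training code.

```python
import re
from pathlib import Path

# Illustrative helper (not part of the trainer): recover (policy_version,
# env_frames) from checkpoint filenames such as
# checkpoint_000012297_201474048.pth seen in this log.
CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")


def list_checkpoints(ckpt_dir: str):
    """Return (policy_version, env_frames, path) tuples, oldest first."""
    found = []
    for path in Path(ckpt_dir).glob("checkpoint_*.pth"):
        match = CKPT_RE.search(path.name)
        if match:
            found.append((int(match.group(1)), int(match.group(2)), path))
    return sorted(found)


if __name__ == "__main__":
    for version, frames, path in list_checkpoints(
        "/workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0"
    ):
        print(f"policy_version={version:>9,}  env_frames={frames:>13,}  {path.name}")
```

In this excerpt the retention window appears to be about two checkpoints: each Saving entry is followed within a second by a Removing entry for a checkpoint roughly 600 policy versions (about 10 M frames) older.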
[2024-03-29 13:31:04,375][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011681_191381504.pth [2024-03-29 13:31:05,100][00497] Updated weights for policy 0, policy_version 12299 (0.0031) [2024-03-29 13:31:08,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 201637888. Throughput: 0: 41938.2. Samples: 83860840. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 13:31:08,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 13:31:09,296][00497] Updated weights for policy 0, policy_version 12309 (0.0022) [2024-03-29 13:31:10,964][00476] Signal inference workers to stop experience collection... (3050 times) [2024-03-29 13:31:11,033][00497] InferenceWorker_p0-w0: stopping experience collection (3050 times) [2024-03-29 13:31:11,039][00476] Signal inference workers to resume experience collection... (3050 times) [2024-03-29 13:31:11,054][00497] InferenceWorker_p0-w0: resuming experience collection (3050 times) [2024-03-29 13:31:12,579][00497] Updated weights for policy 0, policy_version 12319 (0.0024) [2024-03-29 13:31:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 201883648. Throughput: 0: 41881.3. Samples: 84091940. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 13:31:13,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:31:16,747][00497] Updated weights for policy 0, policy_version 12329 (0.0026) [2024-03-29 13:31:18,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 202080256. Throughput: 0: 42098.6. Samples: 84228500. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 13:31:18,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:31:20,650][00497] Updated weights for policy 0, policy_version 12339 (0.0022) [2024-03-29 13:31:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 202276864. Throughput: 0: 41964.9. Samples: 84495940. Policy #0 lag: (min: 1.0, avg: 20.8, max: 41.0) [2024-03-29 13:31:23,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 13:31:24,586][00497] Updated weights for policy 0, policy_version 12349 (0.0018) [2024-03-29 13:31:27,901][00497] Updated weights for policy 0, policy_version 12359 (0.0028) [2024-03-29 13:31:28,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 202522624. Throughput: 0: 41840.8. Samples: 84726700. Policy #0 lag: (min: 1.0, avg: 22.1, max: 43.0) [2024-03-29 13:31:28,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 13:31:32,277][00497] Updated weights for policy 0, policy_version 12369 (0.0028) [2024-03-29 13:31:33,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 202719232. Throughput: 0: 41932.9. Samples: 84859240. Policy #0 lag: (min: 1.0, avg: 22.1, max: 43.0) [2024-03-29 13:31:33,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 13:31:36,345][00497] Updated weights for policy 0, policy_version 12379 (0.0033) [2024-03-29 13:31:38,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 202915840. Throughput: 0: 41740.9. Samples: 85118360. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 13:31:38,840][00126] Avg episode reward: [(0, '0.333')] [2024-03-29 13:31:40,346][00497] Updated weights for policy 0, policy_version 12389 (0.0030) [2024-03-29 13:31:41,047][00476] Signal inference workers to stop experience collection... 
(3100 times) [2024-03-29 13:31:41,080][00497] InferenceWorker_p0-w0: stopping experience collection (3100 times) [2024-03-29 13:31:41,258][00476] Signal inference workers to resume experience collection... (3100 times) [2024-03-29 13:31:41,259][00497] InferenceWorker_p0-w0: resuming experience collection (3100 times) [2024-03-29 13:31:43,613][00497] Updated weights for policy 0, policy_version 12399 (0.0026) [2024-03-29 13:31:43,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 203145216. Throughput: 0: 41764.0. Samples: 85349560. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 13:31:43,840][00126] Avg episode reward: [(0, '0.241')] [2024-03-29 13:31:47,912][00497] Updated weights for policy 0, policy_version 12409 (0.0017) [2024-03-29 13:31:48,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 203358208. Throughput: 0: 41870.7. Samples: 85486880. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 13:31:48,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 13:31:52,124][00497] Updated weights for policy 0, policy_version 12419 (0.0020) [2024-03-29 13:31:53,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 203538432. Throughput: 0: 41912.0. Samples: 85746880. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 13:31:53,840][00126] Avg episode reward: [(0, '0.327')] [2024-03-29 13:31:56,022][00497] Updated weights for policy 0, policy_version 12429 (0.0025) [2024-03-29 13:31:58,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 203784192. Throughput: 0: 41998.7. Samples: 85981880. Policy #0 lag: (min: 1.0, avg: 17.6, max: 41.0) [2024-03-29 13:31:58,841][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 13:31:59,215][00497] Updated weights for policy 0, policy_version 12439 (0.0021) [2024-03-29 13:32:03,604][00497] Updated weights for policy 0, policy_version 12449 (0.0020) [2024-03-29 13:32:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 203964416. Throughput: 0: 41916.5. Samples: 86114740. Policy #0 lag: (min: 1.0, avg: 17.6, max: 41.0) [2024-03-29 13:32:03,840][00126] Avg episode reward: [(0, '0.272')] [2024-03-29 13:32:07,726][00497] Updated weights for policy 0, policy_version 12459 (0.0032) [2024-03-29 13:32:08,839][00126] Fps is (10 sec: 37683.3, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 204161024. Throughput: 0: 41694.3. Samples: 86372180. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 13:32:08,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 13:32:11,608][00497] Updated weights for policy 0, policy_version 12469 (0.0023) [2024-03-29 13:32:13,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 204406784. Throughput: 0: 42054.7. Samples: 86619160. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 13:32:13,840][00126] Avg episode reward: [(0, '0.287')] [2024-03-29 13:32:14,888][00497] Updated weights for policy 0, policy_version 12479 (0.0019) [2024-03-29 13:32:18,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 42098.6). Total num frames: 204603392. Throughput: 0: 41912.5. Samples: 86745300. 
Policy #0 lag: (min: 1.0, avg: 21.9, max: 41.0) [2024-03-29 13:32:18,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 13:32:19,062][00497] Updated weights for policy 0, policy_version 12489 (0.0021) [2024-03-29 13:32:19,137][00476] Signal inference workers to stop experience collection... (3150 times) [2024-03-29 13:32:19,189][00497] InferenceWorker_p0-w0: stopping experience collection (3150 times) [2024-03-29 13:32:19,322][00476] Signal inference workers to resume experience collection... (3150 times) [2024-03-29 13:32:19,322][00497] InferenceWorker_p0-w0: resuming experience collection (3150 times) [2024-03-29 13:32:23,284][00497] Updated weights for policy 0, policy_version 12499 (0.0026) [2024-03-29 13:32:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 204800000. Throughput: 0: 41914.1. Samples: 87004500. Policy #0 lag: (min: 1.0, avg: 21.9, max: 41.0) [2024-03-29 13:32:23,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:32:26,885][00497] Updated weights for policy 0, policy_version 12509 (0.0028) [2024-03-29 13:32:28,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 205029376. Throughput: 0: 42375.5. Samples: 87256460. Policy #0 lag: (min: 0.0, avg: 20.3, max: 43.0) [2024-03-29 13:32:28,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 13:32:30,312][00497] Updated weights for policy 0, policy_version 12519 (0.0020) [2024-03-29 13:32:33,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 205242368. Throughput: 0: 42141.3. Samples: 87383240. Policy #0 lag: (min: 0.0, avg: 20.3, max: 43.0) [2024-03-29 13:32:33,842][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 13:32:34,321][00497] Updated weights for policy 0, policy_version 12529 (0.0019) [2024-03-29 13:32:38,466][00497] Updated weights for policy 0, policy_version 12539 (0.0018) [2024-03-29 13:32:38,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 205455360. Throughput: 0: 42333.3. Samples: 87651880. Policy #0 lag: (min: 2.0, avg: 21.0, max: 42.0) [2024-03-29 13:32:38,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:32:42,189][00497] Updated weights for policy 0, policy_version 12549 (0.0024) [2024-03-29 13:32:43,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 205684736. Throughput: 0: 42599.5. Samples: 87898860. Policy #0 lag: (min: 2.0, avg: 21.0, max: 42.0) [2024-03-29 13:32:43,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:32:45,362][00497] Updated weights for policy 0, policy_version 12559 (0.0022) [2024-03-29 13:32:48,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 205897728. Throughput: 0: 42439.9. Samples: 88024540. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:32:48,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:32:49,685][00497] Updated weights for policy 0, policy_version 12569 (0.0023) [2024-03-29 13:32:53,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 206077952. Throughput: 0: 42676.8. Samples: 88292640. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 13:32:53,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 13:32:54,009][00497] Updated weights for policy 0, policy_version 12579 (0.0022) [2024-03-29 13:32:56,567][00476] Signal inference workers to stop experience collection... 
(3200 times) [2024-03-29 13:32:56,625][00497] InferenceWorker_p0-w0: stopping experience collection (3200 times) [2024-03-29 13:32:56,643][00476] Signal inference workers to resume experience collection... (3200 times) [2024-03-29 13:32:56,656][00497] InferenceWorker_p0-w0: resuming experience collection (3200 times) [2024-03-29 13:32:57,437][00497] Updated weights for policy 0, policy_version 12589 (0.0025) [2024-03-29 13:32:58,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 206323712. Throughput: 0: 42883.7. Samples: 88548920. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 13:32:58,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 13:33:00,620][00497] Updated weights for policy 0, policy_version 12599 (0.0025) [2024-03-29 13:33:03,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42871.3, 300 sec: 42154.1). Total num frames: 206536704. Throughput: 0: 42764.3. Samples: 88669700. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 13:33:03,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:33:03,863][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000012607_206553088.pth... [2024-03-29 13:33:04,185][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000011987_196395008.pth [2024-03-29 13:33:04,857][00497] Updated weights for policy 0, policy_version 12609 (0.0018) [2024-03-29 13:33:08,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42871.4, 300 sec: 42098.5). Total num frames: 206733312. Throughput: 0: 42997.8. Samples: 88939400. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 13:33:08,842][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:33:09,405][00497] Updated weights for policy 0, policy_version 12619 (0.0018) [2024-03-29 13:33:12,911][00497] Updated weights for policy 0, policy_version 12629 (0.0028) [2024-03-29 13:33:13,839][00126] Fps is (10 sec: 40960.7, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 206946304. Throughput: 0: 42901.9. Samples: 89187040. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 13:33:13,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 13:33:16,072][00497] Updated weights for policy 0, policy_version 12639 (0.0026) [2024-03-29 13:33:18,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42871.5, 300 sec: 42209.6). Total num frames: 207175680. Throughput: 0: 42732.1. Samples: 89306180. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 13:33:18,840][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 13:33:20,477][00497] Updated weights for policy 0, policy_version 12649 (0.0022) [2024-03-29 13:33:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42871.6, 300 sec: 42209.6). Total num frames: 207372288. Throughput: 0: 42572.1. Samples: 89567620. Policy #0 lag: (min: 0.0, avg: 21.0, max: 43.0) [2024-03-29 13:33:23,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 13:33:25,097][00497] Updated weights for policy 0, policy_version 12659 (0.0025) [2024-03-29 13:33:28,483][00497] Updated weights for policy 0, policy_version 12669 (0.0026) [2024-03-29 13:33:28,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 207585280. Throughput: 0: 42911.6. Samples: 89829880. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 13:33:28,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 13:33:30,461][00476] Signal inference workers to stop experience collection... 
(3250 times) [2024-03-29 13:33:30,505][00497] InferenceWorker_p0-w0: stopping experience collection (3250 times) [2024-03-29 13:33:30,542][00476] Signal inference workers to resume experience collection... (3250 times) [2024-03-29 13:33:30,544][00497] InferenceWorker_p0-w0: resuming experience collection (3250 times) [2024-03-29 13:33:31,635][00497] Updated weights for policy 0, policy_version 12679 (0.0020) [2024-03-29 13:33:33,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42598.3, 300 sec: 42154.1). Total num frames: 207798272. Throughput: 0: 42409.2. Samples: 89932960. Policy #0 lag: (min: 0.0, avg: 20.0, max: 40.0) [2024-03-29 13:33:33,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:33:36,041][00497] Updated weights for policy 0, policy_version 12689 (0.0022) [2024-03-29 13:33:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 208011264. Throughput: 0: 42302.6. Samples: 90196260. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 13:33:38,842][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 13:33:40,651][00497] Updated weights for policy 0, policy_version 12699 (0.0017) [2024-03-29 13:33:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 208207872. Throughput: 0: 42550.5. Samples: 90463700. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 13:33:43,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 13:33:44,000][00497] Updated weights for policy 0, policy_version 12709 (0.0026) [2024-03-29 13:33:46,992][00497] Updated weights for policy 0, policy_version 12719 (0.0020) [2024-03-29 13:33:48,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 208453632. Throughput: 0: 42313.9. Samples: 90573820. Policy #0 lag: (min: 1.0, avg: 19.5, max: 42.0) [2024-03-29 13:33:48,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 13:33:51,007][00497] Updated weights for policy 0, policy_version 12729 (0.0021) [2024-03-29 13:33:53,839][00126] Fps is (10 sec: 45875.2, 60 sec: 43144.5, 300 sec: 42265.2). Total num frames: 208666624. Throughput: 0: 42388.4. Samples: 90846880. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:33:53,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 13:33:55,880][00497] Updated weights for policy 0, policy_version 12739 (0.0022) [2024-03-29 13:33:58,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 208863232. Throughput: 0: 42821.0. Samples: 91113980. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:33:58,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 13:33:59,105][00497] Updated weights for policy 0, policy_version 12749 (0.0027) [2024-03-29 13:34:02,146][00497] Updated weights for policy 0, policy_version 12759 (0.0015) [2024-03-29 13:34:03,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42871.5, 300 sec: 42320.7). Total num frames: 209108992. Throughput: 0: 42613.2. Samples: 91223780. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 13:34:03,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:34:06,328][00497] Updated weights for policy 0, policy_version 12769 (0.0021) [2024-03-29 13:34:07,050][00476] Signal inference workers to stop experience collection... (3300 times) [2024-03-29 13:34:07,051][00476] Signal inference workers to resume experience collection... 
(3300 times) [2024-03-29 13:34:07,091][00497] InferenceWorker_p0-w0: stopping experience collection (3300 times) [2024-03-29 13:34:07,091][00497] InferenceWorker_p0-w0: resuming experience collection (3300 times) [2024-03-29 13:34:08,839][00126] Fps is (10 sec: 44235.8, 60 sec: 42871.4, 300 sec: 42265.2). Total num frames: 209305600. Throughput: 0: 42872.8. Samples: 91496900. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 13:34:08,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:34:11,527][00497] Updated weights for policy 0, policy_version 12779 (0.0027) [2024-03-29 13:34:13,839][00126] Fps is (10 sec: 36045.1, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 209469440. Throughput: 0: 42328.9. Samples: 91734680. Policy #0 lag: (min: 2.0, avg: 19.8, max: 42.0) [2024-03-29 13:34:13,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:34:14,933][00497] Updated weights for policy 0, policy_version 12789 (0.0021) [2024-03-29 13:34:17,860][00497] Updated weights for policy 0, policy_version 12799 (0.0024) [2024-03-29 13:34:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42598.3, 300 sec: 42265.2). Total num frames: 209731584. Throughput: 0: 42673.9. Samples: 91853280. Policy #0 lag: (min: 2.0, avg: 19.8, max: 42.0) [2024-03-29 13:34:18,840][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 13:34:22,187][00497] Updated weights for policy 0, policy_version 12809 (0.0022) [2024-03-29 13:34:23,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 209928192. Throughput: 0: 42517.9. Samples: 92109560. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 13:34:23,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:34:27,211][00497] Updated weights for policy 0, policy_version 12819 (0.0019) [2024-03-29 13:34:28,839][00126] Fps is (10 sec: 37683.5, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 210108416. Throughput: 0: 42433.4. Samples: 92373200. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 13:34:28,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 13:34:30,453][00497] Updated weights for policy 0, policy_version 12829 (0.0023) [2024-03-29 13:34:33,433][00497] Updated weights for policy 0, policy_version 12839 (0.0029) [2024-03-29 13:34:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 210354176. Throughput: 0: 42429.8. Samples: 92483160. Policy #0 lag: (min: 1.0, avg: 23.0, max: 43.0) [2024-03-29 13:34:33,840][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 13:34:37,672][00497] Updated weights for policy 0, policy_version 12849 (0.0018) [2024-03-29 13:34:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 210550784. Throughput: 0: 41938.2. Samples: 92734100. Policy #0 lag: (min: 1.0, avg: 23.0, max: 43.0) [2024-03-29 13:34:38,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 13:34:42,638][00497] Updated weights for policy 0, policy_version 12859 (0.0036) [2024-03-29 13:34:43,655][00476] Signal inference workers to stop experience collection... (3350 times) [2024-03-29 13:34:43,655][00476] Signal inference workers to resume experience collection... (3350 times) [2024-03-29 13:34:43,694][00497] InferenceWorker_p0-w0: stopping experience collection (3350 times) [2024-03-29 13:34:43,695][00497] InferenceWorker_p0-w0: resuming experience collection (3350 times) [2024-03-29 13:34:43,839][00126] Fps is (10 sec: 37682.8, 60 sec: 42052.3, 300 sec: 42154.1). 
Total num frames: 210731008. Throughput: 0: 42169.6. Samples: 93011620. Policy #0 lag: (min: 0.0, avg: 17.3, max: 41.0) [2024-03-29 13:34:43,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:34:45,791][00497] Updated weights for policy 0, policy_version 12869 (0.0033) [2024-03-29 13:34:48,791][00497] Updated weights for policy 0, policy_version 12879 (0.0019) [2024-03-29 13:34:48,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 211009536. Throughput: 0: 42320.4. Samples: 93128200. Policy #0 lag: (min: 0.0, avg: 17.3, max: 41.0) [2024-03-29 13:34:48,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:34:53,064][00497] Updated weights for policy 0, policy_version 12889 (0.0029) [2024-03-29 13:34:53,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42052.4, 300 sec: 42320.7). Total num frames: 211189760. Throughput: 0: 41643.2. Samples: 93370840. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 13:34:53,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 13:34:58,269][00497] Updated weights for policy 0, policy_version 12899 (0.0027) [2024-03-29 13:34:58,839][00126] Fps is (10 sec: 34406.5, 60 sec: 41506.0, 300 sec: 42154.1). Total num frames: 211353600. Throughput: 0: 42478.2. Samples: 93646200. Policy #0 lag: (min: 0.0, avg: 21.8, max: 42.0) [2024-03-29 13:34:58,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:35:01,511][00497] Updated weights for policy 0, policy_version 12909 (0.0031) [2024-03-29 13:35:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.3, 300 sec: 42320.7). Total num frames: 211615744. Throughput: 0: 42326.7. Samples: 93757980. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 13:35:03,840][00126] Avg episode reward: [(0, '0.292')] [2024-03-29 13:35:03,917][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000012917_211632128.pth... [2024-03-29 13:35:04,223][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000012297_201474048.pth [2024-03-29 13:35:04,754][00497] Updated weights for policy 0, policy_version 12919 (0.0029) [2024-03-29 13:35:08,593][00497] Updated weights for policy 0, policy_version 12929 (0.0022) [2024-03-29 13:35:08,839][00126] Fps is (10 sec: 47514.0, 60 sec: 42052.4, 300 sec: 42265.2). Total num frames: 211828736. Throughput: 0: 41934.2. Samples: 93996600. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 13:35:08,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:35:13,839][00126] Fps is (10 sec: 36044.4, 60 sec: 41779.1, 300 sec: 42098.5). Total num frames: 211976192. Throughput: 0: 42103.5. Samples: 94267860. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 13:35:13,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:35:13,974][00497] Updated weights for policy 0, policy_version 12939 (0.0023) [2024-03-29 13:35:16,856][00476] Signal inference workers to stop experience collection... (3400 times) [2024-03-29 13:35:16,902][00497] InferenceWorker_p0-w0: stopping experience collection (3400 times) [2024-03-29 13:35:17,047][00476] Signal inference workers to resume experience collection... (3400 times) [2024-03-29 13:35:17,047][00497] InferenceWorker_p0-w0: resuming experience collection (3400 times) [2024-03-29 13:35:17,052][00497] Updated weights for policy 0, policy_version 12949 (0.0024) [2024-03-29 13:35:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 212254720. 
Throughput: 0: 42475.0. Samples: 94394540. Policy #0 lag: (min: 0.0, avg: 17.9, max: 41.0) [2024-03-29 13:35:18,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 13:35:19,938][00497] Updated weights for policy 0, policy_version 12959 (0.0023) [2024-03-29 13:35:23,839][00126] Fps is (10 sec: 49152.4, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 212467712. Throughput: 0: 42223.6. Samples: 94634160. Policy #0 lag: (min: 0.0, avg: 17.9, max: 41.0) [2024-03-29 13:35:23,841][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:35:24,060][00497] Updated weights for policy 0, policy_version 12969 (0.0021) [2024-03-29 13:35:28,839][00126] Fps is (10 sec: 37683.0, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 212631552. Throughput: 0: 42014.6. Samples: 94902280. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 13:35:28,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:35:29,388][00497] Updated weights for policy 0, policy_version 12979 (0.0019) [2024-03-29 13:35:32,518][00497] Updated weights for policy 0, policy_version 12989 (0.0025) [2024-03-29 13:35:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 212877312. Throughput: 0: 42319.7. Samples: 95032580. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 13:35:33,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 13:35:35,697][00497] Updated weights for policy 0, policy_version 12999 (0.0024) [2024-03-29 13:35:38,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 213090304. Throughput: 0: 41948.0. Samples: 95258500. Policy #0 lag: (min: 2.0, avg: 21.7, max: 42.0) [2024-03-29 13:35:38,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 13:35:39,599][00497] Updated weights for policy 0, policy_version 13009 (0.0026) [2024-03-29 13:35:43,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 213270528. Throughput: 0: 41737.3. Samples: 95524380. Policy #0 lag: (min: 2.0, avg: 21.7, max: 42.0) [2024-03-29 13:35:43,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:35:44,932][00497] Updated weights for policy 0, policy_version 13019 (0.0037) [2024-03-29 13:35:47,868][00476] Signal inference workers to stop experience collection... (3450 times) [2024-03-29 13:35:47,890][00497] InferenceWorker_p0-w0: stopping experience collection (3450 times) [2024-03-29 13:35:48,071][00476] Signal inference workers to resume experience collection... (3450 times) [2024-03-29 13:35:48,071][00497] InferenceWorker_p0-w0: resuming experience collection (3450 times) [2024-03-29 13:35:48,075][00497] Updated weights for policy 0, policy_version 13029 (0.0029) [2024-03-29 13:35:48,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41233.1, 300 sec: 42209.6). Total num frames: 213483520. Throughput: 0: 42407.5. Samples: 95666320. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 13:35:48,840][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 13:35:51,502][00497] Updated weights for policy 0, policy_version 13039 (0.0020) [2024-03-29 13:35:53,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 213729280. Throughput: 0: 41934.2. Samples: 95883640. 
Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 13:35:53,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 13:35:55,259][00497] Updated weights for policy 0, policy_version 13049 (0.0031) [2024-03-29 13:35:58,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42871.6, 300 sec: 42265.2). Total num frames: 213925888. Throughput: 0: 41718.8. Samples: 96145200. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 13:35:58,841][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 13:36:00,661][00497] Updated weights for policy 0, policy_version 13059 (0.0028) [2024-03-29 13:36:03,671][00497] Updated weights for policy 0, policy_version 13069 (0.0031) [2024-03-29 13:36:03,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 42320.7). Total num frames: 214122496. Throughput: 0: 42110.7. Samples: 96289520. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 13:36:03,840][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 13:36:07,136][00497] Updated weights for policy 0, policy_version 13079 (0.0031) [2024-03-29 13:36:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 214351872. Throughput: 0: 41673.8. Samples: 96509480. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 13:36:08,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 13:36:11,168][00497] Updated weights for policy 0, policy_version 13089 (0.0027) [2024-03-29 13:36:13,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42871.4, 300 sec: 42265.1). Total num frames: 214548480. Throughput: 0: 41532.9. Samples: 96771260. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 13:36:13,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:36:16,540][00497] Updated weights for policy 0, policy_version 13099 (0.0023) [2024-03-29 13:36:18,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41233.1, 300 sec: 42209.6). Total num frames: 214728704. Throughput: 0: 41844.8. Samples: 96915600. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 13:36:18,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 13:36:19,804][00497] Updated weights for policy 0, policy_version 13109 (0.0036) [2024-03-29 13:36:20,247][00476] Signal inference workers to stop experience collection... (3500 times) [2024-03-29 13:36:20,325][00497] InferenceWorker_p0-w0: stopping experience collection (3500 times) [2024-03-29 13:36:20,334][00476] Signal inference workers to resume experience collection... (3500 times) [2024-03-29 13:36:20,352][00497] InferenceWorker_p0-w0: resuming experience collection (3500 times) [2024-03-29 13:36:22,830][00497] Updated weights for policy 0, policy_version 13119 (0.0030) [2024-03-29 13:36:23,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 214974464. Throughput: 0: 41628.9. Samples: 97131800. Policy #0 lag: (min: 1.0, avg: 20.9, max: 41.0) [2024-03-29 13:36:23,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:36:26,822][00497] Updated weights for policy 0, policy_version 13129 (0.0025) [2024-03-29 13:36:28,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.4, 300 sec: 42209.7). Total num frames: 215171072. Throughput: 0: 41677.0. Samples: 97399840. Policy #0 lag: (min: 2.0, avg: 23.1, max: 42.0) [2024-03-29 13:36:28,841][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 13:36:32,267][00497] Updated weights for policy 0, policy_version 13139 (0.0022) [2024-03-29 13:36:33,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.1, 300 sec: 42154.1). Total num frames: 215351296. 
Throughput: 0: 41464.5. Samples: 97532220. Policy #0 lag: (min: 2.0, avg: 23.1, max: 42.0) [2024-03-29 13:36:33,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 13:36:35,316][00497] Updated weights for policy 0, policy_version 13149 (0.0030) [2024-03-29 13:36:38,494][00497] Updated weights for policy 0, policy_version 13159 (0.0030) [2024-03-29 13:36:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 215597056. Throughput: 0: 41709.8. Samples: 97760580. Policy #0 lag: (min: 0.0, avg: 19.2, max: 43.0) [2024-03-29 13:36:38,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 13:36:42,519][00497] Updated weights for policy 0, policy_version 13169 (0.0021) [2024-03-29 13:36:43,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 215793664. Throughput: 0: 41467.9. Samples: 98011260. Policy #0 lag: (min: 0.0, avg: 19.2, max: 43.0) [2024-03-29 13:36:43,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 13:36:47,836][00497] Updated weights for policy 0, policy_version 13179 (0.0028) [2024-03-29 13:36:48,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 215973888. Throughput: 0: 41228.8. Samples: 98144820. Policy #0 lag: (min: 1.0, avg: 21.2, max: 42.0) [2024-03-29 13:36:48,840][00126] Avg episode reward: [(0, '0.336')] [2024-03-29 13:36:51,038][00497] Updated weights for policy 0, policy_version 13189 (0.0025) [2024-03-29 13:36:52,110][00476] Signal inference workers to stop experience collection... (3550 times) [2024-03-29 13:36:52,127][00497] InferenceWorker_p0-w0: stopping experience collection (3550 times) [2024-03-29 13:36:52,320][00476] Signal inference workers to resume experience collection... (3550 times) [2024-03-29 13:36:52,320][00497] InferenceWorker_p0-w0: resuming experience collection (3550 times) [2024-03-29 13:36:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 216219648. Throughput: 0: 42107.6. Samples: 98404320. Policy #0 lag: (min: 1.0, avg: 21.2, max: 42.0) [2024-03-29 13:36:53,840][00126] Avg episode reward: [(0, '0.315')] [2024-03-29 13:36:54,554][00497] Updated weights for policy 0, policy_version 13199 (0.0027) [2024-03-29 13:36:58,306][00497] Updated weights for policy 0, policy_version 13209 (0.0019) [2024-03-29 13:36:58,839][00126] Fps is (10 sec: 45875.4, 60 sec: 41779.1, 300 sec: 42265.2). Total num frames: 216432640. Throughput: 0: 41373.4. Samples: 98633060. Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 13:36:58,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 13:37:03,643][00497] Updated weights for policy 0, policy_version 13219 (0.0022) [2024-03-29 13:37:03,839][00126] Fps is (10 sec: 36044.7, 60 sec: 40960.0, 300 sec: 42098.5). Total num frames: 216580096. Throughput: 0: 41079.6. Samples: 98764180. Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 13:37:03,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 13:37:04,244][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000013221_216612864.pth... [2024-03-29 13:37:04,559][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000012607_206553088.pth [2024-03-29 13:37:06,894][00497] Updated weights for policy 0, policy_version 13229 (0.0022) [2024-03-29 13:37:08,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 216842240. Throughput: 0: 41963.9. Samples: 99020180. 
Policy #0 lag: (min: 1.0, avg: 18.1, max: 42.0) [2024-03-29 13:37:08,840][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 13:37:10,379][00497] Updated weights for policy 0, policy_version 13239 (0.0029) [2024-03-29 13:37:13,839][00126] Fps is (10 sec: 47513.7, 60 sec: 41779.3, 300 sec: 42209.6). Total num frames: 217055232. Throughput: 0: 41091.1. Samples: 99248940. Policy #0 lag: (min: 1.0, avg: 18.1, max: 42.0) [2024-03-29 13:37:13,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 13:37:14,434][00497] Updated weights for policy 0, policy_version 13249 (0.0025) [2024-03-29 13:37:18,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41506.2, 300 sec: 42098.6). Total num frames: 217219072. Throughput: 0: 41060.0. Samples: 99379920. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:37:18,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:37:19,517][00497] Updated weights for policy 0, policy_version 13259 (0.0025) [2024-03-29 13:37:22,792][00497] Updated weights for policy 0, policy_version 13269 (0.0028) [2024-03-29 13:37:23,629][00476] Signal inference workers to stop experience collection... (3600 times) [2024-03-29 13:37:23,705][00497] InferenceWorker_p0-w0: stopping experience collection (3600 times) [2024-03-29 13:37:23,708][00476] Signal inference workers to resume experience collection... (3600 times) [2024-03-29 13:37:23,733][00497] InferenceWorker_p0-w0: resuming experience collection (3600 times) [2024-03-29 13:37:23,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.1, 300 sec: 42098.6). Total num frames: 217448448. Throughput: 0: 42029.3. Samples: 99651900. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:37:23,840][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 13:37:25,863][00497] Updated weights for policy 0, policy_version 13279 (0.0020) [2024-03-29 13:37:28,839][00126] Fps is (10 sec: 45875.1, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 217677824. Throughput: 0: 41634.2. Samples: 99884800. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 13:37:28,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 13:37:30,003][00497] Updated weights for policy 0, policy_version 13289 (0.0019) [2024-03-29 13:37:33,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41779.1, 300 sec: 42043.0). Total num frames: 217858048. Throughput: 0: 41584.4. Samples: 100016120. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 13:37:33,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 13:37:34,916][00497] Updated weights for policy 0, policy_version 13299 (0.0022) [2024-03-29 13:37:38,226][00497] Updated weights for policy 0, policy_version 13309 (0.0018) [2024-03-29 13:37:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 218087424. Throughput: 0: 41803.1. Samples: 100285460. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 13:37:38,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 13:37:41,526][00497] Updated weights for policy 0, policy_version 13319 (0.0023) [2024-03-29 13:37:43,839][00126] Fps is (10 sec: 44237.5, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 218300416. Throughput: 0: 41653.8. Samples: 100507480. Policy #0 lag: (min: 0.0, avg: 17.6, max: 41.0) [2024-03-29 13:37:43,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 13:37:45,658][00497] Updated weights for policy 0, policy_version 13329 (0.0020) [2024-03-29 13:37:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 42098.6). 
Total num frames: 218497024. Throughput: 0: 41636.0. Samples: 100637800. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 13:37:48,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:37:50,658][00497] Updated weights for policy 0, policy_version 13339 (0.0022) [2024-03-29 13:37:53,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41233.0, 300 sec: 41931.9). Total num frames: 218693632. Throughput: 0: 42007.1. Samples: 100910500. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 13:37:53,840][00126] Avg episode reward: [(0, '0.262')] [2024-03-29 13:37:53,866][00497] Updated weights for policy 0, policy_version 13349 (0.0018) [2024-03-29 13:37:57,307][00497] Updated weights for policy 0, policy_version 13359 (0.0026) [2024-03-29 13:37:58,152][00476] Signal inference workers to stop experience collection... (3650 times) [2024-03-29 13:37:58,173][00497] InferenceWorker_p0-w0: stopping experience collection (3650 times) [2024-03-29 13:37:58,328][00476] Signal inference workers to resume experience collection... (3650 times) [2024-03-29 13:37:58,328][00497] InferenceWorker_p0-w0: resuming experience collection (3650 times) [2024-03-29 13:37:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 218939392. Throughput: 0: 42130.7. Samples: 101144820. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 13:37:58,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:38:01,261][00497] Updated weights for policy 0, policy_version 13369 (0.0020) [2024-03-29 13:38:03,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 219136000. Throughput: 0: 41970.5. Samples: 101268600. Policy #0 lag: (min: 2.0, avg: 22.4, max: 42.0) [2024-03-29 13:38:03,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 13:38:06,092][00497] Updated weights for policy 0, policy_version 13379 (0.0022) [2024-03-29 13:38:08,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 219316224. Throughput: 0: 42011.1. Samples: 101542400. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 13:38:08,841][00126] Avg episode reward: [(0, '0.307')] [2024-03-29 13:38:09,605][00497] Updated weights for policy 0, policy_version 13389 (0.0025) [2024-03-29 13:38:12,821][00497] Updated weights for policy 0, policy_version 13399 (0.0018) [2024-03-29 13:38:13,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 219561984. Throughput: 0: 41942.2. Samples: 101772200. Policy #0 lag: (min: 0.0, avg: 18.9, max: 41.0) [2024-03-29 13:38:13,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:38:16,859][00497] Updated weights for policy 0, policy_version 13409 (0.0023) [2024-03-29 13:38:18,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 219774976. Throughput: 0: 41690.8. Samples: 101892200. Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 13:38:18,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:38:21,551][00497] Updated weights for policy 0, policy_version 13419 (0.0018) [2024-03-29 13:38:23,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 219955200. Throughput: 0: 41913.7. Samples: 102171580. 
Policy #0 lag: (min: 0.0, avg: 22.3, max: 41.0) [2024-03-29 13:38:23,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:38:25,272][00497] Updated weights for policy 0, policy_version 13429 (0.0018) [2024-03-29 13:38:28,600][00497] Updated weights for policy 0, policy_version 13439 (0.0022) [2024-03-29 13:38:28,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 220184576. Throughput: 0: 42073.3. Samples: 102400780. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 13:38:28,840][00126] Avg episode reward: [(0, '0.284')] [2024-03-29 13:38:32,188][00476] Signal inference workers to stop experience collection... (3700 times) [2024-03-29 13:38:32,220][00497] InferenceWorker_p0-w0: stopping experience collection (3700 times) [2024-03-29 13:38:32,402][00476] Signal inference workers to resume experience collection... (3700 times) [2024-03-29 13:38:32,402][00497] InferenceWorker_p0-w0: resuming experience collection (3700 times) [2024-03-29 13:38:32,405][00497] Updated weights for policy 0, policy_version 13449 (0.0022) [2024-03-29 13:38:33,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 220397568. Throughput: 0: 41826.2. Samples: 102519980. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 13:38:33,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 13:38:37,254][00497] Updated weights for policy 0, policy_version 13459 (0.0024) [2024-03-29 13:38:38,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41932.0). Total num frames: 220577792. Throughput: 0: 41869.0. Samples: 102794600. Policy #0 lag: (min: 0.0, avg: 19.2, max: 40.0) [2024-03-29 13:38:38,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:38:41,148][00497] Updated weights for policy 0, policy_version 13469 (0.0030) [2024-03-29 13:38:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 220807168. Throughput: 0: 41662.6. Samples: 103019640. Policy #0 lag: (min: 0.0, avg: 19.2, max: 40.0) [2024-03-29 13:38:43,842][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 13:38:44,377][00497] Updated weights for policy 0, policy_version 13479 (0.0024) [2024-03-29 13:38:48,174][00497] Updated weights for policy 0, policy_version 13489 (0.0024) [2024-03-29 13:38:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 221020160. Throughput: 0: 41669.9. Samples: 103143740. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 13:38:48,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:38:52,853][00497] Updated weights for policy 0, policy_version 13499 (0.0027) [2024-03-29 13:38:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 221200384. Throughput: 0: 41584.1. Samples: 103413680. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 13:38:53,840][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 13:38:56,548][00497] Updated weights for policy 0, policy_version 13509 (0.0018) [2024-03-29 13:38:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 221446144. Throughput: 0: 41780.5. Samples: 103652320. 
Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 13:38:58,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:38:59,959][00497] Updated weights for policy 0, policy_version 13519 (0.0025) [2024-03-29 13:39:03,387][00497] Updated weights for policy 0, policy_version 13529 (0.0020) [2024-03-29 13:39:03,839][00126] Fps is (10 sec: 47512.8, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 221675520. Throughput: 0: 42106.5. Samples: 103787000. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 13:39:03,840][00126] Avg episode reward: [(0, '0.327')] [2024-03-29 13:39:03,862][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000013530_221675520.pth... [2024-03-29 13:39:04,161][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000012917_211632128.pth [2024-03-29 13:39:06,961][00476] Signal inference workers to stop experience collection... (3750 times) [2024-03-29 13:39:06,963][00476] Signal inference workers to resume experience collection... (3750 times) [2024-03-29 13:39:07,013][00497] InferenceWorker_p0-w0: stopping experience collection (3750 times) [2024-03-29 13:39:07,013][00497] InferenceWorker_p0-w0: resuming experience collection (3750 times) [2024-03-29 13:39:08,351][00497] Updated weights for policy 0, policy_version 13539 (0.0029) [2024-03-29 13:39:08,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 221839360. Throughput: 0: 41711.2. Samples: 104048580. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 13:39:08,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 13:39:12,422][00497] Updated weights for policy 0, policy_version 13549 (0.0024) [2024-03-29 13:39:13,839][00126] Fps is (10 sec: 39322.4, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 222068736. Throughput: 0: 41944.6. Samples: 104288280. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 13:39:13,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:39:15,756][00497] Updated weights for policy 0, policy_version 13559 (0.0030) [2024-03-29 13:39:18,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 222298112. Throughput: 0: 42096.5. Samples: 104414320. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 13:39:18,841][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 13:39:19,141][00497] Updated weights for policy 0, policy_version 13569 (0.0019) [2024-03-29 13:39:23,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 222461952. Throughput: 0: 41688.4. Samples: 104670580. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 13:39:23,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 13:39:24,041][00497] Updated weights for policy 0, policy_version 13579 (0.0030) [2024-03-29 13:39:27,812][00497] Updated weights for policy 0, policy_version 13589 (0.0026) [2024-03-29 13:39:28,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 222691328. Throughput: 0: 42580.4. Samples: 104935760. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 13:39:28,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 13:39:31,155][00497] Updated weights for policy 0, policy_version 13599 (0.0020) [2024-03-29 13:39:33,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 222920704. Throughput: 0: 42364.3. Samples: 105050140. 
Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 13:39:33,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 13:39:34,630][00497] Updated weights for policy 0, policy_version 13609 (0.0024) [2024-03-29 13:39:38,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 223117312. Throughput: 0: 42019.1. Samples: 105304540. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 13:39:38,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 13:39:39,396][00497] Updated weights for policy 0, policy_version 13619 (0.0021) [2024-03-29 13:39:43,258][00497] Updated weights for policy 0, policy_version 13629 (0.0022) [2024-03-29 13:39:43,582][00476] Signal inference workers to stop experience collection... (3800 times) [2024-03-29 13:39:43,605][00497] InferenceWorker_p0-w0: stopping experience collection (3800 times) [2024-03-29 13:39:43,804][00476] Signal inference workers to resume experience collection... (3800 times) [2024-03-29 13:39:43,805][00497] InferenceWorker_p0-w0: resuming experience collection (3800 times) [2024-03-29 13:39:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 223330304. Throughput: 0: 42760.4. Samples: 105576540. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 13:39:43,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 13:39:46,619][00497] Updated weights for policy 0, policy_version 13639 (0.0028) [2024-03-29 13:39:48,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 223543296. Throughput: 0: 42143.3. Samples: 105683440. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:39:48,840][00126] Avg episode reward: [(0, '0.293')] [2024-03-29 13:39:50,208][00497] Updated weights for policy 0, policy_version 13649 (0.0043) [2024-03-29 13:39:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 223756288. Throughput: 0: 41886.6. Samples: 105933480. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:39:53,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 13:39:54,815][00497] Updated weights for policy 0, policy_version 13659 (0.0027) [2024-03-29 13:39:58,766][00497] Updated weights for policy 0, policy_version 13669 (0.0024) [2024-03-29 13:39:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 223952896. Throughput: 0: 42664.9. Samples: 106208200. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:39:58,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 13:40:02,391][00497] Updated weights for policy 0, policy_version 13679 (0.0020) [2024-03-29 13:40:03,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 224182272. Throughput: 0: 42245.2. Samples: 106315360. Policy #0 lag: (min: 2.0, avg: 19.4, max: 42.0) [2024-03-29 13:40:03,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 13:40:05,725][00497] Updated weights for policy 0, policy_version 13689 (0.0027) [2024-03-29 13:40:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 224378880. Throughput: 0: 42045.3. Samples: 106562620. Policy #0 lag: (min: 2.0, avg: 19.4, max: 42.0) [2024-03-29 13:40:08,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:40:10,311][00497] Updated weights for policy 0, policy_version 13699 (0.0019) [2024-03-29 13:40:13,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.1, 300 sec: 41765.3). 
Total num frames: 224575488. Throughput: 0: 42294.1. Samples: 106839000. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 13:40:13,841][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 13:40:14,401][00497] Updated weights for policy 0, policy_version 13709 (0.0024) [2024-03-29 13:40:17,950][00497] Updated weights for policy 0, policy_version 13719 (0.0019) [2024-03-29 13:40:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 224804864. Throughput: 0: 42009.1. Samples: 106940540. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 13:40:18,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 13:40:19,290][00476] Signal inference workers to stop experience collection... (3850 times) [2024-03-29 13:40:19,313][00497] InferenceWorker_p0-w0: stopping experience collection (3850 times) [2024-03-29 13:40:19,484][00476] Signal inference workers to resume experience collection... (3850 times) [2024-03-29 13:40:19,485][00497] InferenceWorker_p0-w0: resuming experience collection (3850 times) [2024-03-29 13:40:21,486][00497] Updated weights for policy 0, policy_version 13729 (0.0027) [2024-03-29 13:40:23,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42871.3, 300 sec: 42043.0). Total num frames: 225034240. Throughput: 0: 42040.3. Samples: 107196360. Policy #0 lag: (min: 2.0, avg: 22.5, max: 43.0) [2024-03-29 13:40:23,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 13:40:25,997][00497] Updated weights for policy 0, policy_version 13739 (0.0020) [2024-03-29 13:40:28,839][00126] Fps is (10 sec: 39320.9, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 225198080. Throughput: 0: 42085.3. Samples: 107470380. Policy #0 lag: (min: 2.0, avg: 22.5, max: 43.0) [2024-03-29 13:40:28,841][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 13:40:29,891][00497] Updated weights for policy 0, policy_version 13749 (0.0019) [2024-03-29 13:40:33,474][00497] Updated weights for policy 0, policy_version 13759 (0.0027) [2024-03-29 13:40:33,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 225443840. Throughput: 0: 42204.3. Samples: 107582640. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 13:40:33,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 13:40:36,743][00497] Updated weights for policy 0, policy_version 13769 (0.0023) [2024-03-29 13:40:38,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 225656832. Throughput: 0: 42214.3. Samples: 107833120. Policy #0 lag: (min: 2.0, avg: 19.3, max: 42.0) [2024-03-29 13:40:38,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 13:40:41,489][00497] Updated weights for policy 0, policy_version 13779 (0.0029) [2024-03-29 13:40:43,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 225820672. Throughput: 0: 42031.8. Samples: 108099640. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 13:40:43,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 13:40:45,380][00497] Updated weights for policy 0, policy_version 13789 (0.0028) [2024-03-29 13:40:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 226066432. Throughput: 0: 42239.7. Samples: 108216140. 
Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 13:40:48,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 13:40:48,983][00497] Updated weights for policy 0, policy_version 13799 (0.0022) [2024-03-29 13:40:52,265][00497] Updated weights for policy 0, policy_version 13809 (0.0027) [2024-03-29 13:40:53,145][00476] Signal inference workers to stop experience collection... (3900 times) [2024-03-29 13:40:53,177][00497] InferenceWorker_p0-w0: stopping experience collection (3900 times) [2024-03-29 13:40:53,331][00476] Signal inference workers to resume experience collection... (3900 times) [2024-03-29 13:40:53,332][00497] InferenceWorker_p0-w0: resuming experience collection (3900 times) [2024-03-29 13:40:53,839][00126] Fps is (10 sec: 47514.1, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 226295808. Throughput: 0: 42216.9. Samples: 108462380. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 13:40:53,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 13:40:57,073][00497] Updated weights for policy 0, policy_version 13819 (0.0037) [2024-03-29 13:40:58,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 226476032. Throughput: 0: 41968.5. Samples: 108727580. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 13:40:58,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:41:00,815][00497] Updated weights for policy 0, policy_version 13829 (0.0021) [2024-03-29 13:41:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 226705408. Throughput: 0: 42618.2. Samples: 108858360. Policy #0 lag: (min: 1.0, avg: 20.5, max: 44.0) [2024-03-29 13:41:03,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 13:41:03,895][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000013838_226721792.pth... [2024-03-29 13:41:04,250][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000013221_216612864.pth [2024-03-29 13:41:04,672][00497] Updated weights for policy 0, policy_version 13839 (0.0018) [2024-03-29 13:41:08,123][00497] Updated weights for policy 0, policy_version 13849 (0.0025) [2024-03-29 13:41:08,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 226902016. Throughput: 0: 41950.0. Samples: 109084100. Policy #0 lag: (min: 1.0, avg: 20.5, max: 44.0) [2024-03-29 13:41:08,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 13:41:12,937][00497] Updated weights for policy 0, policy_version 13859 (0.0027) [2024-03-29 13:41:13,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 227115008. Throughput: 0: 41877.8. Samples: 109354880. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 13:41:13,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:41:16,749][00497] Updated weights for policy 0, policy_version 13869 (0.0022) [2024-03-29 13:41:18,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 227344384. Throughput: 0: 42223.2. Samples: 109482680. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 13:41:18,840][00126] Avg episode reward: [(0, '0.461')] [2024-03-29 13:41:20,436][00497] Updated weights for policy 0, policy_version 13879 (0.0027) [2024-03-29 13:41:23,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 227540992. Throughput: 0: 41704.9. Samples: 109709840. 
Policy #0 lag: (min: 1.0, avg: 22.0, max: 41.0) [2024-03-29 13:41:23,840][00126] Avg episode reward: [(0, '0.448')] [2024-03-29 13:41:23,885][00497] Updated weights for policy 0, policy_version 13889 (0.0021) [2024-03-29 13:41:25,833][00476] Signal inference workers to stop experience collection... (3950 times) [2024-03-29 13:41:25,873][00497] InferenceWorker_p0-w0: stopping experience collection (3950 times) [2024-03-29 13:41:26,016][00476] Signal inference workers to resume experience collection... (3950 times) [2024-03-29 13:41:26,016][00497] InferenceWorker_p0-w0: resuming experience collection (3950 times) [2024-03-29 13:41:28,342][00497] Updated weights for policy 0, policy_version 13899 (0.0027) [2024-03-29 13:41:28,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 227737600. Throughput: 0: 41893.0. Samples: 109984820. Policy #0 lag: (min: 1.0, avg: 22.0, max: 41.0) [2024-03-29 13:41:28,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 13:41:32,191][00497] Updated weights for policy 0, policy_version 13909 (0.0027) [2024-03-29 13:41:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 227950592. Throughput: 0: 42271.5. Samples: 110118360. Policy #0 lag: (min: 1.0, avg: 22.0, max: 41.0) [2024-03-29 13:41:33,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 13:41:35,910][00497] Updated weights for policy 0, policy_version 13919 (0.0033) [2024-03-29 13:41:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 228179968. Throughput: 0: 41847.6. Samples: 110345520. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:41:38,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:41:39,302][00497] Updated weights for policy 0, policy_version 13929 (0.0031) [2024-03-29 13:41:43,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 228360192. Throughput: 0: 41881.4. Samples: 110612240. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 13:41:43,840][00126] Avg episode reward: [(0, '0.281')] [2024-03-29 13:41:43,930][00497] Updated weights for policy 0, policy_version 13939 (0.0018) [2024-03-29 13:41:47,643][00497] Updated weights for policy 0, policy_version 13949 (0.0022) [2024-03-29 13:41:48,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 228605952. Throughput: 0: 42060.9. Samples: 110751100. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 13:41:48,840][00126] Avg episode reward: [(0, '0.321')] [2024-03-29 13:41:51,286][00497] Updated weights for policy 0, policy_version 13959 (0.0019) [2024-03-29 13:41:53,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 228818944. Throughput: 0: 42373.2. Samples: 110990900. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 13:41:53,840][00126] Avg episode reward: [(0, '0.227')] [2024-03-29 13:41:54,847][00497] Updated weights for policy 0, policy_version 13969 (0.0034) [2024-03-29 13:41:58,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 229015552. Throughput: 0: 41910.7. Samples: 111240860. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 13:41:58,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:41:59,650][00497] Updated weights for policy 0, policy_version 13979 (0.0026) [2024-03-29 13:42:01,356][00476] Signal inference workers to stop experience collection... 
(4000 times) [2024-03-29 13:42:01,357][00476] Signal inference workers to resume experience collection... (4000 times) [2024-03-29 13:42:01,398][00497] InferenceWorker_p0-w0: stopping experience collection (4000 times) [2024-03-29 13:42:01,399][00497] InferenceWorker_p0-w0: resuming experience collection (4000 times) [2024-03-29 13:42:03,333][00497] Updated weights for policy 0, policy_version 13989 (0.0024) [2024-03-29 13:42:03,839][00126] Fps is (10 sec: 39322.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 229212160. Throughput: 0: 42130.7. Samples: 111378560. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 13:42:03,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 13:42:06,770][00497] Updated weights for policy 0, policy_version 13999 (0.0028) [2024-03-29 13:42:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 229457920. Throughput: 0: 42505.8. Samples: 111622600. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 13:42:08,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 13:42:10,606][00497] Updated weights for policy 0, policy_version 14009 (0.0023) [2024-03-29 13:42:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 229638144. Throughput: 0: 41931.1. Samples: 111871720. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 13:42:13,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:42:15,213][00497] Updated weights for policy 0, policy_version 14019 (0.0018) [2024-03-29 13:42:18,839][00126] Fps is (10 sec: 37682.7, 60 sec: 41506.0, 300 sec: 41987.5). Total num frames: 229834752. Throughput: 0: 41897.6. Samples: 112003760. Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0) [2024-03-29 13:42:18,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 13:42:18,921][00497] Updated weights for policy 0, policy_version 14029 (0.0018) [2024-03-29 13:42:22,479][00497] Updated weights for policy 0, policy_version 14039 (0.0026) [2024-03-29 13:42:23,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 230096896. Throughput: 0: 42436.0. Samples: 112255140. Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0) [2024-03-29 13:42:23,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:42:26,102][00497] Updated weights for policy 0, policy_version 14049 (0.0028) [2024-03-29 13:42:28,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 230277120. Throughput: 0: 41995.8. Samples: 112502060. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 13:42:28,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 13:42:30,754][00497] Updated weights for policy 0, policy_version 14059 (0.0028) [2024-03-29 13:42:33,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 230457344. Throughput: 0: 41931.6. Samples: 112638020. Policy #0 lag: (min: 1.0, avg: 20.3, max: 41.0) [2024-03-29 13:42:33,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 13:42:34,324][00476] Signal inference workers to stop experience collection... (4050 times) [2024-03-29 13:42:34,355][00497] InferenceWorker_p0-w0: stopping experience collection (4050 times) [2024-03-29 13:42:34,522][00476] Signal inference workers to resume experience collection... 
(4050 times) [2024-03-29 13:42:34,522][00497] InferenceWorker_p0-w0: resuming experience collection (4050 times) [2024-03-29 13:42:34,529][00497] Updated weights for policy 0, policy_version 14069 (0.0024) [2024-03-29 13:42:38,165][00497] Updated weights for policy 0, policy_version 14079 (0.0028) [2024-03-29 13:42:38,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 230703104. Throughput: 0: 42006.7. Samples: 112881200. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:42:38,841][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 13:42:41,977][00497] Updated weights for policy 0, policy_version 14089 (0.0021) [2024-03-29 13:42:43,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 230899712. Throughput: 0: 42032.4. Samples: 113132320. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:42:43,840][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 13:42:46,474][00497] Updated weights for policy 0, policy_version 14099 (0.0018) [2024-03-29 13:42:48,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 231096320. Throughput: 0: 41915.9. Samples: 113264780. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:42:48,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 13:42:50,072][00497] Updated weights for policy 0, policy_version 14109 (0.0029) [2024-03-29 13:42:53,692][00497] Updated weights for policy 0, policy_version 14119 (0.0020) [2024-03-29 13:42:53,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 231325696. Throughput: 0: 41934.6. Samples: 113509660. Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 13:42:53,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 13:42:57,665][00497] Updated weights for policy 0, policy_version 14129 (0.0023) [2024-03-29 13:42:58,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 231522304. Throughput: 0: 41977.8. Samples: 113760720. Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 13:42:58,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 13:43:02,123][00497] Updated weights for policy 0, policy_version 14139 (0.0026) [2024-03-29 13:43:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.2, 300 sec: 42098.6). Total num frames: 231735296. Throughput: 0: 42031.2. Samples: 113895160. Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 13:43:03,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 13:43:04,069][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000014145_231751680.pth... [2024-03-29 13:43:04,380][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000013530_221675520.pth [2024-03-29 13:43:05,611][00497] Updated weights for policy 0, policy_version 14149 (0.0022) [2024-03-29 13:43:07,360][00476] Signal inference workers to stop experience collection... (4100 times) [2024-03-29 13:43:07,430][00497] InferenceWorker_p0-w0: stopping experience collection (4100 times) [2024-03-29 13:43:07,527][00476] Signal inference workers to resume experience collection... (4100 times) [2024-03-29 13:43:07,527][00497] InferenceWorker_p0-w0: resuming experience collection (4100 times) [2024-03-29 13:43:08,839][00126] Fps is (10 sec: 42597.6, 60 sec: 41506.0, 300 sec: 41987.5). Total num frames: 231948288. Throughput: 0: 41992.8. Samples: 114144820. 
Policy #0 lag: (min: 1.0, avg: 21.8, max: 42.0) [2024-03-29 13:43:08,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 13:43:09,230][00497] Updated weights for policy 0, policy_version 14159 (0.0026) [2024-03-29 13:43:13,144][00497] Updated weights for policy 0, policy_version 14169 (0.0023) [2024-03-29 13:43:13,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.1, 300 sec: 41987.4). Total num frames: 232161280. Throughput: 0: 41918.2. Samples: 114388380. Policy #0 lag: (min: 3.0, avg: 22.8, max: 43.0) [2024-03-29 13:43:13,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 13:43:17,694][00497] Updated weights for policy 0, policy_version 14179 (0.0023) [2024-03-29 13:43:18,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 232357888. Throughput: 0: 41629.7. Samples: 114511360. Policy #0 lag: (min: 3.0, avg: 22.8, max: 43.0) [2024-03-29 13:43:18,840][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 13:43:21,378][00497] Updated weights for policy 0, policy_version 14189 (0.0031) [2024-03-29 13:43:23,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 232587264. Throughput: 0: 42148.5. Samples: 114777880. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:43:23,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:43:24,953][00497] Updated weights for policy 0, policy_version 14199 (0.0027) [2024-03-29 13:43:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 232783872. Throughput: 0: 41658.3. Samples: 115006940. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:43:28,840][00126] Avg episode reward: [(0, '0.264')] [2024-03-29 13:43:28,847][00497] Updated weights for policy 0, policy_version 14209 (0.0028) [2024-03-29 13:43:33,490][00497] Updated weights for policy 0, policy_version 14219 (0.0021) [2024-03-29 13:43:33,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 232980480. Throughput: 0: 41822.7. Samples: 115146800. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 13:43:33,840][00126] Avg episode reward: [(0, '0.307')] [2024-03-29 13:43:36,950][00497] Updated weights for policy 0, policy_version 14229 (0.0028) [2024-03-29 13:43:38,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 233209856. Throughput: 0: 42048.8. Samples: 115401860. Policy #0 lag: (min: 1.0, avg: 21.2, max: 41.0) [2024-03-29 13:43:38,840][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 13:43:40,653][00476] Signal inference workers to stop experience collection... (4150 times) [2024-03-29 13:43:40,655][00476] Signal inference workers to resume experience collection... (4150 times) [2024-03-29 13:43:40,697][00497] InferenceWorker_p0-w0: stopping experience collection (4150 times) [2024-03-29 13:43:40,698][00497] InferenceWorker_p0-w0: resuming experience collection (4150 times) [2024-03-29 13:43:40,919][00497] Updated weights for policy 0, policy_version 14239 (0.0023) [2024-03-29 13:43:43,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 233422848. Throughput: 0: 41224.9. Samples: 115615840. Policy #0 lag: (min: 2.0, avg: 24.0, max: 42.0) [2024-03-29 13:43:43,840][00126] Avg episode reward: [(0, '0.302')] [2024-03-29 13:43:44,977][00497] Updated weights for policy 0, policy_version 14249 (0.0022) [2024-03-29 13:43:48,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 42043.0). 
Total num frames: 233603072. Throughput: 0: 41479.5. Samples: 115761740. Policy #0 lag: (min: 2.0, avg: 24.0, max: 42.0) [2024-03-29 13:43:48,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 13:43:49,648][00497] Updated weights for policy 0, policy_version 14259 (0.0023) [2024-03-29 13:43:53,124][00497] Updated weights for policy 0, policy_version 14269 (0.0024) [2024-03-29 13:43:53,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 233799680. Throughput: 0: 41583.7. Samples: 116016080. Policy #0 lag: (min: 2.0, avg: 19.6, max: 42.0) [2024-03-29 13:43:53,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 13:43:56,857][00497] Updated weights for policy 0, policy_version 14279 (0.0020) [2024-03-29 13:43:58,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 234045440. Throughput: 0: 40926.7. Samples: 116230080. Policy #0 lag: (min: 2.0, avg: 19.6, max: 42.0) [2024-03-29 13:43:58,840][00126] Avg episode reward: [(0, '0.479')] [2024-03-29 13:44:01,164][00497] Updated weights for policy 0, policy_version 14289 (0.0017) [2024-03-29 13:44:03,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 234225664. Throughput: 0: 41300.5. Samples: 116369880. Policy #0 lag: (min: 2.0, avg: 19.6, max: 42.0) [2024-03-29 13:44:03,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 13:44:05,414][00497] Updated weights for policy 0, policy_version 14299 (0.0031) [2024-03-29 13:44:08,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41233.2, 300 sec: 41876.4). Total num frames: 234422272. Throughput: 0: 41288.5. Samples: 116635860. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 13:44:08,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 13:44:09,044][00497] Updated weights for policy 0, policy_version 14309 (0.0018) [2024-03-29 13:44:12,770][00497] Updated weights for policy 0, policy_version 14319 (0.0032) [2024-03-29 13:44:13,107][00476] Signal inference workers to stop experience collection... (4200 times) [2024-03-29 13:44:13,108][00476] Signal inference workers to resume experience collection... (4200 times) [2024-03-29 13:44:13,150][00497] InferenceWorker_p0-w0: stopping experience collection (4200 times) [2024-03-29 13:44:13,150][00497] InferenceWorker_p0-w0: resuming experience collection (4200 times) [2024-03-29 13:44:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.3, 300 sec: 41876.4). Total num frames: 234651648. Throughput: 0: 41394.6. Samples: 116869700. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 13:44:13,840][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 13:44:16,845][00497] Updated weights for policy 0, policy_version 14329 (0.0020) [2024-03-29 13:44:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 234848256. Throughput: 0: 40854.7. Samples: 116985260. Policy #0 lag: (min: 0.0, avg: 24.7, max: 44.0) [2024-03-29 13:44:18,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 13:44:21,042][00497] Updated weights for policy 0, policy_version 14339 (0.0026) [2024-03-29 13:44:23,839][00126] Fps is (10 sec: 37683.2, 60 sec: 40687.0, 300 sec: 41820.9). Total num frames: 235028480. Throughput: 0: 41174.3. Samples: 117254700. 
Policy #0 lag: (min: 0.0, avg: 24.7, max: 44.0) [2024-03-29 13:44:23,841][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:44:24,891][00497] Updated weights for policy 0, policy_version 14349 (0.0026) [2024-03-29 13:44:28,447][00497] Updated weights for policy 0, policy_version 14359 (0.0023) [2024-03-29 13:44:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 235274240. Throughput: 0: 41770.2. Samples: 117495500. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 13:44:28,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:44:32,526][00497] Updated weights for policy 0, policy_version 14369 (0.0034) [2024-03-29 13:44:33,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 235454464. Throughput: 0: 40855.2. Samples: 117600220. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 13:44:33,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 13:44:36,938][00497] Updated weights for policy 0, policy_version 14379 (0.0025) [2024-03-29 13:44:38,839][00126] Fps is (10 sec: 37683.3, 60 sec: 40687.0, 300 sec: 41765.3). Total num frames: 235651072. Throughput: 0: 41281.8. Samples: 117873760. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 13:44:38,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:44:40,821][00497] Updated weights for policy 0, policy_version 14389 (0.0020) [2024-03-29 13:44:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 40960.0, 300 sec: 41820.9). Total num frames: 235880448. Throughput: 0: 41750.8. Samples: 118108860. Policy #0 lag: (min: 0.0, avg: 20.0, max: 41.0) [2024-03-29 13:44:43,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 13:44:44,199][00497] Updated weights for policy 0, policy_version 14399 (0.0025) [2024-03-29 13:44:48,529][00497] Updated weights for policy 0, policy_version 14409 (0.0019) [2024-03-29 13:44:48,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 236077056. Throughput: 0: 41160.0. Samples: 118222080. Policy #0 lag: (min: 0.0, avg: 21.1, max: 40.0) [2024-03-29 13:44:48,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:44:50,928][00476] Signal inference workers to stop experience collection... (4250 times) [2024-03-29 13:44:50,968][00497] InferenceWorker_p0-w0: stopping experience collection (4250 times) [2024-03-29 13:44:51,124][00476] Signal inference workers to resume experience collection... (4250 times) [2024-03-29 13:44:51,124][00497] InferenceWorker_p0-w0: resuming experience collection (4250 times) [2024-03-29 13:44:52,749][00497] Updated weights for policy 0, policy_version 14419 (0.0030) [2024-03-29 13:44:53,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 236273664. Throughput: 0: 41156.9. Samples: 118487920. Policy #0 lag: (min: 0.0, avg: 21.1, max: 40.0) [2024-03-29 13:44:53,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 13:44:56,586][00497] Updated weights for policy 0, policy_version 14429 (0.0027) [2024-03-29 13:44:58,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 236519424. Throughput: 0: 41552.8. Samples: 118739580. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:44:58,842][00126] Avg episode reward: [(0, '0.472')] [2024-03-29 13:44:59,748][00497] Updated weights for policy 0, policy_version 14439 (0.0020) [2024-03-29 13:45:03,839][00126] Fps is (10 sec: 44235.8, 60 sec: 41506.0, 300 sec: 41820.8). 
Total num frames: 236716032. Throughput: 0: 41741.1. Samples: 118863620. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:45:03,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 13:45:03,860][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000014448_236716032.pth... [2024-03-29 13:45:04,166][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000013838_226721792.pth [2024-03-29 13:45:04,440][00497] Updated weights for policy 0, policy_version 14449 (0.0019) [2024-03-29 13:45:08,068][00497] Updated weights for policy 0, policy_version 14459 (0.0020) [2024-03-29 13:45:08,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 236912640. Throughput: 0: 41434.2. Samples: 119119240. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 13:45:08,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 13:45:12,085][00497] Updated weights for policy 0, policy_version 14469 (0.0023) [2024-03-29 13:45:13,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 237142016. Throughput: 0: 41845.3. Samples: 119378540. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 13:45:13,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 13:45:15,395][00497] Updated weights for policy 0, policy_version 14479 (0.0025) [2024-03-29 13:45:18,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 237355008. Throughput: 0: 42059.0. Samples: 119492880. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 13:45:18,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 13:45:19,677][00497] Updated weights for policy 0, policy_version 14489 (0.0022) [2024-03-29 13:45:23,764][00497] Updated weights for policy 0, policy_version 14499 (0.0023) [2024-03-29 13:45:23,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 237551616. Throughput: 0: 41651.1. Samples: 119748060. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 13:45:23,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 13:45:27,881][00497] Updated weights for policy 0, policy_version 14509 (0.0028) [2024-03-29 13:45:28,429][00476] Signal inference workers to stop experience collection... (4300 times) [2024-03-29 13:45:28,434][00476] Signal inference workers to resume experience collection... (4300 times) [2024-03-29 13:45:28,481][00497] InferenceWorker_p0-w0: stopping experience collection (4300 times) [2024-03-29 13:45:28,482][00497] InferenceWorker_p0-w0: resuming experience collection (4300 times) [2024-03-29 13:45:28,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 237764608. Throughput: 0: 42400.4. Samples: 120016880. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 13:45:28,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:45:31,194][00497] Updated weights for policy 0, policy_version 14519 (0.0027) [2024-03-29 13:45:33,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42598.3, 300 sec: 41876.4). Total num frames: 238010368. Throughput: 0: 42299.4. Samples: 120125560. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 13:45:33,840][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 13:45:35,264][00497] Updated weights for policy 0, policy_version 14529 (0.0028) [2024-03-29 13:45:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 41932.0). Total num frames: 238190592. 
Throughput: 0: 42145.3. Samples: 120384460. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 13:45:38,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 13:45:39,445][00497] Updated weights for policy 0, policy_version 14539 (0.0022) [2024-03-29 13:45:43,611][00497] Updated weights for policy 0, policy_version 14549 (0.0023) [2024-03-29 13:45:43,839][00126] Fps is (10 sec: 36045.3, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 238370816. Throughput: 0: 42106.7. Samples: 120634380. Policy #0 lag: (min: 0.0, avg: 20.2, max: 43.0) [2024-03-29 13:45:43,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:45:47,061][00497] Updated weights for policy 0, policy_version 14559 (0.0025) [2024-03-29 13:45:48,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42598.3, 300 sec: 41820.8). Total num frames: 238632960. Throughput: 0: 41592.5. Samples: 120735280. Policy #0 lag: (min: 0.0, avg: 20.2, max: 43.0) [2024-03-29 13:45:48,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 13:45:51,264][00497] Updated weights for policy 0, policy_version 14569 (0.0023) [2024-03-29 13:45:53,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 238796800. Throughput: 0: 41782.1. Samples: 120999440. Policy #0 lag: (min: 2.0, avg: 20.6, max: 43.0) [2024-03-29 13:45:53,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 13:45:55,458][00497] Updated weights for policy 0, policy_version 14579 (0.0022) [2024-03-29 13:45:58,839][00126] Fps is (10 sec: 34406.7, 60 sec: 40960.0, 300 sec: 41598.7). Total num frames: 238977024. Throughput: 0: 41460.9. Samples: 121244280. Policy #0 lag: (min: 2.0, avg: 20.6, max: 43.0) [2024-03-29 13:45:58,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 13:45:59,579][00497] Updated weights for policy 0, policy_version 14589 (0.0026) [2024-03-29 13:45:59,615][00476] Signal inference workers to stop experience collection... (4350 times) [2024-03-29 13:45:59,633][00497] InferenceWorker_p0-w0: stopping experience collection (4350 times) [2024-03-29 13:45:59,825][00476] Signal inference workers to resume experience collection... (4350 times) [2024-03-29 13:45:59,826][00497] InferenceWorker_p0-w0: resuming experience collection (4350 times) [2024-03-29 13:46:02,780][00497] Updated weights for policy 0, policy_version 14599 (0.0024) [2024-03-29 13:46:03,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 239239168. Throughput: 0: 41591.6. Samples: 121364500. Policy #0 lag: (min: 2.0, avg: 20.6, max: 43.0) [2024-03-29 13:46:03,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 13:46:07,134][00497] Updated weights for policy 0, policy_version 14609 (0.0021) [2024-03-29 13:46:08,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 239419392. Throughput: 0: 41646.2. Samples: 121622140. Policy #0 lag: (min: 1.0, avg: 23.6, max: 43.0) [2024-03-29 13:46:08,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 13:46:10,994][00497] Updated weights for policy 0, policy_version 14619 (0.0032) [2024-03-29 13:46:13,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 239616000. Throughput: 0: 41352.9. Samples: 121877760. 
Policy #0 lag: (min: 1.0, avg: 23.6, max: 43.0) [2024-03-29 13:46:13,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 13:46:15,545][00497] Updated weights for policy 0, policy_version 14629 (0.0029) [2024-03-29 13:46:18,714][00497] Updated weights for policy 0, policy_version 14639 (0.0021) [2024-03-29 13:46:18,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 239845376. Throughput: 0: 41320.6. Samples: 121984980. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 13:46:18,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 13:46:22,898][00497] Updated weights for policy 0, policy_version 14649 (0.0025) [2024-03-29 13:46:23,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 240025600. Throughput: 0: 41013.3. Samples: 122230060. Policy #0 lag: (min: 0.0, avg: 18.3, max: 41.0) [2024-03-29 13:46:23,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 13:46:26,815][00497] Updated weights for policy 0, policy_version 14659 (0.0024) [2024-03-29 13:46:28,624][00476] Signal inference workers to stop experience collection... (4400 times) [2024-03-29 13:46:28,686][00497] InferenceWorker_p0-w0: stopping experience collection (4400 times) [2024-03-29 13:46:28,787][00476] Signal inference workers to resume experience collection... (4400 times) [2024-03-29 13:46:28,788][00497] InferenceWorker_p0-w0: resuming experience collection (4400 times) [2024-03-29 13:46:28,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 240238592. Throughput: 0: 41155.5. Samples: 122486380. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:46:28,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:46:31,253][00497] Updated weights for policy 0, policy_version 14669 (0.0023) [2024-03-29 13:46:33,839][00126] Fps is (10 sec: 44236.3, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 240467968. Throughput: 0: 41924.4. Samples: 122621880. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:46:33,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 13:46:34,368][00497] Updated weights for policy 0, policy_version 14679 (0.0023) [2024-03-29 13:46:38,685][00497] Updated weights for policy 0, policy_version 14689 (0.0019) [2024-03-29 13:46:38,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 240664576. Throughput: 0: 40900.2. Samples: 122839940. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 13:46:38,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:46:42,813][00497] Updated weights for policy 0, policy_version 14699 (0.0019) [2024-03-29 13:46:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 240877568. Throughput: 0: 41404.0. Samples: 123107460. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 13:46:43,840][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 13:46:46,976][00497] Updated weights for policy 0, policy_version 14709 (0.0019) [2024-03-29 13:46:48,839][00126] Fps is (10 sec: 40959.7, 60 sec: 40687.0, 300 sec: 41543.2). Total num frames: 241074176. Throughput: 0: 41686.2. Samples: 123240380. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 13:46:48,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 13:46:50,217][00497] Updated weights for policy 0, policy_version 14719 (0.0036) [2024-03-29 13:46:53,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.1, 300 sec: 41598.7). 
Total num frames: 241287168. Throughput: 0: 40847.4. Samples: 123460280. Policy #0 lag: (min: 2.0, avg: 21.1, max: 43.0) [2024-03-29 13:46:53,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 13:46:54,645][00497] Updated weights for policy 0, policy_version 14729 (0.0022) [2024-03-29 13:46:57,953][00476] Signal inference workers to stop experience collection... (4450 times) [2024-03-29 13:46:58,021][00497] InferenceWorker_p0-w0: stopping experience collection (4450 times) [2024-03-29 13:46:58,025][00476] Signal inference workers to resume experience collection... (4450 times) [2024-03-29 13:46:58,048][00497] InferenceWorker_p0-w0: resuming experience collection (4450 times) [2024-03-29 13:46:58,331][00497] Updated weights for policy 0, policy_version 14739 (0.0035) [2024-03-29 13:46:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 241483776. Throughput: 0: 41215.6. Samples: 123732460. Policy #0 lag: (min: 2.0, avg: 21.1, max: 43.0) [2024-03-29 13:46:58,840][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 13:47:02,416][00497] Updated weights for policy 0, policy_version 14749 (0.0028) [2024-03-29 13:47:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 40959.9, 300 sec: 41487.6). Total num frames: 241696768. Throughput: 0: 41735.9. Samples: 123863100. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 13:47:03,840][00126] Avg episode reward: [(0, '0.524')] [2024-03-29 13:47:03,927][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000014753_241713152.pth... [2024-03-29 13:47:04,240][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000014145_231751680.pth [2024-03-29 13:47:04,258][00476] Saving new best policy, reward=0.524! [2024-03-29 13:47:06,196][00497] Updated weights for policy 0, policy_version 14759 (0.0026) [2024-03-29 13:47:08,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 241942528. Throughput: 0: 41172.7. Samples: 124082840. Policy #0 lag: (min: 0.0, avg: 19.4, max: 40.0) [2024-03-29 13:47:08,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:47:10,594][00497] Updated weights for policy 0, policy_version 14769 (0.0029) [2024-03-29 13:47:13,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 242106368. Throughput: 0: 41241.0. Samples: 124342220. Policy #0 lag: (min: 0.0, avg: 20.1, max: 42.0) [2024-03-29 13:47:13,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 13:47:14,367][00497] Updated weights for policy 0, policy_version 14779 (0.0029) [2024-03-29 13:47:18,457][00497] Updated weights for policy 0, policy_version 14789 (0.0024) [2024-03-29 13:47:18,839][00126] Fps is (10 sec: 37683.8, 60 sec: 41233.1, 300 sec: 41432.1). Total num frames: 242319360. Throughput: 0: 41283.2. Samples: 124479620. Policy #0 lag: (min: 0.0, avg: 20.1, max: 42.0) [2024-03-29 13:47:18,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 13:47:21,847][00497] Updated weights for policy 0, policy_version 14799 (0.0032) [2024-03-29 13:47:23,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42325.3, 300 sec: 41654.2). Total num frames: 242565120. Throughput: 0: 41627.4. Samples: 124713180. 
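The checkpoint records above show the trainer writing `checkpoint_<version>_<env_steps>.pth`, deleting an older rotating checkpoint, and keeping a separate copy whenever the average episode reward sets a new best (here 0.524). A hedged sketch of that save-rotate-track-best pattern follows; the `save_checkpoint` helper and the `best_policy.pth` filename are illustrative, not the trainer's actual code.

```python
import os
import re
import torch  # the checkpoints in this log are .pth files, so torch.save is assumed here


def save_checkpoint(state_dict, ckpt_dir, policy_version, env_steps,
                    avg_reward, best_reward, keep_last=2):
    """Write a rotating checkpoint and track the best-reward policy (illustrative sketch)."""
    os.makedirs(ckpt_dir, exist_ok=True)
    # Naming mirrors the log: checkpoint_<zero-padded version>_<env_steps>.pth
    path = os.path.join(ckpt_dir, f"checkpoint_{policy_version:09d}_{env_steps}.pth")
    torch.save(state_dict, path)

    # Keep only the newest `keep_last` rotating checkpoints; delete the rest.
    pattern = re.compile(r"checkpoint_(\d+)_\d+\.pth$")
    rotating = sorted(f for f in os.listdir(ckpt_dir) if pattern.match(f))
    for stale in rotating[:-keep_last]:
        os.remove(os.path.join(ckpt_dir, stale))

    # Separately keep a copy of the best-scoring policy seen so far.
    if avg_reward > best_reward:
        torch.save(state_dict, os.path.join(ckpt_dir, "best_policy.pth"))
        best_reward = avg_reward
    return best_reward
```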
Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 13:47:23,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 13:47:26,249][00497] Updated weights for policy 0, policy_version 14809 (0.0027) [2024-03-29 13:47:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 242745344. Throughput: 0: 41628.9. Samples: 124980760. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 13:47:28,840][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 13:47:30,054][00497] Updated weights for policy 0, policy_version 14819 (0.0025) [2024-03-29 13:47:30,061][00476] Signal inference workers to stop experience collection... (4500 times) [2024-03-29 13:47:30,062][00476] Signal inference workers to resume experience collection... (4500 times) [2024-03-29 13:47:30,097][00497] InferenceWorker_p0-w0: stopping experience collection (4500 times) [2024-03-29 13:47:30,098][00497] InferenceWorker_p0-w0: resuming experience collection (4500 times) [2024-03-29 13:47:33,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 242941952. Throughput: 0: 41252.0. Samples: 125096720. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 13:47:33,840][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 13:47:34,128][00497] Updated weights for policy 0, policy_version 14829 (0.0029) [2024-03-29 13:47:37,370][00497] Updated weights for policy 0, policy_version 14839 (0.0025) [2024-03-29 13:47:38,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 41654.3). Total num frames: 243187712. Throughput: 0: 42089.0. Samples: 125354280. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 13:47:38,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:47:41,842][00497] Updated weights for policy 0, policy_version 14849 (0.0020) [2024-03-29 13:47:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 243367936. Throughput: 0: 41813.8. Samples: 125614080. Policy #0 lag: (min: 0.0, avg: 19.3, max: 41.0) [2024-03-29 13:47:43,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 13:47:45,635][00497] Updated weights for policy 0, policy_version 14859 (0.0026) [2024-03-29 13:47:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 243580928. Throughput: 0: 41537.0. Samples: 125732260. Policy #0 lag: (min: 1.0, avg: 21.1, max: 43.0) [2024-03-29 13:47:48,841][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 13:47:49,608][00497] Updated weights for policy 0, policy_version 14869 (0.0022) [2024-03-29 13:47:52,880][00497] Updated weights for policy 0, policy_version 14879 (0.0034) [2024-03-29 13:47:53,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 243826688. Throughput: 0: 42354.4. Samples: 125988780. Policy #0 lag: (min: 1.0, avg: 21.1, max: 43.0) [2024-03-29 13:47:53,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 13:47:57,379][00497] Updated weights for policy 0, policy_version 14889 (0.0030) [2024-03-29 13:47:58,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 243990528. Throughput: 0: 42220.4. Samples: 126242140. Policy #0 lag: (min: 1.0, avg: 24.4, max: 43.0) [2024-03-29 13:47:58,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:48:01,201][00497] Updated weights for policy 0, policy_version 14899 (0.0018) [2024-03-29 13:48:03,678][00476] Signal inference workers to stop experience collection... 
(4550 times) [2024-03-29 13:48:03,717][00497] InferenceWorker_p0-w0: stopping experience collection (4550 times) [2024-03-29 13:48:03,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 244203520. Throughput: 0: 42017.7. Samples: 126370420. Policy #0 lag: (min: 1.0, avg: 24.4, max: 43.0) [2024-03-29 13:48:03,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 13:48:03,890][00476] Signal inference workers to resume experience collection... (4550 times) [2024-03-29 13:48:03,890][00497] InferenceWorker_p0-w0: resuming experience collection (4550 times) [2024-03-29 13:48:05,081][00497] Updated weights for policy 0, policy_version 14909 (0.0018) [2024-03-29 13:48:08,435][00497] Updated weights for policy 0, policy_version 14919 (0.0030) [2024-03-29 13:48:08,839][00126] Fps is (10 sec: 45875.5, 60 sec: 41779.3, 300 sec: 41654.3). Total num frames: 244449280. Throughput: 0: 42589.9. Samples: 126629720. Policy #0 lag: (min: 0.0, avg: 19.3, max: 40.0) [2024-03-29 13:48:08,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:48:12,971][00497] Updated weights for policy 0, policy_version 14929 (0.0027) [2024-03-29 13:48:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 41598.7). Total num frames: 244629504. Throughput: 0: 42009.6. Samples: 126871200. Policy #0 lag: (min: 0.0, avg: 19.3, max: 40.0) [2024-03-29 13:48:13,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 13:48:16,854][00497] Updated weights for policy 0, policy_version 14939 (0.0022) [2024-03-29 13:48:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 41543.2). Total num frames: 244842496. Throughput: 0: 42062.2. Samples: 126989520. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 13:48:18,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 13:48:20,748][00497] Updated weights for policy 0, policy_version 14949 (0.0019) [2024-03-29 13:48:23,839][00126] Fps is (10 sec: 44237.2, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 245071872. Throughput: 0: 42221.3. Samples: 127254240. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 13:48:23,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 13:48:24,094][00497] Updated weights for policy 0, policy_version 14959 (0.0039) [2024-03-29 13:48:28,403][00497] Updated weights for policy 0, policy_version 14969 (0.0025) [2024-03-29 13:48:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 245268480. Throughput: 0: 42017.8. Samples: 127504880. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 13:48:28,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:48:32,143][00497] Updated weights for policy 0, policy_version 14979 (0.0023) [2024-03-29 13:48:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 41598.7). Total num frames: 245481472. Throughput: 0: 42031.1. Samples: 127623660. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 13:48:33,840][00126] Avg episode reward: [(0, '0.370')] [2024-03-29 13:48:36,457][00497] Updated weights for policy 0, policy_version 14989 (0.0021) [2024-03-29 13:48:38,093][00476] Signal inference workers to stop experience collection... (4600 times) [2024-03-29 13:48:38,172][00497] InferenceWorker_p0-w0: stopping experience collection (4600 times) [2024-03-29 13:48:38,259][00476] Signal inference workers to resume experience collection... 
(4600 times) [2024-03-29 13:48:38,259][00497] InferenceWorker_p0-w0: resuming experience collection (4600 times) [2024-03-29 13:48:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 245694464. Throughput: 0: 42260.9. Samples: 127890520. Policy #0 lag: (min: 0.0, avg: 20.6, max: 43.0) [2024-03-29 13:48:38,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 13:48:39,693][00497] Updated weights for policy 0, policy_version 14999 (0.0034) [2024-03-29 13:48:43,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 245891072. Throughput: 0: 41896.9. Samples: 128127500. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 13:48:43,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 13:48:43,959][00497] Updated weights for policy 0, policy_version 15009 (0.0023) [2024-03-29 13:48:47,813][00497] Updated weights for policy 0, policy_version 15019 (0.0022) [2024-03-29 13:48:48,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.2, 300 sec: 41765.3). Total num frames: 246120448. Throughput: 0: 42117.3. Samples: 128265700. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 13:48:48,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:48:52,074][00497] Updated weights for policy 0, policy_version 15029 (0.0022) [2024-03-29 13:48:53,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 246300672. Throughput: 0: 42047.1. Samples: 128521840. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 13:48:53,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 13:48:55,262][00497] Updated weights for policy 0, policy_version 15039 (0.0022) [2024-03-29 13:48:58,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 246530048. Throughput: 0: 41717.8. Samples: 128748500. Policy #0 lag: (min: 0.0, avg: 19.4, max: 41.0) [2024-03-29 13:48:58,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:48:59,684][00497] Updated weights for policy 0, policy_version 15049 (0.0028) [2024-03-29 13:49:03,425][00497] Updated weights for policy 0, policy_version 15059 (0.0022) [2024-03-29 13:49:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 246743040. Throughput: 0: 42279.1. Samples: 128892080. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:49:03,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:49:04,056][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015061_246759424.pth... [2024-03-29 13:49:04,375][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000014448_236716032.pth [2024-03-29 13:49:07,568][00497] Updated weights for policy 0, policy_version 15069 (0.0031) [2024-03-29 13:49:08,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 246939648. Throughput: 0: 42218.1. Samples: 129154060. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:49:08,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:49:09,985][00476] Signal inference workers to stop experience collection... (4650 times) [2024-03-29 13:49:10,005][00497] InferenceWorker_p0-w0: stopping experience collection (4650 times) [2024-03-29 13:49:10,196][00476] Signal inference workers to resume experience collection... 
(4650 times) [2024-03-29 13:49:10,197][00497] InferenceWorker_p0-w0: resuming experience collection (4650 times) [2024-03-29 13:49:10,966][00497] Updated weights for policy 0, policy_version 15079 (0.0023) [2024-03-29 13:49:13,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.4, 300 sec: 41820.8). Total num frames: 247185408. Throughput: 0: 41733.3. Samples: 129382880. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 13:49:13,840][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 13:49:15,228][00497] Updated weights for policy 0, policy_version 15089 (0.0022) [2024-03-29 13:49:18,839][00126] Fps is (10 sec: 42599.2, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 247365632. Throughput: 0: 42150.7. Samples: 129520440. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 13:49:18,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 13:49:18,873][00497] Updated weights for policy 0, policy_version 15099 (0.0022) [2024-03-29 13:49:23,229][00497] Updated weights for policy 0, policy_version 15109 (0.0027) [2024-03-29 13:49:23,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 247578624. Throughput: 0: 41966.1. Samples: 129779000. Policy #0 lag: (min: 0.0, avg: 20.3, max: 40.0) [2024-03-29 13:49:23,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 13:49:26,360][00497] Updated weights for policy 0, policy_version 15119 (0.0020) [2024-03-29 13:49:28,839][00126] Fps is (10 sec: 45874.5, 60 sec: 42598.4, 300 sec: 41931.9). Total num frames: 247824384. Throughput: 0: 42034.6. Samples: 130019060. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 13:49:28,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 13:49:30,489][00497] Updated weights for policy 0, policy_version 15129 (0.0021) [2024-03-29 13:49:33,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 248020992. Throughput: 0: 42052.5. Samples: 130158060. Policy #0 lag: (min: 1.0, avg: 21.1, max: 42.0) [2024-03-29 13:49:33,840][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 13:49:34,403][00497] Updated weights for policy 0, policy_version 15139 (0.0029) [2024-03-29 13:49:38,839][00126] Fps is (10 sec: 36045.2, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 248184832. Throughput: 0: 41781.8. Samples: 130402020. Policy #0 lag: (min: 0.0, avg: 19.9, max: 43.0) [2024-03-29 13:49:38,840][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 13:49:38,974][00497] Updated weights for policy 0, policy_version 15149 (0.0032) [2024-03-29 13:49:42,126][00497] Updated weights for policy 0, policy_version 15159 (0.0029) [2024-03-29 13:49:43,470][00476] Signal inference workers to stop experience collection... (4700 times) [2024-03-29 13:49:43,540][00497] InferenceWorker_p0-w0: stopping experience collection (4700 times) [2024-03-29 13:49:43,556][00476] Signal inference workers to resume experience collection... (4700 times) [2024-03-29 13:49:43,573][00497] InferenceWorker_p0-w0: resuming experience collection (4700 times) [2024-03-29 13:49:43,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 248430592. Throughput: 0: 41986.7. Samples: 130637900. Policy #0 lag: (min: 0.0, avg: 19.9, max: 43.0) [2024-03-29 13:49:43,840][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 13:49:46,492][00497] Updated weights for policy 0, policy_version 15169 (0.0021) [2024-03-29 13:49:48,839][00126] Fps is (10 sec: 44236.1, 60 sec: 41779.2, 300 sec: 41876.4). 
Total num frames: 248627200. Throughput: 0: 41726.5. Samples: 130769780. Policy #0 lag: (min: 2.0, avg: 20.5, max: 43.0) [2024-03-29 13:49:48,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 13:49:50,180][00497] Updated weights for policy 0, policy_version 15179 (0.0030) [2024-03-29 13:49:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 248823808. Throughput: 0: 41606.8. Samples: 131026360. Policy #0 lag: (min: 2.0, avg: 20.5, max: 43.0) [2024-03-29 13:49:53,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:49:54,482][00497] Updated weights for policy 0, policy_version 15189 (0.0023) [2024-03-29 13:49:57,561][00497] Updated weights for policy 0, policy_version 15199 (0.0027) [2024-03-29 13:49:58,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.4, 300 sec: 41876.4). Total num frames: 249069568. Throughput: 0: 42013.9. Samples: 131273500. Policy #0 lag: (min: 2.0, avg: 20.5, max: 43.0) [2024-03-29 13:49:58,840][00126] Avg episode reward: [(0, '0.333')] [2024-03-29 13:50:02,088][00497] Updated weights for policy 0, policy_version 15209 (0.0023) [2024-03-29 13:50:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 249266176. Throughput: 0: 41875.9. Samples: 131404860. Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 13:50:03,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 13:50:05,658][00497] Updated weights for policy 0, policy_version 15219 (0.0027) [2024-03-29 13:50:08,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 249446400. Throughput: 0: 41689.3. Samples: 131655020. Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 13:50:08,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:50:09,899][00497] Updated weights for policy 0, policy_version 15229 (0.0026) [2024-03-29 13:50:13,402][00497] Updated weights for policy 0, policy_version 15239 (0.0034) [2024-03-29 13:50:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 249692160. Throughput: 0: 41903.6. Samples: 131904720. Policy #0 lag: (min: 1.0, avg: 18.8, max: 41.0) [2024-03-29 13:50:13,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 13:50:13,894][00476] Signal inference workers to stop experience collection... (4750 times) [2024-03-29 13:50:13,936][00497] InferenceWorker_p0-w0: stopping experience collection (4750 times) [2024-03-29 13:50:14,104][00476] Signal inference workers to resume experience collection... (4750 times) [2024-03-29 13:50:14,104][00497] InferenceWorker_p0-w0: resuming experience collection (4750 times) [2024-03-29 13:50:17,812][00497] Updated weights for policy 0, policy_version 15249 (0.0027) [2024-03-29 13:50:18,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 249872384. Throughput: 0: 41407.6. Samples: 132021400. Policy #0 lag: (min: 1.0, avg: 18.8, max: 41.0) [2024-03-29 13:50:18,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 13:50:21,236][00497] Updated weights for policy 0, policy_version 15259 (0.0027) [2024-03-29 13:50:23,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 250085376. Throughput: 0: 41623.9. Samples: 132275100. 
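The paired `Signal inference workers to stop/resume experience collection… (N times)` messages indicate the learner periodically pausing collection, presumably to keep the backlog of unprocessed rollouts (and hence the policy lag) bounded. A simplified, single-process sketch of such a gate is shown below; the real trainer signals separate worker processes, and the `CollectionGate` class with its `max_outstanding` threshold is an assumption made for illustration only.

```python
import threading


class CollectionGate:
    """Pause/resume experience collection when the learner falls behind (sketch).

    A threading.Event stands in for the inter-process signal used by the real system.
    """

    def __init__(self, max_outstanding=8):
        self.max_outstanding = max_outstanding
        self._running = threading.Event()
        self._running.set()
        self.stop_count = 0  # analogous to the "(N times)" counter in the log

    def update(self, collected_batches, learned_batches):
        outstanding = collected_batches - learned_batches
        if outstanding >= self.max_outstanding and self._running.is_set():
            self._running.clear()          # "stop experience collection"
            self.stop_count += 1
        elif outstanding < self.max_outstanding and not self._running.is_set():
            self._running.set()            # "resume experience collection"

    def wait_until_allowed(self):
        """Called by the collection loop before producing the next rollout batch."""
        self._running.wait()
```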
Policy #0 lag: (min: 0.0, avg: 20.6, max: 40.0) [2024-03-29 13:50:23,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 13:50:25,727][00497] Updated weights for policy 0, policy_version 15269 (0.0024) [2024-03-29 13:50:28,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 250314752. Throughput: 0: 42300.4. Samples: 132541420. Policy #0 lag: (min: 0.0, avg: 20.6, max: 40.0) [2024-03-29 13:50:28,841][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:50:29,034][00497] Updated weights for policy 0, policy_version 15279 (0.0032) [2024-03-29 13:50:33,095][00497] Updated weights for policy 0, policy_version 15289 (0.0031) [2024-03-29 13:50:33,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 250511360. Throughput: 0: 42017.1. Samples: 132660540. Policy #0 lag: (min: 0.0, avg: 20.6, max: 40.0) [2024-03-29 13:50:33,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 13:50:36,798][00497] Updated weights for policy 0, policy_version 15299 (0.0023) [2024-03-29 13:50:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 250724352. Throughput: 0: 41780.8. Samples: 132906500. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 13:50:38,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 13:50:41,499][00497] Updated weights for policy 0, policy_version 15309 (0.0022) [2024-03-29 13:50:43,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 250937344. Throughput: 0: 42348.0. Samples: 133179160. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 13:50:43,840][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 13:50:44,592][00497] Updated weights for policy 0, policy_version 15319 (0.0028) [2024-03-29 13:50:46,133][00476] Signal inference workers to stop experience collection... (4800 times) [2024-03-29 13:50:46,206][00476] Signal inference workers to resume experience collection... (4800 times) [2024-03-29 13:50:46,210][00497] InferenceWorker_p0-w0: stopping experience collection (4800 times) [2024-03-29 13:50:46,233][00497] InferenceWorker_p0-w0: resuming experience collection (4800 times) [2024-03-29 13:50:48,800][00497] Updated weights for policy 0, policy_version 15329 (0.0018) [2024-03-29 13:50:48,840][00126] Fps is (10 sec: 42597.3, 60 sec: 42052.1, 300 sec: 41876.4). Total num frames: 251150336. Throughput: 0: 41789.0. Samples: 133285380. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 13:50:48,841][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 13:50:52,239][00497] Updated weights for policy 0, policy_version 15339 (0.0031) [2024-03-29 13:50:53,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 251379712. Throughput: 0: 41971.2. Samples: 133543720. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 13:50:53,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:50:56,745][00497] Updated weights for policy 0, policy_version 15349 (0.0026) [2024-03-29 13:50:58,839][00126] Fps is (10 sec: 40961.4, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 251559936. Throughput: 0: 42467.1. Samples: 133815740. Policy #0 lag: (min: 0.0, avg: 18.8, max: 42.0) [2024-03-29 13:50:58,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 13:51:00,007][00497] Updated weights for policy 0, policy_version 15359 (0.0020) [2024-03-29 13:51:03,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.2, 300 sec: 41931.9). 
Total num frames: 251789312. Throughput: 0: 42395.9. Samples: 133929220. Policy #0 lag: (min: 0.0, avg: 18.8, max: 42.0) [2024-03-29 13:51:03,840][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 13:51:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015368_251789312.pth... [2024-03-29 13:51:04,177][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000014753_241713152.pth [2024-03-29 13:51:04,447][00497] Updated weights for policy 0, policy_version 15369 (0.0019) [2024-03-29 13:51:07,753][00497] Updated weights for policy 0, policy_version 15379 (0.0023) [2024-03-29 13:51:08,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42871.5, 300 sec: 42043.0). Total num frames: 252018688. Throughput: 0: 42334.3. Samples: 134180140. Policy #0 lag: (min: 0.0, avg: 18.8, max: 42.0) [2024-03-29 13:51:08,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 13:51:12,216][00497] Updated weights for policy 0, policy_version 15389 (0.0022) [2024-03-29 13:51:13,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 252182528. Throughput: 0: 42346.1. Samples: 134447000. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 13:51:13,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 13:51:15,866][00497] Updated weights for policy 0, policy_version 15399 (0.0030) [2024-03-29 13:51:17,081][00476] Signal inference workers to stop experience collection... (4850 times) [2024-03-29 13:51:17,083][00476] Signal inference workers to resume experience collection... (4850 times) [2024-03-29 13:51:17,107][00497] InferenceWorker_p0-w0: stopping experience collection (4850 times) [2024-03-29 13:51:17,126][00497] InferenceWorker_p0-w0: resuming experience collection (4850 times) [2024-03-29 13:51:18,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 252428288. Throughput: 0: 41975.1. Samples: 134549420. Policy #0 lag: (min: 0.0, avg: 22.1, max: 41.0) [2024-03-29 13:51:18,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 13:51:19,974][00497] Updated weights for policy 0, policy_version 15409 (0.0023) [2024-03-29 13:51:23,406][00497] Updated weights for policy 0, policy_version 15419 (0.0029) [2024-03-29 13:51:23,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 252641280. Throughput: 0: 42361.4. Samples: 134812760. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 13:51:23,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 13:51:28,072][00497] Updated weights for policy 0, policy_version 15429 (0.0023) [2024-03-29 13:51:28,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 252805120. Throughput: 0: 42132.4. Samples: 135075120. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 13:51:28,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 13:51:31,437][00497] Updated weights for policy 0, policy_version 15439 (0.0029) [2024-03-29 13:51:33,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 253067264. Throughput: 0: 42070.9. Samples: 135178560. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:51:33,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 13:51:35,800][00497] Updated weights for policy 0, policy_version 15449 (0.0024) [2024-03-29 13:51:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 253247488. 
Throughput: 0: 41954.2. Samples: 135431660. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:51:38,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 13:51:39,261][00497] Updated weights for policy 0, policy_version 15459 (0.0022) [2024-03-29 13:51:43,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 253427712. Throughput: 0: 41540.0. Samples: 135685040. Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 13:51:43,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 13:51:44,098][00497] Updated weights for policy 0, policy_version 15469 (0.0026) [2024-03-29 13:51:47,310][00497] Updated weights for policy 0, policy_version 15479 (0.0023) [2024-03-29 13:51:48,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.5, 300 sec: 41987.5). Total num frames: 253673472. Throughput: 0: 41794.3. Samples: 135809960. Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 13:51:48,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 13:51:49,271][00476] Signal inference workers to stop experience collection... (4900 times) [2024-03-29 13:51:49,323][00497] InferenceWorker_p0-w0: stopping experience collection (4900 times) [2024-03-29 13:51:49,360][00476] Signal inference workers to resume experience collection... (4900 times) [2024-03-29 13:51:49,367][00497] InferenceWorker_p0-w0: resuming experience collection (4900 times) [2024-03-29 13:51:51,948][00497] Updated weights for policy 0, policy_version 15489 (0.0030) [2024-03-29 13:51:53,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 253853696. Throughput: 0: 41683.7. Samples: 136055900. Policy #0 lag: (min: 0.0, avg: 18.6, max: 41.0) [2024-03-29 13:51:53,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 13:51:55,219][00497] Updated weights for policy 0, policy_version 15499 (0.0024) [2024-03-29 13:51:58,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 254033920. Throughput: 0: 41174.4. Samples: 136299840. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:51:58,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 13:52:00,103][00497] Updated weights for policy 0, policy_version 15509 (0.0026) [2024-03-29 13:52:03,302][00497] Updated weights for policy 0, policy_version 15519 (0.0019) [2024-03-29 13:52:03,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 254279680. Throughput: 0: 41786.2. Samples: 136429800. Policy #0 lag: (min: 1.0, avg: 21.6, max: 42.0) [2024-03-29 13:52:03,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 13:52:07,538][00497] Updated weights for policy 0, policy_version 15529 (0.0019) [2024-03-29 13:52:08,839][00126] Fps is (10 sec: 44235.8, 60 sec: 40959.9, 300 sec: 41931.9). Total num frames: 254476288. Throughput: 0: 41268.7. Samples: 136669860. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 13:52:08,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 13:52:11,224][00497] Updated weights for policy 0, policy_version 15539 (0.0027) [2024-03-29 13:52:13,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 254672896. Throughput: 0: 40867.1. Samples: 136914140. 
Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 13:52:13,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 13:52:15,706][00497] Updated weights for policy 0, policy_version 15549 (0.0018) [2024-03-29 13:52:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41233.0, 300 sec: 41820.9). Total num frames: 254902272. Throughput: 0: 41714.7. Samples: 137055720. Policy #0 lag: (min: 1.0, avg: 21.4, max: 45.0) [2024-03-29 13:52:18,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:52:19,119][00497] Updated weights for policy 0, policy_version 15559 (0.0021) [2024-03-29 13:52:19,973][00476] Signal inference workers to stop experience collection... (4950 times) [2024-03-29 13:52:19,994][00497] InferenceWorker_p0-w0: stopping experience collection (4950 times) [2024-03-29 13:52:20,186][00476] Signal inference workers to resume experience collection... (4950 times) [2024-03-29 13:52:20,187][00497] InferenceWorker_p0-w0: resuming experience collection (4950 times) [2024-03-29 13:52:23,104][00497] Updated weights for policy 0, policy_version 15569 (0.0028) [2024-03-29 13:52:23,839][00126] Fps is (10 sec: 42598.0, 60 sec: 40959.9, 300 sec: 41876.4). Total num frames: 255098880. Throughput: 0: 41180.8. Samples: 137284800. Policy #0 lag: (min: 1.0, avg: 21.4, max: 45.0) [2024-03-29 13:52:23,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 13:52:27,030][00497] Updated weights for policy 0, policy_version 15579 (0.0025) [2024-03-29 13:52:28,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 255311872. Throughput: 0: 40959.1. Samples: 137528200. Policy #0 lag: (min: 1.0, avg: 21.4, max: 45.0) [2024-03-29 13:52:28,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:52:31,747][00497] Updated weights for policy 0, policy_version 15589 (0.0025) [2024-03-29 13:52:33,839][00126] Fps is (10 sec: 40960.3, 60 sec: 40687.0, 300 sec: 41765.3). Total num frames: 255508480. Throughput: 0: 41270.6. Samples: 137667140. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 13:52:33,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 13:52:34,996][00497] Updated weights for policy 0, policy_version 15599 (0.0022) [2024-03-29 13:52:38,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 255721472. Throughput: 0: 40893.7. Samples: 137896120. Policy #0 lag: (min: 0.0, avg: 19.5, max: 41.0) [2024-03-29 13:52:38,841][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 13:52:38,917][00497] Updated weights for policy 0, policy_version 15609 (0.0033) [2024-03-29 13:52:42,784][00497] Updated weights for policy 0, policy_version 15619 (0.0025) [2024-03-29 13:52:43,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 255950848. Throughput: 0: 41011.9. Samples: 138145380. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:52:43,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:52:47,623][00497] Updated weights for policy 0, policy_version 15629 (0.0020) [2024-03-29 13:52:48,839][00126] Fps is (10 sec: 39321.7, 60 sec: 40687.0, 300 sec: 41654.2). Total num frames: 256114688. Throughput: 0: 41336.1. Samples: 138289920. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:52:48,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 13:52:50,688][00497] Updated weights for policy 0, policy_version 15639 (0.0020) [2024-03-29 13:52:52,684][00476] Signal inference workers to stop experience collection... 
(5000 times) [2024-03-29 13:52:52,685][00476] Signal inference workers to resume experience collection... (5000 times) [2024-03-29 13:52:52,726][00497] InferenceWorker_p0-w0: stopping experience collection (5000 times) [2024-03-29 13:52:52,726][00497] InferenceWorker_p0-w0: resuming experience collection (5000 times) [2024-03-29 13:52:53,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.0, 300 sec: 41876.4). Total num frames: 256344064. Throughput: 0: 41191.7. Samples: 138523480. Policy #0 lag: (min: 1.0, avg: 23.8, max: 44.0) [2024-03-29 13:52:53,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:52:54,916][00497] Updated weights for policy 0, policy_version 15649 (0.0020) [2024-03-29 13:52:58,638][00497] Updated weights for policy 0, policy_version 15659 (0.0024) [2024-03-29 13:52:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 256557056. Throughput: 0: 41462.7. Samples: 138779960. Policy #0 lag: (min: 1.0, avg: 23.8, max: 44.0) [2024-03-29 13:52:58,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 13:53:03,281][00497] Updated weights for policy 0, policy_version 15669 (0.0025) [2024-03-29 13:53:03,839][00126] Fps is (10 sec: 39321.6, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 256737280. Throughput: 0: 41009.8. Samples: 138901160. Policy #0 lag: (min: 1.0, avg: 23.8, max: 44.0) [2024-03-29 13:53:03,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:53:03,863][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015670_256737280.pth... [2024-03-29 13:53:04,201][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015061_246759424.pth [2024-03-29 13:53:06,734][00497] Updated weights for policy 0, policy_version 15679 (0.0027) [2024-03-29 13:53:08,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 256983040. Throughput: 0: 41296.4. Samples: 139143140. Policy #0 lag: (min: 0.0, avg: 18.3, max: 42.0) [2024-03-29 13:53:08,842][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 13:53:10,950][00497] Updated weights for policy 0, policy_version 15689 (0.0026) [2024-03-29 13:53:13,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 257163264. Throughput: 0: 41738.6. Samples: 139406440. Policy #0 lag: (min: 0.0, avg: 18.3, max: 42.0) [2024-03-29 13:53:13,840][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 13:53:14,524][00497] Updated weights for policy 0, policy_version 15699 (0.0024) [2024-03-29 13:53:18,839][00126] Fps is (10 sec: 37683.6, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 257359872. Throughput: 0: 41176.0. Samples: 139520060. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:53:18,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:53:18,934][00497] Updated weights for policy 0, policy_version 15709 (0.0018) [2024-03-29 13:53:22,419][00497] Updated weights for policy 0, policy_version 15719 (0.0024) [2024-03-29 13:53:23,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 257622016. Throughput: 0: 41799.0. Samples: 139777080. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:53:23,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 13:53:24,033][00476] Signal inference workers to stop experience collection... 
(5050 times) [2024-03-29 13:53:24,090][00497] InferenceWorker_p0-w0: stopping experience collection (5050 times) [2024-03-29 13:53:24,124][00476] Signal inference workers to resume experience collection... (5050 times) [2024-03-29 13:53:24,126][00497] InferenceWorker_p0-w0: resuming experience collection (5050 times) [2024-03-29 13:53:26,266][00497] Updated weights for policy 0, policy_version 15729 (0.0034) [2024-03-29 13:53:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 257785856. Throughput: 0: 41890.7. Samples: 140030460. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 13:53:28,840][00126] Avg episode reward: [(0, '0.464')] [2024-03-29 13:53:30,094][00497] Updated weights for policy 0, policy_version 15739 (0.0030) [2024-03-29 13:53:33,839][00126] Fps is (10 sec: 36045.2, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 257982464. Throughput: 0: 41064.4. Samples: 140137820. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 13:53:33,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 13:53:34,780][00497] Updated weights for policy 0, policy_version 15749 (0.0020) [2024-03-29 13:53:38,356][00497] Updated weights for policy 0, policy_version 15759 (0.0033) [2024-03-29 13:53:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 258211840. Throughput: 0: 41583.2. Samples: 140394720. Policy #0 lag: (min: 0.0, avg: 19.1, max: 41.0) [2024-03-29 13:53:38,840][00126] Avg episode reward: [(0, '0.327')] [2024-03-29 13:53:42,290][00497] Updated weights for policy 0, policy_version 15769 (0.0019) [2024-03-29 13:53:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40687.0, 300 sec: 41598.7). Total num frames: 258392064. Throughput: 0: 41500.0. Samples: 140647460. Policy #0 lag: (min: 1.0, avg: 21.1, max: 43.0) [2024-03-29 13:53:43,841][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 13:53:46,128][00497] Updated weights for policy 0, policy_version 15779 (0.0022) [2024-03-29 13:53:48,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 258605056. Throughput: 0: 41239.1. Samples: 140756920. Policy #0 lag: (min: 1.0, avg: 21.1, max: 43.0) [2024-03-29 13:53:48,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 13:53:50,864][00497] Updated weights for policy 0, policy_version 15789 (0.0022) [2024-03-29 13:53:53,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 258834432. Throughput: 0: 41758.7. Samples: 141022280. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 13:53:53,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 13:53:54,134][00497] Updated weights for policy 0, policy_version 15799 (0.0026) [2024-03-29 13:53:55,428][00476] Signal inference workers to stop experience collection... (5100 times) [2024-03-29 13:53:55,461][00497] InferenceWorker_p0-w0: stopping experience collection (5100 times) [2024-03-29 13:53:55,641][00476] Signal inference workers to resume experience collection... (5100 times) [2024-03-29 13:53:55,642][00497] InferenceWorker_p0-w0: resuming experience collection (5100 times) [2024-03-29 13:53:58,074][00497] Updated weights for policy 0, policy_version 15809 (0.0018) [2024-03-29 13:53:58,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 259031040. Throughput: 0: 41456.5. Samples: 141271980. 
Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 13:53:58,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:54:01,720][00497] Updated weights for policy 0, policy_version 15819 (0.0020) [2024-03-29 13:54:03,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 259244032. Throughput: 0: 41744.0. Samples: 141398540. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 13:54:03,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 13:54:06,174][00497] Updated weights for policy 0, policy_version 15829 (0.0021) [2024-03-29 13:54:08,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 259457024. Throughput: 0: 41906.7. Samples: 141662880. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 13:54:08,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 13:54:09,697][00497] Updated weights for policy 0, policy_version 15839 (0.0022) [2024-03-29 13:54:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 259653632. Throughput: 0: 41468.4. Samples: 141896540. Policy #0 lag: (min: 1.0, avg: 22.8, max: 43.0) [2024-03-29 13:54:13,840][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 13:54:14,103][00497] Updated weights for policy 0, policy_version 15849 (0.0021) [2024-03-29 13:54:17,580][00497] Updated weights for policy 0, policy_version 15859 (0.0019) [2024-03-29 13:54:18,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 259883008. Throughput: 0: 41936.3. Samples: 142024960. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 13:54:18,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 13:54:21,715][00497] Updated weights for policy 0, policy_version 15869 (0.0018) [2024-03-29 13:54:23,839][00126] Fps is (10 sec: 42598.7, 60 sec: 40960.1, 300 sec: 41543.2). Total num frames: 260079616. Throughput: 0: 42042.2. Samples: 142286620. Policy #0 lag: (min: 0.0, avg: 20.7, max: 42.0) [2024-03-29 13:54:23,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 13:54:25,498][00497] Updated weights for policy 0, policy_version 15879 (0.0029) [2024-03-29 13:54:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 260292608. Throughput: 0: 41800.8. Samples: 142528500. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:54:28,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 13:54:29,423][00497] Updated weights for policy 0, policy_version 15889 (0.0034) [2024-03-29 13:54:31,986][00476] Signal inference workers to stop experience collection... (5150 times) [2024-03-29 13:54:32,020][00497] InferenceWorker_p0-w0: stopping experience collection (5150 times) [2024-03-29 13:54:32,199][00476] Signal inference workers to resume experience collection... (5150 times) [2024-03-29 13:54:32,200][00497] InferenceWorker_p0-w0: resuming experience collection (5150 times) [2024-03-29 13:54:33,014][00497] Updated weights for policy 0, policy_version 15899 (0.0024) [2024-03-29 13:54:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 260521984. Throughput: 0: 42199.1. Samples: 142655880. Policy #0 lag: (min: 1.0, avg: 21.0, max: 41.0) [2024-03-29 13:54:33,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 13:54:37,572][00497] Updated weights for policy 0, policy_version 15909 (0.0031) [2024-03-29 13:54:38,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41598.7). 
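Each `Updated weights for policy 0, policy_version N (0.00xx)` record marks the inference worker picking up a newer set of learner weights; the trailing number plausibly reports how long the update took, though the log does not say so explicitly. A version-checked refresh could look roughly like the following; the function and its callables are stand-ins, not the trainer's API.

```python
import time


def maybe_refresh_weights(local_version, shared_version, load_weights):
    """Pull new weights when the learner has published a newer policy version (sketch).

    `shared_version` is a callable returning the learner's latest version and
    `load_weights` performs the actual copy; both are placeholders for whatever
    IPC the trainer really uses. Returns (new_local_version, seconds_spent).
    """
    latest = shared_version()
    if latest <= local_version:
        return local_version, 0.0
    start = time.time()
    load_weights(latest)                  # e.g. copy tensors out of shared memory
    elapsed = time.time() - start         # possibly the duration printed in parentheses
    return latest, elapsed


# Toy usage with in-process stand-ins:
version, took = maybe_refresh_weights(15809, lambda: 15819, lambda v: None)
print(version, f"{took:.4f}")
```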
Total num frames: 260702208. Throughput: 0: 41958.7. Samples: 142910420. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 13:54:38,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 13:54:41,205][00497] Updated weights for policy 0, policy_version 15919 (0.0023) [2024-03-29 13:54:43,839][00126] Fps is (10 sec: 39321.4, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 260915200. Throughput: 0: 41506.2. Samples: 143139760. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 13:54:43,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:54:45,360][00497] Updated weights for policy 0, policy_version 15929 (0.0023) [2024-03-29 13:54:48,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 261144576. Throughput: 0: 41708.1. Samples: 143275400. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 13:54:48,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 13:54:48,840][00497] Updated weights for policy 0, policy_version 15939 (0.0028) [2024-03-29 13:54:53,482][00497] Updated weights for policy 0, policy_version 15949 (0.0018) [2024-03-29 13:54:53,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.1, 300 sec: 41487.6). Total num frames: 261308416. Throughput: 0: 41475.6. Samples: 143529280. Policy #0 lag: (min: 2.0, avg: 21.9, max: 42.0) [2024-03-29 13:54:53,840][00126] Avg episode reward: [(0, '0.300')] [2024-03-29 13:54:56,871][00497] Updated weights for policy 0, policy_version 15959 (0.0029) [2024-03-29 13:54:58,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 261554176. Throughput: 0: 41502.7. Samples: 143764160. Policy #0 lag: (min: 2.0, avg: 21.9, max: 42.0) [2024-03-29 13:54:58,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 13:55:00,952][00497] Updated weights for policy 0, policy_version 15969 (0.0022) [2024-03-29 13:55:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 261750784. Throughput: 0: 41746.8. Samples: 143903560. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:55:03,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 13:55:04,187][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015978_261783552.pth... [2024-03-29 13:55:04,521][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015368_251789312.pth [2024-03-29 13:55:04,550][00476] Signal inference workers to stop experience collection... (5200 times) [2024-03-29 13:55:04,590][00497] InferenceWorker_p0-w0: stopping experience collection (5200 times) [2024-03-29 13:55:04,785][00476] Signal inference workers to resume experience collection... (5200 times) [2024-03-29 13:55:04,786][00497] InferenceWorker_p0-w0: resuming experience collection (5200 times) [2024-03-29 13:55:04,790][00497] Updated weights for policy 0, policy_version 15979 (0.0023) [2024-03-29 13:55:08,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 41543.1). Total num frames: 261947392. Throughput: 0: 41580.3. Samples: 144157740. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 13:55:08,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 13:55:09,025][00497] Updated weights for policy 0, policy_version 15989 (0.0021) [2024-03-29 13:55:12,495][00497] Updated weights for policy 0, policy_version 15999 (0.0024) [2024-03-29 13:55:13,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 262193152. 
Throughput: 0: 41526.6. Samples: 144397200. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:55:13,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 13:55:16,407][00497] Updated weights for policy 0, policy_version 16009 (0.0026) [2024-03-29 13:55:18,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 262389760. Throughput: 0: 41800.3. Samples: 144536900. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:55:18,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 13:55:19,930][00497] Updated weights for policy 0, policy_version 16019 (0.0034) [2024-03-29 13:55:23,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 262569984. Throughput: 0: 41662.2. Samples: 144785220. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 13:55:23,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 13:55:24,695][00497] Updated weights for policy 0, policy_version 16029 (0.0018) [2024-03-29 13:55:27,983][00497] Updated weights for policy 0, policy_version 16039 (0.0022) [2024-03-29 13:55:28,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 262815744. Throughput: 0: 42198.3. Samples: 145038680. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:55:28,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 13:55:31,995][00497] Updated weights for policy 0, policy_version 16049 (0.0019) [2024-03-29 13:55:33,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 263012352. Throughput: 0: 42057.6. Samples: 145168000. Policy #0 lag: (min: 1.0, avg: 18.9, max: 41.0) [2024-03-29 13:55:33,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 13:55:35,668][00497] Updated weights for policy 0, policy_version 16059 (0.0021) [2024-03-29 13:55:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41598.7). Total num frames: 263208960. Throughput: 0: 41815.6. Samples: 145410980. Policy #0 lag: (min: 2.0, avg: 22.3, max: 41.0) [2024-03-29 13:55:38,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 13:55:40,336][00497] Updated weights for policy 0, policy_version 16069 (0.0021) [2024-03-29 13:55:41,458][00476] Signal inference workers to stop experience collection... (5250 times) [2024-03-29 13:55:41,524][00497] InferenceWorker_p0-w0: stopping experience collection (5250 times) [2024-03-29 13:55:41,620][00476] Signal inference workers to resume experience collection... (5250 times) [2024-03-29 13:55:41,620][00497] InferenceWorker_p0-w0: resuming experience collection (5250 times) [2024-03-29 13:55:43,610][00497] Updated weights for policy 0, policy_version 16079 (0.0029) [2024-03-29 13:55:43,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41654.3). Total num frames: 263438336. Throughput: 0: 42227.6. Samples: 145664400. Policy #0 lag: (min: 2.0, avg: 22.3, max: 41.0) [2024-03-29 13:55:43,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:55:47,524][00497] Updated weights for policy 0, policy_version 16089 (0.0018) [2024-03-29 13:55:48,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.1, 300 sec: 41598.7). Total num frames: 263651328. Throughput: 0: 42088.0. Samples: 145797520. 
Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 13:55:48,840][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 13:55:51,176][00497] Updated weights for policy 0, policy_version 16099 (0.0020) [2024-03-29 13:55:53,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42325.3, 300 sec: 41654.2). Total num frames: 263847936. Throughput: 0: 41996.5. Samples: 146047580. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 13:55:53,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:55:55,551][00497] Updated weights for policy 0, policy_version 16109 (0.0019) [2024-03-29 13:55:58,823][00497] Updated weights for policy 0, policy_version 16119 (0.0022) [2024-03-29 13:55:58,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 264093696. Throughput: 0: 42457.0. Samples: 146307760. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 13:55:58,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:56:02,829][00497] Updated weights for policy 0, policy_version 16129 (0.0022) [2024-03-29 13:56:03,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.4, 300 sec: 41598.7). Total num frames: 264290304. Throughput: 0: 42220.6. Samples: 146436820. Policy #0 lag: (min: 1.0, avg: 23.1, max: 43.0) [2024-03-29 13:56:03,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 13:56:06,467][00497] Updated weights for policy 0, policy_version 16139 (0.0021) [2024-03-29 13:56:08,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42598.4, 300 sec: 41765.3). Total num frames: 264503296. Throughput: 0: 42552.8. Samples: 146700100. Policy #0 lag: (min: 1.0, avg: 23.1, max: 43.0) [2024-03-29 13:56:08,841][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 13:56:10,695][00497] Updated weights for policy 0, policy_version 16149 (0.0030) [2024-03-29 13:56:13,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 264732672. Throughput: 0: 42527.0. Samples: 146952400. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 13:56:13,840][00126] Avg episode reward: [(0, '0.293')] [2024-03-29 13:56:14,085][00497] Updated weights for policy 0, policy_version 16159 (0.0019) [2024-03-29 13:56:14,574][00476] Signal inference workers to stop experience collection... (5300 times) [2024-03-29 13:56:14,607][00497] InferenceWorker_p0-w0: stopping experience collection (5300 times) [2024-03-29 13:56:14,767][00476] Signal inference workers to resume experience collection... (5300 times) [2024-03-29 13:56:14,768][00497] InferenceWorker_p0-w0: resuming experience collection (5300 times) [2024-03-29 13:56:18,304][00497] Updated weights for policy 0, policy_version 16169 (0.0027) [2024-03-29 13:56:18,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.4, 300 sec: 41654.2). Total num frames: 264929280. Throughput: 0: 42240.9. Samples: 147068840. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 13:56:18,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 13:56:21,853][00497] Updated weights for policy 0, policy_version 16179 (0.0018) [2024-03-29 13:56:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42871.5, 300 sec: 41820.8). Total num frames: 265142272. Throughput: 0: 42822.6. Samples: 147338000. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 13:56:23,840][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 13:56:26,161][00497] Updated weights for policy 0, policy_version 16189 (0.0024) [2024-03-29 13:56:28,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42598.4, 300 sec: 41709.8). 
Total num frames: 265371648. Throughput: 0: 42874.6. Samples: 147593760. Policy #0 lag: (min: 0.0, avg: 20.7, max: 43.0) [2024-03-29 13:56:28,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 13:56:29,414][00497] Updated weights for policy 0, policy_version 16199 (0.0031) [2024-03-29 13:56:33,556][00497] Updated weights for policy 0, policy_version 16209 (0.0019) [2024-03-29 13:56:33,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42871.4, 300 sec: 41820.8). Total num frames: 265584640. Throughput: 0: 42555.9. Samples: 147712540. Policy #0 lag: (min: 0.0, avg: 20.7, max: 43.0) [2024-03-29 13:56:33,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 13:56:37,291][00497] Updated weights for policy 0, policy_version 16219 (0.0027) [2024-03-29 13:56:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 43144.5, 300 sec: 41931.9). Total num frames: 265797632. Throughput: 0: 42739.6. Samples: 147970860. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 13:56:38,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 13:56:41,695][00497] Updated weights for policy 0, policy_version 16229 (0.0023) [2024-03-29 13:56:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42598.3, 300 sec: 41765.3). Total num frames: 265994240. Throughput: 0: 42810.2. Samples: 148234220. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 13:56:43,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 13:56:44,875][00497] Updated weights for policy 0, policy_version 16239 (0.0042) [2024-03-29 13:56:48,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 266207232. Throughput: 0: 42394.6. Samples: 148344580. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:56:48,840][00126] Avg episode reward: [(0, '0.470')] [2024-03-29 13:56:49,003][00497] Updated weights for policy 0, policy_version 16249 (0.0017) [2024-03-29 13:56:51,690][00476] Signal inference workers to stop experience collection... (5350 times) [2024-03-29 13:56:51,772][00497] InferenceWorker_p0-w0: stopping experience collection (5350 times) [2024-03-29 13:56:51,775][00476] Signal inference workers to resume experience collection... (5350 times) [2024-03-29 13:56:51,798][00497] InferenceWorker_p0-w0: resuming experience collection (5350 times) [2024-03-29 13:56:52,687][00497] Updated weights for policy 0, policy_version 16259 (0.0018) [2024-03-29 13:56:53,839][00126] Fps is (10 sec: 44237.0, 60 sec: 43144.6, 300 sec: 42043.0). Total num frames: 266436608. Throughput: 0: 42610.4. Samples: 148617560. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:56:53,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 13:56:56,831][00497] Updated weights for policy 0, policy_version 16269 (0.0023) [2024-03-29 13:56:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.2, 300 sec: 41876.4). Total num frames: 266633216. Throughput: 0: 42832.9. Samples: 148879880. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:56:58,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 13:57:00,212][00497] Updated weights for policy 0, policy_version 16279 (0.0034) [2024-03-29 13:57:03,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42598.4, 300 sec: 41932.0). Total num frames: 266846208. Throughput: 0: 42513.3. Samples: 148981940. 
Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 13:57:03,841][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 13:57:03,860][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000016287_266846208.pth... [2024-03-29 13:57:04,196][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015670_256737280.pth [2024-03-29 13:57:04,723][00497] Updated weights for policy 0, policy_version 16289 (0.0019) [2024-03-29 13:57:08,290][00497] Updated weights for policy 0, policy_version 16299 (0.0020) [2024-03-29 13:57:08,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 267059200. Throughput: 0: 42432.0. Samples: 149247440. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 13:57:08,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 13:57:12,401][00497] Updated weights for policy 0, policy_version 16309 (0.0031) [2024-03-29 13:57:13,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 267255808. Throughput: 0: 42765.3. Samples: 149518200. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 13:57:13,840][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 13:57:15,810][00497] Updated weights for policy 0, policy_version 16319 (0.0025) [2024-03-29 13:57:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 267485184. Throughput: 0: 42308.0. Samples: 149616400. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 13:57:18,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 13:57:20,159][00497] Updated weights for policy 0, policy_version 16329 (0.0022) [2024-03-29 13:57:23,827][00497] Updated weights for policy 0, policy_version 16339 (0.0019) [2024-03-29 13:57:23,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 267698176. Throughput: 0: 42561.8. Samples: 149886140. Policy #0 lag: (min: 0.0, avg: 18.9, max: 40.0) [2024-03-29 13:57:23,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 13:57:27,926][00497] Updated weights for policy 0, policy_version 16349 (0.0022) [2024-03-29 13:57:28,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 267878400. Throughput: 0: 42497.4. Samples: 150146600. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:57:28,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:57:29,693][00476] Signal inference workers to stop experience collection... (5400 times) [2024-03-29 13:57:29,763][00497] InferenceWorker_p0-w0: stopping experience collection (5400 times) [2024-03-29 13:57:29,777][00476] Signal inference workers to resume experience collection... (5400 times) [2024-03-29 13:57:29,795][00497] InferenceWorker_p0-w0: resuming experience collection (5400 times) [2024-03-29 13:57:31,220][00497] Updated weights for policy 0, policy_version 16359 (0.0023) [2024-03-29 13:57:33,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 268124160. Throughput: 0: 42625.4. Samples: 150262720. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 13:57:33,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 13:57:35,560][00497] Updated weights for policy 0, policy_version 16369 (0.0026) [2024-03-29 13:57:38,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 268337152. Throughput: 0: 42250.6. Samples: 150518840. 
Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:57:38,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 13:57:39,058][00497] Updated weights for policy 0, policy_version 16379 (0.0023) [2024-03-29 13:57:43,426][00497] Updated weights for policy 0, policy_version 16389 (0.0026) [2024-03-29 13:57:43,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 268517376. Throughput: 0: 42230.8. Samples: 150780260. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 13:57:43,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 13:57:46,733][00497] Updated weights for policy 0, policy_version 16399 (0.0024) [2024-03-29 13:57:48,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 268763136. Throughput: 0: 42523.4. Samples: 150895500. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 13:57:48,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 13:57:51,116][00497] Updated weights for policy 0, policy_version 16409 (0.0017) [2024-03-29 13:57:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 268959744. Throughput: 0: 42286.8. Samples: 151150340. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 13:57:53,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 13:57:54,564][00497] Updated weights for policy 0, policy_version 16419 (0.0018) [2024-03-29 13:57:58,839][00126] Fps is (10 sec: 39322.2, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 269156352. Throughput: 0: 42028.1. Samples: 151409460. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 13:57:58,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:57:58,913][00497] Updated weights for policy 0, policy_version 16429 (0.0036) [2024-03-29 13:58:02,288][00476] Signal inference workers to stop experience collection... (5450 times) [2024-03-29 13:58:02,339][00497] InferenceWorker_p0-w0: stopping experience collection (5450 times) [2024-03-29 13:58:02,369][00476] Signal inference workers to resume experience collection... (5450 times) [2024-03-29 13:58:02,372][00497] InferenceWorker_p0-w0: resuming experience collection (5450 times) [2024-03-29 13:58:02,378][00497] Updated weights for policy 0, policy_version 16439 (0.0027) [2024-03-29 13:58:03,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 269402112. Throughput: 0: 42601.3. Samples: 151533460. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 13:58:03,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 13:58:06,542][00497] Updated weights for policy 0, policy_version 16449 (0.0018) [2024-03-29 13:58:08,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 269582336. Throughput: 0: 42197.8. Samples: 151785040. Policy #0 lag: (min: 1.0, avg: 19.2, max: 42.0) [2024-03-29 13:58:08,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 13:58:10,252][00497] Updated weights for policy 0, policy_version 16459 (0.0028) [2024-03-29 13:58:13,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 269795328. Throughput: 0: 42012.9. Samples: 152037180. 
Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:58:13,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 13:58:14,682][00497] Updated weights for policy 0, policy_version 16469 (0.0027) [2024-03-29 13:58:17,972][00497] Updated weights for policy 0, policy_version 16479 (0.0028) [2024-03-29 13:58:18,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42598.5, 300 sec: 42098.6). Total num frames: 270041088. Throughput: 0: 42295.6. Samples: 152166020. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:58:18,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 13:58:22,203][00497] Updated weights for policy 0, policy_version 16489 (0.0018) [2024-03-29 13:58:23,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 270221312. Throughput: 0: 42104.9. Samples: 152413560. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 13:58:23,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:58:25,896][00497] Updated weights for policy 0, policy_version 16499 (0.0023) [2024-03-29 13:58:28,839][00126] Fps is (10 sec: 37683.3, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 270417920. Throughput: 0: 42048.1. Samples: 152672420. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 13:58:28,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 13:58:30,353][00497] Updated weights for policy 0, policy_version 16509 (0.0019) [2024-03-29 13:58:33,355][00497] Updated weights for policy 0, policy_version 16519 (0.0024) [2024-03-29 13:58:33,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 270663680. Throughput: 0: 42409.4. Samples: 152803920. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 13:58:33,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 13:58:37,399][00497] Updated weights for policy 0, policy_version 16529 (0.0017) [2024-03-29 13:58:38,839][00126] Fps is (10 sec: 45874.0, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 270876672. Throughput: 0: 42408.7. Samples: 153058740. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:58:38,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 13:58:39,959][00476] Signal inference workers to stop experience collection... (5500 times) [2024-03-29 13:58:39,989][00497] InferenceWorker_p0-w0: stopping experience collection (5500 times) [2024-03-29 13:58:40,171][00476] Signal inference workers to resume experience collection... (5500 times) [2024-03-29 13:58:40,171][00497] InferenceWorker_p0-w0: resuming experience collection (5500 times) [2024-03-29 13:58:41,182][00497] Updated weights for policy 0, policy_version 16539 (0.0019) [2024-03-29 13:58:43,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42598.3, 300 sec: 42265.1). Total num frames: 271073280. Throughput: 0: 42287.0. Samples: 153312380. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 13:58:43,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:58:45,532][00497] Updated weights for policy 0, policy_version 16549 (0.0017) [2024-03-29 13:58:48,638][00497] Updated weights for policy 0, policy_version 16559 (0.0023) [2024-03-29 13:58:48,839][00126] Fps is (10 sec: 42599.2, 60 sec: 42325.5, 300 sec: 42265.2). Total num frames: 271302656. Throughput: 0: 42531.2. Samples: 153447360. 
Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:58:48,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 13:58:52,884][00497] Updated weights for policy 0, policy_version 16569 (0.0032) [2024-03-29 13:58:53,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 271499264. Throughput: 0: 42358.2. Samples: 153691160. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:58:53,840][00126] Avg episode reward: [(0, '0.304')] [2024-03-29 13:58:56,468][00497] Updated weights for policy 0, policy_version 16579 (0.0020) [2024-03-29 13:58:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 271712256. Throughput: 0: 42493.3. Samples: 153949380. Policy #0 lag: (min: 1.0, avg: 21.6, max: 41.0) [2024-03-29 13:58:58,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 13:59:00,663][00497] Updated weights for policy 0, policy_version 16589 (0.0021) [2024-03-29 13:59:03,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 271941632. Throughput: 0: 42707.5. Samples: 154087860. Policy #0 lag: (min: 0.0, avg: 20.1, max: 43.0) [2024-03-29 13:59:03,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 13:59:04,122][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000016599_271958016.pth... [2024-03-29 13:59:04,137][00497] Updated weights for policy 0, policy_version 16599 (0.0027) [2024-03-29 13:59:04,469][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000015978_261783552.pth [2024-03-29 13:59:08,263][00497] Updated weights for policy 0, policy_version 16609 (0.0024) [2024-03-29 13:59:08,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 272138240. Throughput: 0: 42427.2. Samples: 154322780. Policy #0 lag: (min: 0.0, avg: 20.1, max: 43.0) [2024-03-29 13:59:08,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 13:59:12,034][00497] Updated weights for policy 0, policy_version 16619 (0.0024) [2024-03-29 13:59:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 272351232. Throughput: 0: 42439.5. Samples: 154582200. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 13:59:13,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 13:59:16,227][00497] Updated weights for policy 0, policy_version 16629 (0.0023) [2024-03-29 13:59:18,074][00476] Signal inference workers to stop experience collection... (5550 times) [2024-03-29 13:59:18,105][00497] InferenceWorker_p0-w0: stopping experience collection (5550 times) [2024-03-29 13:59:18,303][00476] Signal inference workers to resume experience collection... (5550 times) [2024-03-29 13:59:18,303][00497] InferenceWorker_p0-w0: resuming experience collection (5550 times) [2024-03-29 13:59:18,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.2, 300 sec: 42376.2). Total num frames: 272580608. Throughput: 0: 42551.1. Samples: 154718720. Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 13:59:18,840][00126] Avg episode reward: [(0, '0.416')] [2024-03-29 13:59:19,594][00497] Updated weights for policy 0, policy_version 16639 (0.0034) [2024-03-29 13:59:23,670][00497] Updated weights for policy 0, policy_version 16649 (0.0018) [2024-03-29 13:59:23,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42598.3, 300 sec: 42320.7). Total num frames: 272777216. Throughput: 0: 42244.0. Samples: 154959720. 
Policy #0 lag: (min: 0.0, avg: 21.3, max: 43.0) [2024-03-29 13:59:23,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 13:59:27,527][00497] Updated weights for policy 0, policy_version 16659 (0.0024) [2024-03-29 13:59:28,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42871.4, 300 sec: 42265.2). Total num frames: 272990208. Throughput: 0: 42386.8. Samples: 155219780. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 13:59:28,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 13:59:31,767][00497] Updated weights for policy 0, policy_version 16669 (0.0023) [2024-03-29 13:59:33,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 273203200. Throughput: 0: 42385.7. Samples: 155354720. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 13:59:33,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 13:59:34,834][00497] Updated weights for policy 0, policy_version 16679 (0.0021) [2024-03-29 13:59:38,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.4, 300 sec: 42376.2). Total num frames: 273416192. Throughput: 0: 42323.0. Samples: 155595700. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 13:59:38,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 13:59:38,920][00497] Updated weights for policy 0, policy_version 16689 (0.0030) [2024-03-29 13:59:43,090][00497] Updated weights for policy 0, policy_version 16699 (0.0041) [2024-03-29 13:59:43,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 273629184. Throughput: 0: 42275.1. Samples: 155851760. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 13:59:43,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 13:59:47,213][00497] Updated weights for policy 0, policy_version 16709 (0.0024) [2024-03-29 13:59:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 273825792. Throughput: 0: 42185.8. Samples: 155986220. Policy #0 lag: (min: 0.0, avg: 22.1, max: 42.0) [2024-03-29 13:59:48,840][00126] Avg episode reward: [(0, '0.316')] [2024-03-29 13:59:50,540][00497] Updated weights for policy 0, policy_version 16719 (0.0026) [2024-03-29 13:59:53,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42598.5, 300 sec: 42376.3). Total num frames: 274055168. Throughput: 0: 42395.2. Samples: 156230560. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 13:59:53,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 13:59:54,595][00497] Updated weights for policy 0, policy_version 16729 (0.0028) [2024-03-29 13:59:55,051][00476] Signal inference workers to stop experience collection... (5600 times) [2024-03-29 13:59:55,090][00497] InferenceWorker_p0-w0: stopping experience collection (5600 times) [2024-03-29 13:59:55,277][00476] Signal inference workers to resume experience collection... (5600 times) [2024-03-29 13:59:55,277][00497] InferenceWorker_p0-w0: resuming experience collection (5600 times) [2024-03-29 13:59:58,520][00497] Updated weights for policy 0, policy_version 16739 (0.0020) [2024-03-29 13:59:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 274268160. Throughput: 0: 42346.2. Samples: 156487780. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 13:59:58,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:00:02,767][00497] Updated weights for policy 0, policy_version 16749 (0.0021) [2024-03-29 14:00:03,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.2, 300 sec: 42376.3). 
Total num frames: 274448384. Throughput: 0: 41944.1. Samples: 156606200. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 14:00:03,840][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 14:00:06,148][00497] Updated weights for policy 0, policy_version 16759 (0.0018) [2024-03-29 14:00:08,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 274694144. Throughput: 0: 42283.6. Samples: 156862480. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 14:00:08,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 14:00:10,221][00497] Updated weights for policy 0, policy_version 16769 (0.0019) [2024-03-29 14:00:13,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42376.3). Total num frames: 274890752. Throughput: 0: 42411.1. Samples: 157128280. Policy #0 lag: (min: 0.0, avg: 20.1, max: 41.0) [2024-03-29 14:00:13,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:00:14,063][00497] Updated weights for policy 0, policy_version 16779 (0.0024) [2024-03-29 14:00:18,230][00497] Updated weights for policy 0, policy_version 16789 (0.0022) [2024-03-29 14:00:18,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 42431.8). Total num frames: 275087360. Throughput: 0: 42243.1. Samples: 157255660. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 14:00:18,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:00:21,362][00497] Updated weights for policy 0, policy_version 16799 (0.0036) [2024-03-29 14:00:23,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42871.6, 300 sec: 42487.3). Total num frames: 275349504. Throughput: 0: 42469.9. Samples: 157506840. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 14:00:23,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 14:00:25,520][00497] Updated weights for policy 0, policy_version 16809 (0.0024) [2024-03-29 14:00:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.2, 300 sec: 42431.8). Total num frames: 275529728. Throughput: 0: 42536.3. Samples: 157765900. Policy #0 lag: (min: 1.0, avg: 21.2, max: 42.0) [2024-03-29 14:00:28,841][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 14:00:29,331][00497] Updated weights for policy 0, policy_version 16819 (0.0022) [2024-03-29 14:00:30,127][00476] Signal inference workers to stop experience collection... (5650 times) [2024-03-29 14:00:30,183][00497] InferenceWorker_p0-w0: stopping experience collection (5650 times) [2024-03-29 14:00:30,221][00476] Signal inference workers to resume experience collection... (5650 times) [2024-03-29 14:00:30,223][00497] InferenceWorker_p0-w0: resuming experience collection (5650 times) [2024-03-29 14:00:33,620][00497] Updated weights for policy 0, policy_version 16829 (0.0017) [2024-03-29 14:00:33,839][00126] Fps is (10 sec: 37682.7, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 275726336. Throughput: 0: 42370.1. Samples: 157892880. Policy #0 lag: (min: 1.0, avg: 21.2, max: 42.0) [2024-03-29 14:00:33,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 14:00:36,815][00497] Updated weights for policy 0, policy_version 16839 (0.0022) [2024-03-29 14:00:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 42431.8). Total num frames: 275955712. Throughput: 0: 42262.5. Samples: 158132380. 
Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 14:00:38,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 14:00:40,942][00497] Updated weights for policy 0, policy_version 16849 (0.0034) [2024-03-29 14:00:43,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.2, 300 sec: 42376.2). Total num frames: 276152320. Throughput: 0: 42456.9. Samples: 158398340. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 14:00:43,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 14:00:45,022][00497] Updated weights for policy 0, policy_version 16859 (0.0021) [2024-03-29 14:00:48,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42325.4, 300 sec: 42431.8). Total num frames: 276365312. Throughput: 0: 42380.5. Samples: 158513320. Policy #0 lag: (min: 1.0, avg: 22.1, max: 42.0) [2024-03-29 14:00:48,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:00:49,065][00497] Updated weights for policy 0, policy_version 16869 (0.0018) [2024-03-29 14:00:52,473][00497] Updated weights for policy 0, policy_version 16879 (0.0018) [2024-03-29 14:00:53,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.2, 300 sec: 42376.2). Total num frames: 276594688. Throughput: 0: 42429.8. Samples: 158771820. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 14:00:53,840][00126] Avg episode reward: [(0, '0.421')] [2024-03-29 14:00:56,440][00497] Updated weights for policy 0, policy_version 16889 (0.0021) [2024-03-29 14:00:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 42320.7). Total num frames: 276774912. Throughput: 0: 42225.3. Samples: 159028420. Policy #0 lag: (min: 0.0, avg: 20.2, max: 42.0) [2024-03-29 14:00:58,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 14:01:00,399][00497] Updated weights for policy 0, policy_version 16899 (0.0030) [2024-03-29 14:01:03,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42598.4, 300 sec: 42376.3). Total num frames: 277004288. Throughput: 0: 42148.9. Samples: 159152360. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 14:01:03,840][00126] Avg episode reward: [(0, '0.338')] [2024-03-29 14:01:04,105][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000016908_277020672.pth... [2024-03-29 14:01:04,451][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000016287_266846208.pth [2024-03-29 14:01:04,726][00497] Updated weights for policy 0, policy_version 16909 (0.0031) [2024-03-29 14:01:08,047][00497] Updated weights for policy 0, policy_version 16919 (0.0023) [2024-03-29 14:01:08,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 277233664. Throughput: 0: 42328.3. Samples: 159411620. Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 14:01:08,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 14:01:12,291][00497] Updated weights for policy 0, policy_version 16929 (0.0017) [2024-03-29 14:01:12,470][00476] Signal inference workers to stop experience collection... (5700 times) [2024-03-29 14:01:12,495][00497] InferenceWorker_p0-w0: stopping experience collection (5700 times) [2024-03-29 14:01:12,690][00476] Signal inference workers to resume experience collection... (5700 times) [2024-03-29 14:01:12,691][00497] InferenceWorker_p0-w0: resuming experience collection (5700 times) [2024-03-29 14:01:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 277413888. Throughput: 0: 41967.2. Samples: 159654420. 
Policy #0 lag: (min: 1.0, avg: 20.6, max: 42.0) [2024-03-29 14:01:13,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:01:16,024][00497] Updated weights for policy 0, policy_version 16939 (0.0031) [2024-03-29 14:01:18,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42598.5, 300 sec: 42376.3). Total num frames: 277643264. Throughput: 0: 42082.8. Samples: 159786600. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:01:18,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 14:01:20,150][00497] Updated weights for policy 0, policy_version 16949 (0.0025) [2024-03-29 14:01:23,582][00497] Updated weights for policy 0, policy_version 16959 (0.0028) [2024-03-29 14:01:23,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41779.1, 300 sec: 42320.7). Total num frames: 277856256. Throughput: 0: 42524.9. Samples: 160046000. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:01:23,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 14:01:27,788][00497] Updated weights for policy 0, policy_version 16969 (0.0019) [2024-03-29 14:01:28,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 278069248. Throughput: 0: 42175.6. Samples: 160296240. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 14:01:28,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 14:01:31,527][00497] Updated weights for policy 0, policy_version 16979 (0.0018) [2024-03-29 14:01:33,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 278265856. Throughput: 0: 42367.5. Samples: 160419860. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 14:01:33,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 14:01:35,645][00497] Updated weights for policy 0, policy_version 16989 (0.0023) [2024-03-29 14:01:38,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 42376.3). Total num frames: 278495232. Throughput: 0: 42425.5. Samples: 160680960. Policy #0 lag: (min: 1.0, avg: 21.1, max: 41.0) [2024-03-29 14:01:38,840][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 14:01:38,983][00497] Updated weights for policy 0, policy_version 16999 (0.0024) [2024-03-29 14:01:43,308][00497] Updated weights for policy 0, policy_version 17009 (0.0023) [2024-03-29 14:01:43,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 278691840. Throughput: 0: 42114.1. Samples: 160923560. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 14:01:43,841][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 14:01:47,137][00497] Updated weights for policy 0, policy_version 17019 (0.0024) [2024-03-29 14:01:48,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 278904832. Throughput: 0: 42320.0. Samples: 161056760. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 14:01:48,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 14:01:50,943][00476] Signal inference workers to stop experience collection... (5750 times) [2024-03-29 14:01:50,983][00497] InferenceWorker_p0-w0: stopping experience collection (5750 times) [2024-03-29 14:01:51,024][00476] Signal inference workers to resume experience collection... (5750 times) [2024-03-29 14:01:51,024][00497] InferenceWorker_p0-w0: resuming experience collection (5750 times) [2024-03-29 14:01:51,030][00497] Updated weights for policy 0, policy_version 17029 (0.0031) [2024-03-29 14:01:53,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.4, 300 sec: 42376.3). 
Total num frames: 279134208. Throughput: 0: 42288.9. Samples: 161314620. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 14:01:53,841][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:01:54,259][00497] Updated weights for policy 0, policy_version 17039 (0.0028) [2024-03-29 14:01:58,719][00497] Updated weights for policy 0, policy_version 17049 (0.0018) [2024-03-29 14:01:58,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 42320.7). Total num frames: 279330816. Throughput: 0: 42460.8. Samples: 161565160. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 14:01:58,840][00126] Avg episode reward: [(0, '0.446')] [2024-03-29 14:02:02,448][00497] Updated weights for policy 0, policy_version 17059 (0.0018) [2024-03-29 14:02:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 279543808. Throughput: 0: 42420.8. Samples: 161695540. Policy #0 lag: (min: 0.0, avg: 20.0, max: 42.0) [2024-03-29 14:02:03,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 14:02:06,348][00497] Updated weights for policy 0, policy_version 17069 (0.0020) [2024-03-29 14:02:08,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.4, 300 sec: 42431.8). Total num frames: 279773184. Throughput: 0: 42416.5. Samples: 161954740. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 14:02:08,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 14:02:09,684][00497] Updated weights for policy 0, policy_version 17079 (0.0021) [2024-03-29 14:02:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.3, 300 sec: 42320.7). Total num frames: 279969792. Throughput: 0: 42434.1. Samples: 162205780. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 14:02:13,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 14:02:14,123][00497] Updated weights for policy 0, policy_version 17089 (0.0025) [2024-03-29 14:02:17,822][00497] Updated weights for policy 0, policy_version 17099 (0.0034) [2024-03-29 14:02:18,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 280182784. Throughput: 0: 42613.4. Samples: 162337460. Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0) [2024-03-29 14:02:18,840][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 14:02:21,907][00497] Updated weights for policy 0, policy_version 17109 (0.0022) [2024-03-29 14:02:23,839][00126] Fps is (10 sec: 40961.0, 60 sec: 42052.4, 300 sec: 42376.3). Total num frames: 280379392. Throughput: 0: 42466.8. Samples: 162591960. Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0) [2024-03-29 14:02:23,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:02:25,238][00497] Updated weights for policy 0, policy_version 17119 (0.0021) [2024-03-29 14:02:26,218][00476] Signal inference workers to stop experience collection... (5800 times) [2024-03-29 14:02:26,271][00497] InferenceWorker_p0-w0: stopping experience collection (5800 times) [2024-03-29 14:02:26,305][00476] Signal inference workers to resume experience collection... (5800 times) [2024-03-29 14:02:26,307][00497] InferenceWorker_p0-w0: resuming experience collection (5800 times) [2024-03-29 14:02:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 280608768. Throughput: 0: 42505.0. Samples: 162836280. 
Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0) [2024-03-29 14:02:28,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 14:02:29,499][00497] Updated weights for policy 0, policy_version 17129 (0.0024) [2024-03-29 14:02:33,223][00497] Updated weights for policy 0, policy_version 17139 (0.0028) [2024-03-29 14:02:33,839][00126] Fps is (10 sec: 44235.8, 60 sec: 42598.3, 300 sec: 42320.7). Total num frames: 280821760. Throughput: 0: 42618.2. Samples: 162974580. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 14:02:33,840][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 14:02:37,315][00497] Updated weights for policy 0, policy_version 17149 (0.0019) [2024-03-29 14:02:38,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.2, 300 sec: 42431.8). Total num frames: 281034752. Throughput: 0: 42506.2. Samples: 163227400. Policy #0 lag: (min: 0.0, avg: 20.9, max: 42.0) [2024-03-29 14:02:38,840][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 14:02:40,539][00497] Updated weights for policy 0, policy_version 17159 (0.0026) [2024-03-29 14:02:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 281247744. Throughput: 0: 42175.6. Samples: 163463060. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 14:02:43,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 14:02:45,178][00497] Updated weights for policy 0, policy_version 17169 (0.0022) [2024-03-29 14:02:48,815][00497] Updated weights for policy 0, policy_version 17179 (0.0023) [2024-03-29 14:02:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 281460736. Throughput: 0: 42280.9. Samples: 163598180. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 14:02:48,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 14:02:52,655][00497] Updated weights for policy 0, policy_version 17189 (0.0024) [2024-03-29 14:02:53,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.4, 300 sec: 42431.8). Total num frames: 281673728. Throughput: 0: 42364.9. Samples: 163861160. Policy #0 lag: (min: 0.0, avg: 21.9, max: 42.0) [2024-03-29 14:02:53,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 14:02:55,941][00497] Updated weights for policy 0, policy_version 17199 (0.0025) [2024-03-29 14:02:58,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 281886720. Throughput: 0: 42146.7. Samples: 164102380. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 14:02:58,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 14:03:00,549][00497] Updated weights for policy 0, policy_version 17209 (0.0019) [2024-03-29 14:03:03,478][00476] Signal inference workers to stop experience collection... (5850 times) [2024-03-29 14:03:03,513][00497] InferenceWorker_p0-w0: stopping experience collection (5850 times) [2024-03-29 14:03:03,691][00476] Signal inference workers to resume experience collection... (5850 times) [2024-03-29 14:03:03,691][00497] InferenceWorker_p0-w0: resuming experience collection (5850 times) [2024-03-29 14:03:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 42376.2). Total num frames: 282083328. Throughput: 0: 42234.2. Samples: 164238000. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 14:03:03,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 14:03:03,982][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000017218_282099712.pth... 
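Note: the checkpoint entry above, and the entries throughout this section, follow a fixed format: periodic "Fps is (10 sec / 60 sec / 300 sec)" throughput reports with the running frame count, "Avg episode reward" values, "Updated weights for policy 0" notices, and paired checkpoint save/remove messages that keep only the most recent checkpoints on disk. As a minimal sketch (not part of the run itself, and assuming this console output is also saved to a plain-text file, hypothetically named train.log), the throughput and reward entries can be scraped for a quick progress summary:

# Illustrative helper, not part of the training run: scrape the "Fps is ..."
# and "Avg episode reward ..." entries from a Sample Factory-style console log.
# The filename "train.log" and all names below are assumptions for this sketch.
import re

FPS_RE = re.compile(
    r"Fps is \(10 sec: (?P<fps10>[\d.]+), "
    r"60 sec: (?P<fps60>[\d.]+), 300 sec: (?P<fps300>[\d.]+)\)\. "
    r"Total num frames: (?P<frames>\d+)"
)
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '(?P<reward>[\d.]+)'\)\]")


def parse_log(path="train.log"):
    """Return ([(total_frames, fps_10s), ...], [avg_reward, ...]) from the log."""
    with open(path) as f:
        # Collapse all whitespace so the regexes do not care how the console
        # output happened to be wrapped when it was captured.
        text = " ".join(f.read().split())
    fps_points = [
        (int(m.group("frames")), float(m.group("fps10")))
        for m in FPS_RE.finditer(text)
    ]
    rewards = [float(m.group("reward")) for m in REWARD_RE.finditer(text)]
    return fps_points, rewards


if __name__ == "__main__":
    fps_points, rewards = parse_log()
    if fps_points:
        frames, fps10 = zip(*fps_points)
        print(f"frames: {frames[0]:,} -> {frames[-1]:,}")
        print(f"mean 10-sec FPS: {sum(fps10) / len(fps10):.1f}")
    if rewards:
        print(f"mean avg episode reward: {sum(rewards) / len(rewards):.3f}")

Collapsing whitespace before matching keeps the sketch indifferent to how the captured output was line-wrapped; the regexes, parse_log, and train.log are illustrative assumptions, not part of Sample Factory's own API.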
[2024-03-29 14:03:04,314][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000016599_271958016.pth [2024-03-29 14:03:04,592][00497] Updated weights for policy 0, policy_version 17219 (0.0022) [2024-03-29 14:03:08,317][00497] Updated weights for policy 0, policy_version 17229 (0.0019) [2024-03-29 14:03:08,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.2, 300 sec: 42376.2). Total num frames: 282296320. Throughput: 0: 42153.6. Samples: 164488880. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 14:03:08,840][00126] Avg episode reward: [(0, '0.304')] [2024-03-29 14:03:11,753][00497] Updated weights for policy 0, policy_version 17239 (0.0032) [2024-03-29 14:03:13,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 42265.2). Total num frames: 282509312. Throughput: 0: 41980.4. Samples: 164725400. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 14:03:13,840][00126] Avg episode reward: [(0, '0.458')] [2024-03-29 14:03:16,122][00497] Updated weights for policy 0, policy_version 17249 (0.0025) [2024-03-29 14:03:18,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 282705920. Throughput: 0: 41957.0. Samples: 164862640. Policy #0 lag: (min: 1.0, avg: 20.2, max: 41.0) [2024-03-29 14:03:18,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 14:03:19,949][00497] Updated weights for policy 0, policy_version 17259 (0.0021) [2024-03-29 14:03:23,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42598.3, 300 sec: 42431.8). Total num frames: 282935296. Throughput: 0: 42332.6. Samples: 165132360. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:03:23,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:03:23,857][00497] Updated weights for policy 0, policy_version 17269 (0.0016) [2024-03-29 14:03:27,113][00497] Updated weights for policy 0, policy_version 17279 (0.0020) [2024-03-29 14:03:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 283148288. Throughput: 0: 42205.5. Samples: 165362300. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:03:28,840][00126] Avg episode reward: [(0, '0.315')] [2024-03-29 14:03:31,603][00497] Updated weights for policy 0, policy_version 17289 (0.0018) [2024-03-29 14:03:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.4, 300 sec: 42265.2). Total num frames: 283344896. Throughput: 0: 42153.5. Samples: 165495080. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 14:03:33,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 14:03:35,683][00497] Updated weights for policy 0, policy_version 17299 (0.0027) [2024-03-29 14:03:38,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 42376.3). Total num frames: 283574272. Throughput: 0: 42079.1. Samples: 165754720. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 14:03:38,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:03:39,356][00497] Updated weights for policy 0, policy_version 17309 (0.0023) [2024-03-29 14:03:41,999][00476] Signal inference workers to stop experience collection... (5900 times) [2024-03-29 14:03:42,019][00497] InferenceWorker_p0-w0: stopping experience collection (5900 times) [2024-03-29 14:03:42,209][00476] Signal inference workers to resume experience collection... 
(5900 times) [2024-03-29 14:03:42,210][00497] InferenceWorker_p0-w0: resuming experience collection (5900 times) [2024-03-29 14:03:42,712][00497] Updated weights for policy 0, policy_version 17319 (0.0024) [2024-03-29 14:03:43,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.4, 300 sec: 42265.2). Total num frames: 283770880. Throughput: 0: 41713.9. Samples: 165979500. Policy #0 lag: (min: 0.0, avg: 20.3, max: 41.0) [2024-03-29 14:03:43,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 14:03:47,242][00497] Updated weights for policy 0, policy_version 17329 (0.0029) [2024-03-29 14:03:48,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 283983872. Throughput: 0: 41912.8. Samples: 166124080. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 14:03:48,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 14:03:51,330][00497] Updated weights for policy 0, policy_version 17339 (0.0023) [2024-03-29 14:03:53,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.3, 300 sec: 42265.2). Total num frames: 284180480. Throughput: 0: 41985.5. Samples: 166378220. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 14:03:53,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:03:54,871][00497] Updated weights for policy 0, policy_version 17349 (0.0027) [2024-03-29 14:03:58,253][00497] Updated weights for policy 0, policy_version 17359 (0.0026) [2024-03-29 14:03:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 284426240. Throughput: 0: 42215.4. Samples: 166625100. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 14:03:58,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 14:04:02,843][00497] Updated weights for policy 0, policy_version 17369 (0.0021) [2024-03-29 14:04:03,839][00126] Fps is (10 sec: 44235.8, 60 sec: 42325.2, 300 sec: 42320.7). Total num frames: 284622848. Throughput: 0: 42150.1. Samples: 166759400. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 14:04:03,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:04:06,731][00497] Updated weights for policy 0, policy_version 17379 (0.0024) [2024-03-29 14:04:08,839][00126] Fps is (10 sec: 39322.3, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 284819456. Throughput: 0: 41833.3. Samples: 167014860. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 14:04:08,840][00126] Avg episode reward: [(0, '0.545')] [2024-03-29 14:04:09,099][00476] Saving new best policy, reward=0.545! [2024-03-29 14:04:10,460][00497] Updated weights for policy 0, policy_version 17389 (0.0019) [2024-03-29 14:04:13,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 285048832. Throughput: 0: 42168.0. Samples: 167259860. Policy #0 lag: (min: 2.0, avg: 21.3, max: 42.0) [2024-03-29 14:04:13,841][00126] Avg episode reward: [(0, '0.289')] [2024-03-29 14:04:14,074][00497] Updated weights for policy 0, policy_version 17399 (0.0032) [2024-03-29 14:04:18,470][00497] Updated weights for policy 0, policy_version 17409 (0.0021) [2024-03-29 14:04:18,512][00476] Signal inference workers to stop experience collection... (5950 times) [2024-03-29 14:04:18,533][00497] InferenceWorker_p0-w0: stopping experience collection (5950 times) [2024-03-29 14:04:18,717][00476] Signal inference workers to resume experience collection... 
(5950 times) [2024-03-29 14:04:18,717][00497] InferenceWorker_p0-w0: resuming experience collection (5950 times) [2024-03-29 14:04:18,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 285245440. Throughput: 0: 42137.6. Samples: 167391280. Policy #0 lag: (min: 2.0, avg: 21.3, max: 42.0) [2024-03-29 14:04:18,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 14:04:22,361][00497] Updated weights for policy 0, policy_version 17419 (0.0024) [2024-03-29 14:04:23,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 285442048. Throughput: 0: 42157.3. Samples: 167651800. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:04:23,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:04:26,242][00497] Updated weights for policy 0, policy_version 17429 (0.0021) [2024-03-29 14:04:28,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42265.2). Total num frames: 285671424. Throughput: 0: 42588.8. Samples: 167896000. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:04:28,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 14:04:29,562][00497] Updated weights for policy 0, policy_version 17439 (0.0022) [2024-03-29 14:04:33,775][00497] Updated weights for policy 0, policy_version 17449 (0.0020) [2024-03-29 14:04:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42325.3, 300 sec: 42265.2). Total num frames: 285884416. Throughput: 0: 42131.2. Samples: 168019980. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:04:33,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 14:04:37,893][00497] Updated weights for policy 0, policy_version 17459 (0.0019) [2024-03-29 14:04:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.1, 300 sec: 42209.6). Total num frames: 286081024. Throughput: 0: 42475.8. Samples: 168289640. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 14:04:38,840][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 14:04:41,451][00497] Updated weights for policy 0, policy_version 17469 (0.0023) [2024-03-29 14:04:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 286310400. Throughput: 0: 42449.5. Samples: 168535320. Policy #0 lag: (min: 0.0, avg: 21.3, max: 42.0) [2024-03-29 14:04:43,840][00126] Avg episode reward: [(0, '0.416')] [2024-03-29 14:04:45,156][00497] Updated weights for policy 0, policy_version 17479 (0.0020) [2024-03-29 14:04:48,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 286507008. Throughput: 0: 41924.5. Samples: 168646000. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:04:48,842][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:04:49,591][00497] Updated weights for policy 0, policy_version 17489 (0.0029) [2024-03-29 14:04:52,881][00476] Signal inference workers to stop experience collection... (6000 times) [2024-03-29 14:04:52,955][00497] InferenceWorker_p0-w0: stopping experience collection (6000 times) [2024-03-29 14:04:52,959][00476] Signal inference workers to resume experience collection... (6000 times) [2024-03-29 14:04:52,981][00497] InferenceWorker_p0-w0: resuming experience collection (6000 times) [2024-03-29 14:04:53,558][00497] Updated weights for policy 0, policy_version 17499 (0.0019) [2024-03-29 14:04:53,839][00126] Fps is (10 sec: 39320.9, 60 sec: 42052.1, 300 sec: 42154.1). Total num frames: 286703616. Throughput: 0: 42257.6. Samples: 168916460. 
Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:04:53,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 14:04:57,035][00497] Updated weights for policy 0, policy_version 17509 (0.0024) [2024-03-29 14:04:58,839][00126] Fps is (10 sec: 40960.8, 60 sec: 41506.3, 300 sec: 42265.2). Total num frames: 286916608. Throughput: 0: 42090.8. Samples: 169153940. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:04:58,840][00126] Avg episode reward: [(0, '0.318')] [2024-03-29 14:05:00,941][00497] Updated weights for policy 0, policy_version 17519 (0.0025) [2024-03-29 14:05:03,839][00126] Fps is (10 sec: 42599.2, 60 sec: 41779.3, 300 sec: 42154.1). Total num frames: 287129600. Throughput: 0: 41666.3. Samples: 169266260. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 14:05:03,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 14:05:04,221][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000017527_287162368.pth... [2024-03-29 14:05:04,550][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000016908_277020672.pth [2024-03-29 14:05:05,281][00497] Updated weights for policy 0, policy_version 17529 (0.0023) [2024-03-29 14:05:08,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 287326208. Throughput: 0: 41898.2. Samples: 169537220. Policy #0 lag: (min: 0.0, avg: 21.8, max: 44.0) [2024-03-29 14:05:08,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:05:09,194][00497] Updated weights for policy 0, policy_version 17539 (0.0020) [2024-03-29 14:05:12,844][00497] Updated weights for policy 0, policy_version 17549 (0.0029) [2024-03-29 14:05:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 42265.2). Total num frames: 287555584. Throughput: 0: 41837.9. Samples: 169778700. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 14:05:13,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 14:05:16,587][00497] Updated weights for policy 0, policy_version 17559 (0.0023) [2024-03-29 14:05:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 287752192. Throughput: 0: 41904.5. Samples: 169905680. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 14:05:18,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 14:05:20,803][00497] Updated weights for policy 0, policy_version 17569 (0.0033) [2024-03-29 14:05:23,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.1, 300 sec: 42098.6). Total num frames: 287948800. Throughput: 0: 41688.9. Samples: 170165640. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 14:05:23,841][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 14:05:24,741][00497] Updated weights for policy 0, policy_version 17579 (0.0023) [2024-03-29 14:05:25,524][00476] Signal inference workers to stop experience collection... (6050 times) [2024-03-29 14:05:25,525][00476] Signal inference workers to resume experience collection... (6050 times) [2024-03-29 14:05:25,568][00497] InferenceWorker_p0-w0: stopping experience collection (6050 times) [2024-03-29 14:05:25,568][00497] InferenceWorker_p0-w0: resuming experience collection (6050 times) [2024-03-29 14:05:28,497][00497] Updated weights for policy 0, policy_version 17589 (0.0023) [2024-03-29 14:05:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 288194560. Throughput: 0: 41806.6. Samples: 170416620. 
Policy #0 lag: (min: 1.0, avg: 21.0, max: 43.0) [2024-03-29 14:05:28,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:05:32,257][00497] Updated weights for policy 0, policy_version 17599 (0.0018) [2024-03-29 14:05:33,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 288391168. Throughput: 0: 42063.2. Samples: 170538840. Policy #0 lag: (min: 1.0, avg: 21.0, max: 43.0) [2024-03-29 14:05:33,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 14:05:35,966][00497] Updated weights for policy 0, policy_version 17609 (0.0028) [2024-03-29 14:05:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.3, 300 sec: 42154.1). Total num frames: 288587776. Throughput: 0: 41878.8. Samples: 170801000. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 14:05:38,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 14:05:40,034][00497] Updated weights for policy 0, policy_version 17619 (0.0023) [2024-03-29 14:05:43,719][00497] Updated weights for policy 0, policy_version 17629 (0.0024) [2024-03-29 14:05:43,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.2, 300 sec: 42265.1). Total num frames: 288833536. Throughput: 0: 42206.0. Samples: 171053220. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 14:05:43,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:05:47,586][00497] Updated weights for policy 0, policy_version 17639 (0.0034) [2024-03-29 14:05:48,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 289030144. Throughput: 0: 42484.8. Samples: 171178080. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 14:05:48,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 14:05:51,712][00497] Updated weights for policy 0, policy_version 17649 (0.0018) [2024-03-29 14:05:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 289226752. Throughput: 0: 42165.7. Samples: 171434680. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:05:53,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:05:55,643][00497] Updated weights for policy 0, policy_version 17659 (0.0019) [2024-03-29 14:05:58,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 289456128. Throughput: 0: 42256.0. Samples: 171680220. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:05:58,841][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 14:05:59,283][00497] Updated weights for policy 0, policy_version 17669 (0.0023) [2024-03-29 14:06:00,806][00476] Signal inference workers to stop experience collection... (6100 times) [2024-03-29 14:06:00,809][00476] Signal inference workers to resume experience collection... (6100 times) [2024-03-29 14:06:00,857][00497] InferenceWorker_p0-w0: stopping experience collection (6100 times) [2024-03-29 14:06:00,857][00497] InferenceWorker_p0-w0: resuming experience collection (6100 times) [2024-03-29 14:06:03,276][00497] Updated weights for policy 0, policy_version 17679 (0.0020) [2024-03-29 14:06:03,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 289669120. Throughput: 0: 42325.3. Samples: 171810320. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 14:06:03,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 14:06:07,354][00497] Updated weights for policy 0, policy_version 17689 (0.0018) [2024-03-29 14:06:08,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 42209.6). 
Total num frames: 289865728. Throughput: 0: 42202.7. Samples: 172064760. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 14:06:08,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 14:06:11,430][00497] Updated weights for policy 0, policy_version 17699 (0.0021) [2024-03-29 14:06:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 290078720. Throughput: 0: 41976.0. Samples: 172305540. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 14:06:13,840][00126] Avg episode reward: [(0, '0.290')] [2024-03-29 14:06:15,111][00497] Updated weights for policy 0, policy_version 17709 (0.0025) [2024-03-29 14:06:18,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 290291712. Throughput: 0: 42218.3. Samples: 172438660. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 14:06:18,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 14:06:18,951][00497] Updated weights for policy 0, policy_version 17719 (0.0033) [2024-03-29 14:06:22,849][00497] Updated weights for policy 0, policy_version 17729 (0.0018) [2024-03-29 14:06:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 290504704. Throughput: 0: 41950.6. Samples: 172688780. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 14:06:23,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 14:06:27,092][00497] Updated weights for policy 0, policy_version 17739 (0.0020) [2024-03-29 14:06:28,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41779.1, 300 sec: 42154.1). Total num frames: 290701312. Throughput: 0: 41932.9. Samples: 172940200. Policy #0 lag: (min: 0.0, avg: 22.4, max: 43.0) [2024-03-29 14:06:28,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 14:06:30,715][00497] Updated weights for policy 0, policy_version 17749 (0.0030) [2024-03-29 14:06:32,539][00476] Signal inference workers to stop experience collection... (6150 times) [2024-03-29 14:06:32,546][00476] Signal inference workers to resume experience collection... (6150 times) [2024-03-29 14:06:32,586][00497] InferenceWorker_p0-w0: stopping experience collection (6150 times) [2024-03-29 14:06:32,586][00497] InferenceWorker_p0-w0: resuming experience collection (6150 times) [2024-03-29 14:06:33,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 290914304. Throughput: 0: 41776.8. Samples: 173058040. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 14:06:33,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 14:06:34,666][00497] Updated weights for policy 0, policy_version 17759 (0.0026) [2024-03-29 14:06:38,750][00497] Updated weights for policy 0, policy_version 17769 (0.0021) [2024-03-29 14:06:38,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.2, 300 sec: 42154.1). Total num frames: 291127296. Throughput: 0: 41676.9. Samples: 173310140. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 14:06:38,840][00126] Avg episode reward: [(0, '0.319')] [2024-03-29 14:06:42,833][00497] Updated weights for policy 0, policy_version 17779 (0.0024) [2024-03-29 14:06:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.2, 300 sec: 42098.5). Total num frames: 291323904. Throughput: 0: 41745.7. Samples: 173558780. 
Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 14:06:43,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:06:46,404][00497] Updated weights for policy 0, policy_version 17789 (0.0028) [2024-03-29 14:06:48,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 291536896. Throughput: 0: 41484.9. Samples: 173677140. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 14:06:48,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 14:06:50,478][00497] Updated weights for policy 0, policy_version 17799 (0.0020) [2024-03-29 14:06:53,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 42043.0). Total num frames: 291733504. Throughput: 0: 41561.4. Samples: 173935020. Policy #0 lag: (min: 1.0, avg: 18.1, max: 41.0) [2024-03-29 14:06:53,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:06:54,762][00497] Updated weights for policy 0, policy_version 17809 (0.0026) [2024-03-29 14:06:58,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 291930112. Throughput: 0: 41812.0. Samples: 174187080. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 14:06:58,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 14:06:58,969][00497] Updated weights for policy 0, policy_version 17819 (0.0028) [2024-03-29 14:07:01,970][00476] Signal inference workers to stop experience collection... (6200 times) [2024-03-29 14:07:02,009][00497] InferenceWorker_p0-w0: stopping experience collection (6200 times) [2024-03-29 14:07:02,167][00476] Signal inference workers to resume experience collection... (6200 times) [2024-03-29 14:07:02,168][00497] InferenceWorker_p0-w0: resuming experience collection (6200 times) [2024-03-29 14:07:02,470][00497] Updated weights for policy 0, policy_version 17829 (0.0026) [2024-03-29 14:07:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 292143104. Throughput: 0: 41377.3. Samples: 174300640. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 14:07:03,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:07:03,917][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000017832_292159488.pth... [2024-03-29 14:07:04,221][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000017218_282099712.pth [2024-03-29 14:07:06,691][00497] Updated weights for policy 0, policy_version 17839 (0.0025) [2024-03-29 14:07:08,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 292339712. Throughput: 0: 40887.5. Samples: 174528720. Policy #0 lag: (min: 1.0, avg: 22.9, max: 42.0) [2024-03-29 14:07:08,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 14:07:10,854][00497] Updated weights for policy 0, policy_version 17849 (0.0019) [2024-03-29 14:07:13,839][00126] Fps is (10 sec: 37682.8, 60 sec: 40686.8, 300 sec: 41820.8). Total num frames: 292519936. Throughput: 0: 41201.7. Samples: 174794280. Policy #0 lag: (min: 1.0, avg: 22.9, max: 42.0) [2024-03-29 14:07:13,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 14:07:15,074][00497] Updated weights for policy 0, policy_version 17859 (0.0027) [2024-03-29 14:07:18,509][00497] Updated weights for policy 0, policy_version 17869 (0.0029) [2024-03-29 14:07:18,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41233.0, 300 sec: 41987.5). Total num frames: 292765696. Throughput: 0: 41315.2. Samples: 174917220. 
Policy #0 lag: (min: 1.0, avg: 22.9, max: 42.0) [2024-03-29 14:07:18,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 14:07:22,345][00497] Updated weights for policy 0, policy_version 17879 (0.0024) [2024-03-29 14:07:23,839][00126] Fps is (10 sec: 44237.5, 60 sec: 40960.0, 300 sec: 41876.4). Total num frames: 292962304. Throughput: 0: 41248.1. Samples: 175166300. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 14:07:23,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 14:07:26,355][00497] Updated weights for policy 0, policy_version 17889 (0.0018) [2024-03-29 14:07:28,839][00126] Fps is (10 sec: 39321.5, 60 sec: 40960.1, 300 sec: 41820.9). Total num frames: 293158912. Throughput: 0: 41299.6. Samples: 175417260. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 14:07:28,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 14:07:30,632][00497] Updated weights for policy 0, policy_version 17899 (0.0025) [2024-03-29 14:07:32,810][00476] Signal inference workers to stop experience collection... (6250 times) [2024-03-29 14:07:32,811][00476] Signal inference workers to resume experience collection... (6250 times) [2024-03-29 14:07:32,861][00497] InferenceWorker_p0-w0: stopping experience collection (6250 times) [2024-03-29 14:07:32,861][00497] InferenceWorker_p0-w0: resuming experience collection (6250 times) [2024-03-29 14:07:33,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 293404672. Throughput: 0: 41431.5. Samples: 175541560. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 14:07:33,840][00126] Avg episode reward: [(0, '0.318')] [2024-03-29 14:07:33,983][00497] Updated weights for policy 0, policy_version 17909 (0.0022) [2024-03-29 14:07:37,977][00497] Updated weights for policy 0, policy_version 17919 (0.0018) [2024-03-29 14:07:38,839][00126] Fps is (10 sec: 45875.2, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 293617664. Throughput: 0: 41300.0. Samples: 175793520. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 14:07:38,840][00126] Avg episode reward: [(0, '0.319')] [2024-03-29 14:07:41,876][00497] Updated weights for policy 0, policy_version 17929 (0.0030) [2024-03-29 14:07:43,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 293814272. Throughput: 0: 41377.7. Samples: 176049080. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 14:07:43,840][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 14:07:46,254][00497] Updated weights for policy 0, policy_version 17939 (0.0020) [2024-03-29 14:07:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 294027264. Throughput: 0: 41598.2. Samples: 176172560. Policy #0 lag: (min: 2.0, avg: 19.1, max: 41.0) [2024-03-29 14:07:48,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 14:07:49,558][00497] Updated weights for policy 0, policy_version 17949 (0.0028) [2024-03-29 14:07:53,721][00497] Updated weights for policy 0, policy_version 17959 (0.0021) [2024-03-29 14:07:53,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 294240256. Throughput: 0: 41954.7. Samples: 176416680. Policy #0 lag: (min: 2.0, avg: 19.1, max: 41.0) [2024-03-29 14:07:53,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:07:57,756][00497] Updated weights for policy 0, policy_version 17969 (0.0019) [2024-03-29 14:07:58,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.0, 300 sec: 41820.8). 
Total num frames: 294420480. Throughput: 0: 41516.0. Samples: 176662500. Policy #0 lag: (min: 2.0, avg: 19.1, max: 41.0) [2024-03-29 14:07:58,841][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 14:08:02,079][00497] Updated weights for policy 0, policy_version 17979 (0.0021) [2024-03-29 14:08:03,226][00476] Signal inference workers to stop experience collection... (6300 times) [2024-03-29 14:08:03,245][00497] InferenceWorker_p0-w0: stopping experience collection (6300 times) [2024-03-29 14:08:03,432][00476] Signal inference workers to resume experience collection... (6300 times) [2024-03-29 14:08:03,433][00497] InferenceWorker_p0-w0: resuming experience collection (6300 times) [2024-03-29 14:08:03,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 294633472. Throughput: 0: 41917.2. Samples: 176803500. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 14:08:03,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 14:08:05,483][00497] Updated weights for policy 0, policy_version 17989 (0.0035) [2024-03-29 14:08:08,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 294846464. Throughput: 0: 41521.3. Samples: 177034760. Policy #0 lag: (min: 0.0, avg: 21.8, max: 43.0) [2024-03-29 14:08:08,841][00126] Avg episode reward: [(0, '0.472')] [2024-03-29 14:08:09,701][00497] Updated weights for policy 0, policy_version 17999 (0.0025) [2024-03-29 14:08:13,388][00497] Updated weights for policy 0, policy_version 18009 (0.0021) [2024-03-29 14:08:13,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42598.4, 300 sec: 41931.9). Total num frames: 295075840. Throughput: 0: 41629.7. Samples: 177290600. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 14:08:13,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:08:17,989][00497] Updated weights for policy 0, policy_version 18019 (0.0023) [2024-03-29 14:08:18,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 295239680. Throughput: 0: 41887.0. Samples: 177426480. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 14:08:18,840][00126] Avg episode reward: [(0, '0.298')] [2024-03-29 14:08:21,412][00497] Updated weights for policy 0, policy_version 18029 (0.0027) [2024-03-29 14:08:23,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 295469056. Throughput: 0: 41262.6. Samples: 177650340. Policy #0 lag: (min: 1.0, avg: 21.3, max: 43.0) [2024-03-29 14:08:23,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 14:08:25,542][00497] Updated weights for policy 0, policy_version 18039 (0.0024) [2024-03-29 14:08:28,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.3, 300 sec: 41820.8). Total num frames: 295682048. Throughput: 0: 41442.8. Samples: 177914000. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:08:28,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 14:08:29,232][00497] Updated weights for policy 0, policy_version 18049 (0.0026) [2024-03-29 14:08:32,862][00476] Signal inference workers to stop experience collection... (6350 times) [2024-03-29 14:08:32,905][00497] InferenceWorker_p0-w0: stopping experience collection (6350 times) [2024-03-29 14:08:33,077][00476] Signal inference workers to resume experience collection... 
(6350 times) [2024-03-29 14:08:33,078][00497] InferenceWorker_p0-w0: resuming experience collection (6350 times) [2024-03-29 14:08:33,794][00497] Updated weights for policy 0, policy_version 18059 (0.0018) [2024-03-29 14:08:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 295878656. Throughput: 0: 41491.5. Samples: 178039680. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:08:33,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 14:08:37,315][00497] Updated weights for policy 0, policy_version 18069 (0.0030) [2024-03-29 14:08:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 296108032. Throughput: 0: 41420.0. Samples: 178280580. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 14:08:38,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 14:08:41,216][00497] Updated weights for policy 0, policy_version 18079 (0.0024) [2024-03-29 14:08:43,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 296304640. Throughput: 0: 41559.6. Samples: 178532680. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 14:08:43,840][00126] Avg episode reward: [(0, '0.443')] [2024-03-29 14:08:45,021][00497] Updated weights for policy 0, policy_version 18089 (0.0038) [2024-03-29 14:08:48,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41233.2, 300 sec: 41765.3). Total num frames: 296501248. Throughput: 0: 41238.0. Samples: 178659200. Policy #0 lag: (min: 0.0, avg: 22.6, max: 43.0) [2024-03-29 14:08:48,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 14:08:49,308][00497] Updated weights for policy 0, policy_version 18099 (0.0025) [2024-03-29 14:08:52,745][00497] Updated weights for policy 0, policy_version 18109 (0.0032) [2024-03-29 14:08:53,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41506.0, 300 sec: 41709.8). Total num frames: 296730624. Throughput: 0: 41591.8. Samples: 178906400. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:08:53,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 14:08:56,732][00497] Updated weights for policy 0, policy_version 18119 (0.0022) [2024-03-29 14:08:58,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 296960000. Throughput: 0: 41763.7. Samples: 179169960. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:08:58,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 14:09:00,422][00497] Updated weights for policy 0, policy_version 18129 (0.0032) [2024-03-29 14:09:03,839][00126] Fps is (10 sec: 39322.3, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 297123840. Throughput: 0: 41443.2. Samples: 179291420. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:09:03,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 14:09:04,177][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000018137_297156608.pth... [2024-03-29 14:09:04,506][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000017527_287162368.pth [2024-03-29 14:09:05,098][00497] Updated weights for policy 0, policy_version 18139 (0.0028) [2024-03-29 14:09:06,965][00476] Signal inference workers to stop experience collection... (6400 times) [2024-03-29 14:09:07,045][00497] InferenceWorker_p0-w0: stopping experience collection (6400 times) [2024-03-29 14:09:07,134][00476] Signal inference workers to resume experience collection... 
(6400 times) [2024-03-29 14:09:07,134][00497] InferenceWorker_p0-w0: resuming experience collection (6400 times) [2024-03-29 14:09:08,652][00497] Updated weights for policy 0, policy_version 18149 (0.0024) [2024-03-29 14:09:08,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 297353216. Throughput: 0: 41971.6. Samples: 179539060. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 14:09:08,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 14:09:12,689][00497] Updated weights for policy 0, policy_version 18159 (0.0017) [2024-03-29 14:09:13,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 297566208. Throughput: 0: 41629.3. Samples: 179787320. Policy #0 lag: (min: 1.0, avg: 19.0, max: 42.0) [2024-03-29 14:09:13,841][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 14:09:16,208][00497] Updated weights for policy 0, policy_version 18169 (0.0027) [2024-03-29 14:09:18,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 297746432. Throughput: 0: 41574.3. Samples: 179910520. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 14:09:18,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 14:09:20,915][00497] Updated weights for policy 0, policy_version 18179 (0.0029) [2024-03-29 14:09:23,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 297975808. Throughput: 0: 42178.6. Samples: 180178620. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 14:09:23,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 14:09:24,445][00497] Updated weights for policy 0, policy_version 18189 (0.0031) [2024-03-29 14:09:28,275][00497] Updated weights for policy 0, policy_version 18199 (0.0034) [2024-03-29 14:09:28,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 298188800. Throughput: 0: 41924.1. Samples: 180419260. Policy #0 lag: (min: 0.0, avg: 21.2, max: 41.0) [2024-03-29 14:09:28,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:09:31,968][00497] Updated weights for policy 0, policy_version 18209 (0.0028) [2024-03-29 14:09:33,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 298385408. Throughput: 0: 41803.9. Samples: 180540380. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 14:09:33,841][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:09:36,649][00497] Updated weights for policy 0, policy_version 18219 (0.0022) [2024-03-29 14:09:38,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 298598400. Throughput: 0: 42087.3. Samples: 180800320. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 14:09:38,840][00126] Avg episode reward: [(0, '0.442')] [2024-03-29 14:09:40,216][00497] Updated weights for policy 0, policy_version 18229 (0.0035) [2024-03-29 14:09:43,839][00126] Fps is (10 sec: 40959.7, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 298795008. Throughput: 0: 41255.9. Samples: 181026480. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 14:09:43,840][00126] Avg episode reward: [(0, '0.446')] [2024-03-29 14:09:43,904][00476] Signal inference workers to stop experience collection... (6450 times) [2024-03-29 14:09:43,904][00476] Signal inference workers to resume experience collection... 
(6450 times) [2024-03-29 14:09:43,943][00497] InferenceWorker_p0-w0: stopping experience collection (6450 times) [2024-03-29 14:09:43,943][00497] InferenceWorker_p0-w0: resuming experience collection (6450 times) [2024-03-29 14:09:44,169][00497] Updated weights for policy 0, policy_version 18239 (0.0021) [2024-03-29 14:09:47,946][00497] Updated weights for policy 0, policy_version 18249 (0.0028) [2024-03-29 14:09:48,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 299008000. Throughput: 0: 41487.1. Samples: 181158340. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 14:09:48,841][00126] Avg episode reward: [(0, '0.448')] [2024-03-29 14:09:52,544][00497] Updated weights for policy 0, policy_version 18259 (0.0027) [2024-03-29 14:09:53,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 299204608. Throughput: 0: 41726.6. Samples: 181416760. Policy #0 lag: (min: 0.0, avg: 21.2, max: 42.0) [2024-03-29 14:09:53,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 14:09:55,950][00497] Updated weights for policy 0, policy_version 18269 (0.0024) [2024-03-29 14:09:58,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 299433984. Throughput: 0: 41618.7. Samples: 181660160. Policy #0 lag: (min: 1.0, avg: 21.9, max: 43.0) [2024-03-29 14:09:58,840][00126] Avg episode reward: [(0, '0.294')] [2024-03-29 14:09:59,669][00497] Updated weights for policy 0, policy_version 18279 (0.0018) [2024-03-29 14:10:03,433][00497] Updated weights for policy 0, policy_version 18289 (0.0021) [2024-03-29 14:10:03,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 299663360. Throughput: 0: 41875.6. Samples: 181794920. Policy #0 lag: (min: 1.0, avg: 21.9, max: 43.0) [2024-03-29 14:10:03,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 14:10:07,778][00497] Updated weights for policy 0, policy_version 18299 (0.0022) [2024-03-29 14:10:08,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 299843584. Throughput: 0: 41849.4. Samples: 182061840. Policy #0 lag: (min: 1.0, avg: 21.9, max: 43.0) [2024-03-29 14:10:08,841][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 14:10:11,293][00497] Updated weights for policy 0, policy_version 18309 (0.0017) [2024-03-29 14:10:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41820.8). Total num frames: 300089344. Throughput: 0: 41684.8. Samples: 182295080. Policy #0 lag: (min: 2.0, avg: 19.9, max: 43.0) [2024-03-29 14:10:13,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 14:10:14,738][00476] Signal inference workers to stop experience collection... (6500 times) [2024-03-29 14:10:14,774][00497] InferenceWorker_p0-w0: stopping experience collection (6500 times) [2024-03-29 14:10:14,925][00476] Signal inference workers to resume experience collection... (6500 times) [2024-03-29 14:10:14,926][00497] InferenceWorker_p0-w0: resuming experience collection (6500 times) [2024-03-29 14:10:15,182][00497] Updated weights for policy 0, policy_version 18319 (0.0029) [2024-03-29 14:10:18,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 300285952. Throughput: 0: 41851.1. Samples: 182423680. 
Policy #0 lag: (min: 2.0, avg: 19.9, max: 43.0) [2024-03-29 14:10:18,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 14:10:18,928][00497] Updated weights for policy 0, policy_version 18329 (0.0024) [2024-03-29 14:10:23,393][00497] Updated weights for policy 0, policy_version 18339 (0.0024) [2024-03-29 14:10:23,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 300482560. Throughput: 0: 42107.9. Samples: 182695180. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 14:10:23,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 14:10:26,940][00497] Updated weights for policy 0, policy_version 18349 (0.0019) [2024-03-29 14:10:28,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 300711936. Throughput: 0: 42110.7. Samples: 182921460. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 14:10:28,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 14:10:30,806][00497] Updated weights for policy 0, policy_version 18359 (0.0025) [2024-03-29 14:10:33,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 300908544. Throughput: 0: 42177.8. Samples: 183056340. Policy #0 lag: (min: 1.0, avg: 19.1, max: 42.0) [2024-03-29 14:10:33,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:10:34,461][00497] Updated weights for policy 0, policy_version 18369 (0.0023) [2024-03-29 14:10:38,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.0, 300 sec: 41543.2). Total num frames: 301088768. Throughput: 0: 42201.8. Samples: 183315840. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 14:10:38,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 14:10:39,245][00497] Updated weights for policy 0, policy_version 18379 (0.0018) [2024-03-29 14:10:42,557][00497] Updated weights for policy 0, policy_version 18389 (0.0027) [2024-03-29 14:10:43,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42598.5, 300 sec: 41765.3). Total num frames: 301350912. Throughput: 0: 42122.6. Samples: 183555680. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 14:10:43,840][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 14:10:46,323][00497] Updated weights for policy 0, policy_version 18399 (0.0023) [2024-03-29 14:10:48,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 301547520. Throughput: 0: 42105.3. Samples: 183689660. Policy #0 lag: (min: 0.0, avg: 21.7, max: 42.0) [2024-03-29 14:10:48,840][00126] Avg episode reward: [(0, '0.499')] [2024-03-29 14:10:49,351][00476] Signal inference workers to stop experience collection... (6550 times) [2024-03-29 14:10:49,351][00476] Signal inference workers to resume experience collection... (6550 times) [2024-03-29 14:10:49,397][00497] InferenceWorker_p0-w0: stopping experience collection (6550 times) [2024-03-29 14:10:49,397][00497] InferenceWorker_p0-w0: resuming experience collection (6550 times) [2024-03-29 14:10:50,092][00497] Updated weights for policy 0, policy_version 18409 (0.0020) [2024-03-29 14:10:53,839][00126] Fps is (10 sec: 37682.9, 60 sec: 42052.3, 300 sec: 41598.7). Total num frames: 301727744. Throughput: 0: 41868.5. Samples: 183945920. 
Policy #0 lag: (min: 0.0, avg: 22.8, max: 43.0) [2024-03-29 14:10:53,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 14:10:54,726][00497] Updated weights for policy 0, policy_version 18419 (0.0018) [2024-03-29 14:10:57,862][00497] Updated weights for policy 0, policy_version 18429 (0.0024) [2024-03-29 14:10:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.4, 300 sec: 41765.3). Total num frames: 301989888. Throughput: 0: 42274.7. Samples: 184197440. Policy #0 lag: (min: 0.0, avg: 22.8, max: 43.0) [2024-03-29 14:10:58,840][00126] Avg episode reward: [(0, '0.291')] [2024-03-29 14:11:01,716][00497] Updated weights for policy 0, policy_version 18439 (0.0022) [2024-03-29 14:11:03,839][00126] Fps is (10 sec: 47513.3, 60 sec: 42325.2, 300 sec: 41820.8). Total num frames: 302202880. Throughput: 0: 42358.6. Samples: 184329820. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 14:11:03,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:11:04,000][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000018446_302219264.pth... [2024-03-29 14:11:04,293][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000017832_292159488.pth [2024-03-29 14:11:05,490][00497] Updated weights for policy 0, policy_version 18449 (0.0017) [2024-03-29 14:11:08,839][00126] Fps is (10 sec: 37683.0, 60 sec: 42052.3, 300 sec: 41654.2). Total num frames: 302366720. Throughput: 0: 42159.1. Samples: 184592340. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 14:11:08,841][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 14:11:09,922][00497] Updated weights for policy 0, policy_version 18459 (0.0018) [2024-03-29 14:11:13,301][00497] Updated weights for policy 0, policy_version 18469 (0.0025) [2024-03-29 14:11:13,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 302612480. Throughput: 0: 42606.3. Samples: 184838740. Policy #0 lag: (min: 1.0, avg: 21.0, max: 42.0) [2024-03-29 14:11:13,840][00126] Avg episode reward: [(0, '0.478')] [2024-03-29 14:11:17,205][00497] Updated weights for policy 0, policy_version 18479 (0.0018) [2024-03-29 14:11:18,839][00126] Fps is (10 sec: 47513.4, 60 sec: 42598.4, 300 sec: 41820.8). Total num frames: 302841856. Throughput: 0: 42443.0. Samples: 184966280. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 14:11:18,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 14:11:20,802][00497] Updated weights for policy 0, policy_version 18489 (0.0019) [2024-03-29 14:11:23,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42052.4, 300 sec: 41709.8). Total num frames: 303005696. Throughput: 0: 42529.1. Samples: 185229640. Policy #0 lag: (min: 0.0, avg: 22.0, max: 41.0) [2024-03-29 14:11:23,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 14:11:25,409][00497] Updated weights for policy 0, policy_version 18499 (0.0026) [2024-03-29 14:11:25,793][00476] Signal inference workers to stop experience collection... (6600 times) [2024-03-29 14:11:25,816][00497] InferenceWorker_p0-w0: stopping experience collection (6600 times) [2024-03-29 14:11:25,988][00476] Signal inference workers to resume experience collection... (6600 times) [2024-03-29 14:11:25,988][00497] InferenceWorker_p0-w0: resuming experience collection (6600 times) [2024-03-29 14:11:28,791][00497] Updated weights for policy 0, policy_version 18509 (0.0019) [2024-03-29 14:11:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 41820.9). 
Total num frames: 303251456. Throughput: 0: 42764.4. Samples: 185480080. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 14:11:28,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 14:11:32,629][00497] Updated weights for policy 0, policy_version 18519 (0.0029) [2024-03-29 14:11:33,839][00126] Fps is (10 sec: 45874.4, 60 sec: 42598.3, 300 sec: 41820.9). Total num frames: 303464448. Throughput: 0: 42410.6. Samples: 185598140. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 14:11:33,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 14:11:36,389][00497] Updated weights for policy 0, policy_version 18529 (0.0032) [2024-03-29 14:11:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42871.5, 300 sec: 41820.8). Total num frames: 303661056. Throughput: 0: 42356.9. Samples: 185851980. Policy #0 lag: (min: 1.0, avg: 20.7, max: 41.0) [2024-03-29 14:11:38,841][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 14:11:41,069][00497] Updated weights for policy 0, policy_version 18539 (0.0027) [2024-03-29 14:11:43,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 303874048. Throughput: 0: 42606.1. Samples: 186114720. Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0) [2024-03-29 14:11:43,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 14:11:44,411][00497] Updated weights for policy 0, policy_version 18549 (0.0021) [2024-03-29 14:11:48,284][00497] Updated weights for policy 0, policy_version 18559 (0.0019) [2024-03-29 14:11:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 304087040. Throughput: 0: 42069.9. Samples: 186222960. Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0) [2024-03-29 14:11:48,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 14:11:52,122][00497] Updated weights for policy 0, policy_version 18569 (0.0024) [2024-03-29 14:11:53,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42871.5, 300 sec: 41931.9). Total num frames: 304300032. Throughput: 0: 42099.6. Samples: 186486820. Policy #0 lag: (min: 0.0, avg: 20.0, max: 43.0) [2024-03-29 14:11:53,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 14:11:56,510][00497] Updated weights for policy 0, policy_version 18579 (0.0026) [2024-03-29 14:11:58,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 304513024. Throughput: 0: 42592.3. Samples: 186755400. Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0) [2024-03-29 14:11:58,840][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 14:11:59,740][00497] Updated weights for policy 0, policy_version 18589 (0.0026) [2024-03-29 14:12:02,191][00476] Signal inference workers to stop experience collection... (6650 times) [2024-03-29 14:12:02,238][00497] InferenceWorker_p0-w0: stopping experience collection (6650 times) [2024-03-29 14:12:02,415][00476] Signal inference workers to resume experience collection... (6650 times) [2024-03-29 14:12:02,415][00497] InferenceWorker_p0-w0: resuming experience collection (6650 times) [2024-03-29 14:12:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 304709632. Throughput: 0: 42038.3. Samples: 186858000. 
Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0) [2024-03-29 14:12:03,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:12:03,905][00497] Updated weights for policy 0, policy_version 18599 (0.0022) [2024-03-29 14:12:07,738][00497] Updated weights for policy 0, policy_version 18609 (0.0025) [2024-03-29 14:12:08,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 304922624. Throughput: 0: 41940.3. Samples: 187116960. Policy #0 lag: (min: 0.0, avg: 18.5, max: 41.0) [2024-03-29 14:12:08,841][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 14:12:12,032][00497] Updated weights for policy 0, policy_version 18619 (0.0024) [2024-03-29 14:12:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 305135616. Throughput: 0: 42281.4. Samples: 187382740. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 14:12:13,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 14:12:15,267][00497] Updated weights for policy 0, policy_version 18629 (0.0034) [2024-03-29 14:12:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 305348608. Throughput: 0: 42064.1. Samples: 187491020. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 14:12:18,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 14:12:19,398][00497] Updated weights for policy 0, policy_version 18639 (0.0019) [2024-03-29 14:12:23,267][00497] Updated weights for policy 0, policy_version 18649 (0.0030) [2024-03-29 14:12:23,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 305561600. Throughput: 0: 42004.5. Samples: 187742180. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 14:12:23,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 14:12:27,645][00497] Updated weights for policy 0, policy_version 18659 (0.0018) [2024-03-29 14:12:28,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 305758208. Throughput: 0: 42287.2. Samples: 188017640. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 14:12:28,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 14:12:30,852][00497] Updated weights for policy 0, policy_version 18669 (0.0035) [2024-03-29 14:12:33,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 306003968. Throughput: 0: 42356.9. Samples: 188129020. Policy #0 lag: (min: 0.0, avg: 20.5, max: 40.0) [2024-03-29 14:12:33,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:12:34,809][00497] Updated weights for policy 0, policy_version 18679 (0.0025) [2024-03-29 14:12:38,750][00497] Updated weights for policy 0, policy_version 18689 (0.0027) [2024-03-29 14:12:38,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 306200576. Throughput: 0: 41899.6. Samples: 188372300. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 14:12:38,840][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 14:12:41,888][00476] Signal inference workers to stop experience collection... (6700 times) [2024-03-29 14:12:41,931][00497] InferenceWorker_p0-w0: stopping experience collection (6700 times) [2024-03-29 14:12:42,083][00476] Signal inference workers to resume experience collection... 
(6700 times) [2024-03-29 14:12:42,084][00497] InferenceWorker_p0-w0: resuming experience collection (6700 times) [2024-03-29 14:12:43,360][00497] Updated weights for policy 0, policy_version 18699 (0.0023) [2024-03-29 14:12:43,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 306380800. Throughput: 0: 41981.4. Samples: 188644560. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 14:12:43,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 14:12:46,585][00497] Updated weights for policy 0, policy_version 18709 (0.0027) [2024-03-29 14:12:48,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 306626560. Throughput: 0: 42041.7. Samples: 188749880. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 14:12:48,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 14:12:50,772][00497] Updated weights for policy 0, policy_version 18719 (0.0022) [2024-03-29 14:12:53,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 306823168. Throughput: 0: 41809.3. Samples: 188998380. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 14:12:53,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 14:12:54,529][00497] Updated weights for policy 0, policy_version 18729 (0.0022) [2024-03-29 14:12:58,703][00497] Updated weights for policy 0, policy_version 18739 (0.0018) [2024-03-29 14:12:58,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 307019776. Throughput: 0: 42108.0. Samples: 189277600. Policy #0 lag: (min: 1.0, avg: 21.5, max: 41.0) [2024-03-29 14:12:58,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 14:13:01,823][00497] Updated weights for policy 0, policy_version 18749 (0.0027) [2024-03-29 14:13:03,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 307265536. Throughput: 0: 42450.2. Samples: 189401280. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0) [2024-03-29 14:13:03,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:13:04,138][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000018755_307281920.pth... [2024-03-29 14:13:04,483][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000018137_297156608.pth [2024-03-29 14:13:06,243][00497] Updated weights for policy 0, policy_version 18759 (0.0021) [2024-03-29 14:13:08,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 307478528. Throughput: 0: 42075.6. Samples: 189635580. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0) [2024-03-29 14:13:08,841][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 14:13:10,366][00497] Updated weights for policy 0, policy_version 18769 (0.0031) [2024-03-29 14:13:13,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 307642368. Throughput: 0: 41824.5. Samples: 189899740. Policy #0 lag: (min: 1.0, avg: 22.3, max: 41.0) [2024-03-29 14:13:13,840][00126] Avg episode reward: [(0, '0.368')] [2024-03-29 14:13:14,384][00497] Updated weights for policy 0, policy_version 18779 (0.0022) [2024-03-29 14:13:14,760][00476] Signal inference workers to stop experience collection... (6750 times) [2024-03-29 14:13:14,839][00497] InferenceWorker_p0-w0: stopping experience collection (6750 times) [2024-03-29 14:13:14,840][00476] Signal inference workers to resume experience collection... 
(6750 times) [2024-03-29 14:13:14,863][00497] InferenceWorker_p0-w0: resuming experience collection (6750 times) [2024-03-29 14:13:17,603][00497] Updated weights for policy 0, policy_version 18789 (0.0026) [2024-03-29 14:13:18,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 307888128. Throughput: 0: 42215.9. Samples: 190028740. Policy #0 lag: (min: 0.0, avg: 19.4, max: 43.0) [2024-03-29 14:13:18,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 14:13:21,761][00497] Updated weights for policy 0, policy_version 18799 (0.0029) [2024-03-29 14:13:23,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 308101120. Throughput: 0: 42359.5. Samples: 190278480. Policy #0 lag: (min: 0.0, avg: 19.4, max: 43.0) [2024-03-29 14:13:23,840][00126] Avg episode reward: [(0, '0.341')] [2024-03-29 14:13:26,027][00497] Updated weights for policy 0, policy_version 18809 (0.0018) [2024-03-29 14:13:28,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 308264960. Throughput: 0: 41806.3. Samples: 190525840. Policy #0 lag: (min: 0.0, avg: 19.4, max: 43.0) [2024-03-29 14:13:28,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 14:13:29,905][00497] Updated weights for policy 0, policy_version 18819 (0.0022) [2024-03-29 14:13:33,356][00497] Updated weights for policy 0, policy_version 18829 (0.0019) [2024-03-29 14:13:33,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 308510720. Throughput: 0: 42291.2. Samples: 190652980. Policy #0 lag: (min: 1.0, avg: 19.3, max: 42.0) [2024-03-29 14:13:33,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 14:13:37,261][00497] Updated weights for policy 0, policy_version 18839 (0.0022) [2024-03-29 14:13:38,839][00126] Fps is (10 sec: 45874.4, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 308723712. Throughput: 0: 42292.9. Samples: 190901560. Policy #0 lag: (min: 1.0, avg: 19.3, max: 42.0) [2024-03-29 14:13:38,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 14:13:41,504][00497] Updated weights for policy 0, policy_version 18849 (0.0020) [2024-03-29 14:13:43,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 308903936. Throughput: 0: 41776.9. Samples: 191157560. Policy #0 lag: (min: 0.0, avg: 19.5, max: 40.0) [2024-03-29 14:13:43,840][00126] Avg episode reward: [(0, '0.438')] [2024-03-29 14:13:45,666][00497] Updated weights for policy 0, policy_version 18859 (0.0022) [2024-03-29 14:13:48,311][00476] Signal inference workers to stop experience collection... (6800 times) [2024-03-29 14:13:48,340][00497] InferenceWorker_p0-w0: stopping experience collection (6800 times) [2024-03-29 14:13:48,495][00476] Signal inference workers to resume experience collection... (6800 times) [2024-03-29 14:13:48,496][00497] InferenceWorker_p0-w0: resuming experience collection (6800 times) [2024-03-29 14:13:48,801][00497] Updated weights for policy 0, policy_version 18869 (0.0024) [2024-03-29 14:13:48,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 309149696. Throughput: 0: 41992.5. Samples: 191290940. Policy #0 lag: (min: 0.0, avg: 19.5, max: 40.0) [2024-03-29 14:13:48,840][00126] Avg episode reward: [(0, '0.288')] [2024-03-29 14:13:52,849][00497] Updated weights for policy 0, policy_version 18879 (0.0029) [2024-03-29 14:13:53,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.3, 300 sec: 41987.5). 
Total num frames: 309346304. Throughput: 0: 42016.8. Samples: 191526340. Policy #0 lag: (min: 0.0, avg: 19.5, max: 40.0) [2024-03-29 14:13:53,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 14:13:57,032][00497] Updated weights for policy 0, policy_version 18889 (0.0022) [2024-03-29 14:13:58,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 309542912. Throughput: 0: 41970.2. Samples: 191788400. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0) [2024-03-29 14:13:58,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 14:14:01,067][00497] Updated weights for policy 0, policy_version 18899 (0.0017) [2024-03-29 14:14:03,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 42098.5). Total num frames: 309772288. Throughput: 0: 42194.1. Samples: 191927480. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0) [2024-03-29 14:14:03,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:14:04,206][00497] Updated weights for policy 0, policy_version 18909 (0.0025) [2024-03-29 14:14:08,371][00497] Updated weights for policy 0, policy_version 18919 (0.0023) [2024-03-29 14:14:08,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 309985280. Throughput: 0: 41948.9. Samples: 192166180. Policy #0 lag: (min: 0.0, avg: 20.2, max: 40.0) [2024-03-29 14:14:08,840][00126] Avg episode reward: [(0, '0.329')] [2024-03-29 14:14:12,555][00497] Updated weights for policy 0, policy_version 18929 (0.0027) [2024-03-29 14:14:13,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 310181888. Throughput: 0: 42201.2. Samples: 192424900. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0) [2024-03-29 14:14:13,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 14:14:16,471][00497] Updated weights for policy 0, policy_version 18939 (0.0021) [2024-03-29 14:14:18,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.3, 300 sec: 42098.6). Total num frames: 310394880. Throughput: 0: 42394.7. Samples: 192560740. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0) [2024-03-29 14:14:18,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 14:14:19,938][00497] Updated weights for policy 0, policy_version 18949 (0.0033) [2024-03-29 14:14:23,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 310624256. Throughput: 0: 42055.3. Samples: 192794040. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 14:14:23,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 14:14:23,844][00497] Updated weights for policy 0, policy_version 18959 (0.0018) [2024-03-29 14:14:25,081][00476] Signal inference workers to stop experience collection... (6850 times) [2024-03-29 14:14:25,122][00497] InferenceWorker_p0-w0: stopping experience collection (6850 times) [2024-03-29 14:14:25,304][00476] Signal inference workers to resume experience collection... (6850 times) [2024-03-29 14:14:25,305][00497] InferenceWorker_p0-w0: resuming experience collection (6850 times) [2024-03-29 14:14:27,932][00497] Updated weights for policy 0, policy_version 18969 (0.0022) [2024-03-29 14:14:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 310804480. Throughput: 0: 42276.9. Samples: 193060020. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 14:14:28,840][00126] Avg episode reward: [(0, '0.481')] [2024-03-29 14:14:31,972][00497] Updated weights for policy 0, policy_version 18979 (0.0023) [2024-03-29 14:14:33,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 311033856. Throughput: 0: 42367.5. Samples: 193197480. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 14:14:33,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 14:14:35,330][00497] Updated weights for policy 0, policy_version 18989 (0.0022) [2024-03-29 14:14:38,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.4, 300 sec: 42209.6). Total num frames: 311246848. Throughput: 0: 42258.3. Samples: 193427960. Policy #0 lag: (min: 1.0, avg: 22.6, max: 41.0) [2024-03-29 14:14:38,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 14:14:39,226][00497] Updated weights for policy 0, policy_version 18999 (0.0023) [2024-03-29 14:14:42,998][00497] Updated weights for policy 0, policy_version 19009 (0.0026) [2024-03-29 14:14:43,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 311459840. Throughput: 0: 42467.0. Samples: 193699420. Policy #0 lag: (min: 1.0, avg: 22.6, max: 41.0) [2024-03-29 14:14:43,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 14:14:47,511][00497] Updated weights for policy 0, policy_version 19019 (0.0018) [2024-03-29 14:14:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42265.2). Total num frames: 311672832. Throughput: 0: 42292.1. Samples: 193830620. Policy #0 lag: (min: 1.0, avg: 22.6, max: 41.0) [2024-03-29 14:14:48,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 14:14:50,633][00497] Updated weights for policy 0, policy_version 19029 (0.0024) [2024-03-29 14:14:53,841][00126] Fps is (10 sec: 42593.0, 60 sec: 42324.4, 300 sec: 42209.4). Total num frames: 311885824. Throughput: 0: 42396.5. Samples: 194074080. Policy #0 lag: (min: 1.0, avg: 21.5, max: 43.0) [2024-03-29 14:14:53,842][00126] Avg episode reward: [(0, '0.300')] [2024-03-29 14:14:54,488][00497] Updated weights for policy 0, policy_version 19039 (0.0024) [2024-03-29 14:14:58,696][00497] Updated weights for policy 0, policy_version 19049 (0.0023) [2024-03-29 14:14:58,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.4, 300 sec: 42154.1). Total num frames: 312098816. Throughput: 0: 42297.4. Samples: 194328280. Policy #0 lag: (min: 1.0, avg: 21.5, max: 43.0) [2024-03-29 14:14:58,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:14:59,928][00476] Signal inference workers to stop experience collection... (6900 times) [2024-03-29 14:14:59,982][00497] InferenceWorker_p0-w0: stopping experience collection (6900 times) [2024-03-29 14:15:00,116][00476] Signal inference workers to resume experience collection... (6900 times) [2024-03-29 14:15:00,117][00497] InferenceWorker_p0-w0: resuming experience collection (6900 times) [2024-03-29 14:15:03,134][00497] Updated weights for policy 0, policy_version 19059 (0.0019) [2024-03-29 14:15:03,839][00126] Fps is (10 sec: 40965.7, 60 sec: 42052.4, 300 sec: 42209.6). Total num frames: 312295424. Throughput: 0: 42351.5. Samples: 194466560. Policy #0 lag: (min: 1.0, avg: 19.1, max: 41.0) [2024-03-29 14:15:03,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 14:15:03,969][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019062_312311808.pth... 
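The Saving/Removing pairs that recur in this log (14:07:03, 14:09:04, 14:11:04, 14:13:04, and the 14:15:03 save just above) follow a simple rotation: each new checkpoint is written under a name that encodes the zero-padded policy version and the total environment frames, and the checkpoint from two saves earlier is then deleted, so only the two most recent files remain on disk. The sketch below reproduces that pattern; it is an illustrative reconstruction read off the log, not the trainer's actual code, and the keep_latest=2 limit, the save_and_rotate name, and the use of raw bytes in place of torch.save are assumptions.

```python
import os
import re

# Illustrative reconstruction of the save-then-prune pattern seen above:
# write checkpoint_<version>_<frames>.pth, then delete everything but the
# most recent few. keep_latest=2 is inferred from the log (saving version
# 19062 at 14:15:03 removes version 18446, saved two cycles earlier).
CKPT_RE = re.compile(r"checkpoint_(\d{9})_(\d+)\.pth$")

def save_and_rotate(ckpt_dir, policy_version, env_frames, state_bytes, keep_latest=2):
    name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    path = os.path.join(ckpt_dir, name)
    print(f"Saving {path}...")
    with open(path, "wb") as f:      # stand-in for torch.save(state, path)
        f.write(state_bytes)

    # Oldest first: the zero-padded version number makes lexicographic order
    # match chronological order.
    existing = sorted(p for p in os.listdir(ckpt_dir) if CKPT_RE.search(p))
    for stale in existing[:-keep_latest]:
        print(f"Removing {os.path.join(ckpt_dir, stale)}")
        os.remove(os.path.join(ckpt_dir, stale))
```

Called once per periodic save (roughly every two minutes in this run), this reproduces the paired Saving/Removing lines and keeps disk usage bounded to the two newest checkpoints.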
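Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" line reports the frame rate over three trailing windows: frames gained in the window divided by its length. The log itself confirms the arithmetic: frames go from 290078720 at 14:06:13 to 290504704 at 14:06:23, i.e. 425984 frames in 10 s, or about 42598 FPS, which matches the "10 sec" figure printed at 14:06:23 up to rounding. Below is a minimal sketch of that windowed bookkeeping, assuming (timestamp, total_frames) samples are recorded every few seconds as in this log; the WindowedFps name is an assumption, and only the window sizes and cadence are taken from the output above.

```python
import time
from collections import deque

class WindowedFps:
    """Frames-per-second over trailing windows (10/60/300 s), in the spirit of
    the 'Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)' lines above.
    Illustrative only; assumes at least one sample has been recorded."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames) pairs, oldest first

    def record(self, total_frames, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, total_frames))
        # Drop samples older than the largest window (plus a little slack).
        horizon = now - max(self.windows) - 1.0
        while self.samples and self.samples[0][0] < horizon:
            self.samples.popleft()

    def report(self):
        now, frames_now = self.samples[-1]
        rates = {}
        for w in self.windows:
            # Oldest sample still inside the window approximates frames w seconds ago.
            t_past, f_past = next(
                ((t, f) for t, f in self.samples if t >= now - w), self.samples[0]
            )
            dt = max(now - t_past, 1e-6)
            rates[w] = (frames_now - f_past) / dt
        return rates
```

Fed one sample per report (about every 5 s here), report() yields the three figures printed on each Fps line.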
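The "Signal inference workers to stop/resume experience collection... (N times)" messages are rate-limited: across this stretch the counter rises in steps of exactly 50 (6150, 6200, ..., 7250), so the signal evidently fires far more often than it is logged and only every 50th occurrence is printed together with the running total. A small throttled-logging helper in that spirit, as a sketch only (the ThrottledLog name and the choice of 50 are assumptions drawn from the visible counts):

```python
class ThrottledLog:
    """Print a recurring message only every `every` occurrences,
    appending the running total, like the '(6150 times)' lines above."""

    def __init__(self, every=50):
        self.every = every
        self.count = 0

    def __call__(self, message):
        self.count += 1
        if self.count % self.every == 0:
            print(f"{message} ({self.count} times)")

# Hypothetical usage each time the learner pauses collection:
# signal_stop = ThrottledLog(every=50)
# signal_stop("Signal inference workers to stop experience collection...")
```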
[2024-03-29 14:15:04,361][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000018446_302219264.pth [2024-03-29 14:15:06,390][00497] Updated weights for policy 0, policy_version 19069 (0.0024) [2024-03-29 14:15:08,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 312524800. Throughput: 0: 41940.0. Samples: 194681340. Policy #0 lag: (min: 1.0, avg: 19.1, max: 41.0) [2024-03-29 14:15:08,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:15:10,563][00497] Updated weights for policy 0, policy_version 19079 (0.0023) [2024-03-29 14:15:13,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 312721408. Throughput: 0: 41555.5. Samples: 194930020. Policy #0 lag: (min: 1.0, avg: 19.1, max: 41.0) [2024-03-29 14:15:13,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:15:14,842][00497] Updated weights for policy 0, policy_version 19089 (0.0023) [2024-03-29 14:15:18,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.2, 300 sec: 42098.6). Total num frames: 312901632. Throughput: 0: 41781.8. Samples: 195077660. Policy #0 lag: (min: 0.0, avg: 20.1, max: 43.0) [2024-03-29 14:15:18,842][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:15:19,130][00497] Updated weights for policy 0, policy_version 19099 (0.0018) [2024-03-29 14:15:22,068][00497] Updated weights for policy 0, policy_version 19109 (0.0020) [2024-03-29 14:15:23,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42325.2, 300 sec: 42209.6). Total num frames: 313163776. Throughput: 0: 42093.7. Samples: 195322180. Policy #0 lag: (min: 0.0, avg: 20.1, max: 43.0) [2024-03-29 14:15:23,840][00126] Avg episode reward: [(0, '0.307')] [2024-03-29 14:15:26,304][00497] Updated weights for policy 0, policy_version 19119 (0.0029) [2024-03-29 14:15:28,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 313360384. Throughput: 0: 41234.7. Samples: 195554980. Policy #0 lag: (min: 0.0, avg: 20.1, max: 43.0) [2024-03-29 14:15:28,840][00126] Avg episode reward: [(0, '0.446')] [2024-03-29 14:15:30,544][00497] Updated weights for policy 0, policy_version 19129 (0.0020) [2024-03-29 14:15:33,679][00476] Signal inference workers to stop experience collection... (6950 times) [2024-03-29 14:15:33,715][00497] InferenceWorker_p0-w0: stopping experience collection (6950 times) [2024-03-29 14:15:33,839][00126] Fps is (10 sec: 36045.3, 60 sec: 41506.2, 300 sec: 42154.1). Total num frames: 313524224. Throughput: 0: 41453.8. Samples: 195696040. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 14:15:33,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 14:15:33,871][00476] Signal inference workers to resume experience collection... (6950 times) [2024-03-29 14:15:33,872][00497] InferenceWorker_p0-w0: resuming experience collection (6950 times) [2024-03-29 14:15:34,806][00497] Updated weights for policy 0, policy_version 19139 (0.0023) [2024-03-29 14:15:37,957][00497] Updated weights for policy 0, policy_version 19149 (0.0032) [2024-03-29 14:15:38,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 313769984. Throughput: 0: 41541.6. Samples: 195943400. 
Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 14:15:38,840][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 14:15:41,929][00497] Updated weights for policy 0, policy_version 19159 (0.0023) [2024-03-29 14:15:43,839][00126] Fps is (10 sec: 47513.4, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 313999360. Throughput: 0: 41598.3. Samples: 196200200. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 14:15:43,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:15:46,131][00497] Updated weights for policy 0, policy_version 19169 (0.0026) [2024-03-29 14:15:48,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41233.1, 300 sec: 42098.6). Total num frames: 314146816. Throughput: 0: 41379.1. Samples: 196328620. Policy #0 lag: (min: 2.0, avg: 23.3, max: 43.0) [2024-03-29 14:15:48,840][00126] Avg episode reward: [(0, '0.483')] [2024-03-29 14:15:50,522][00497] Updated weights for policy 0, policy_version 19179 (0.0029) [2024-03-29 14:15:53,565][00497] Updated weights for policy 0, policy_version 19189 (0.0025) [2024-03-29 14:15:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41780.1, 300 sec: 42043.0). Total num frames: 314392576. Throughput: 0: 42102.7. Samples: 196575960. Policy #0 lag: (min: 2.0, avg: 23.3, max: 43.0) [2024-03-29 14:15:53,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:15:57,651][00497] Updated weights for policy 0, policy_version 19199 (0.0028) [2024-03-29 14:15:58,839][00126] Fps is (10 sec: 45874.8, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 314605568. Throughput: 0: 42037.3. Samples: 196821700. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 14:15:58,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 14:16:01,981][00497] Updated weights for policy 0, policy_version 19209 (0.0024) [2024-03-29 14:16:03,159][00476] Signal inference workers to stop experience collection... (7000 times) [2024-03-29 14:16:03,160][00476] Signal inference workers to resume experience collection... (7000 times) [2024-03-29 14:16:03,198][00497] InferenceWorker_p0-w0: stopping experience collection (7000 times) [2024-03-29 14:16:03,198][00497] InferenceWorker_p0-w0: resuming experience collection (7000 times) [2024-03-29 14:16:03,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41506.0, 300 sec: 42098.5). Total num frames: 314785792. Throughput: 0: 41567.0. Samples: 196948180. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 14:16:03,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 14:16:06,133][00497] Updated weights for policy 0, policy_version 19219 (0.0025) [2024-03-29 14:16:08,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 315015168. Throughput: 0: 41757.0. Samples: 197201240. Policy #0 lag: (min: 1.0, avg: 21.4, max: 42.0) [2024-03-29 14:16:08,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 14:16:09,188][00497] Updated weights for policy 0, policy_version 19229 (0.0033) [2024-03-29 14:16:13,352][00497] Updated weights for policy 0, policy_version 19239 (0.0027) [2024-03-29 14:16:13,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 315228160. Throughput: 0: 42164.8. Samples: 197452400. Policy #0 lag: (min: 0.0, avg: 23.3, max: 42.0) [2024-03-29 14:16:13,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 14:16:17,816][00497] Updated weights for policy 0, policy_version 19249 (0.0032) [2024-03-29 14:16:18,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 42043.0). 
Total num frames: 315408384. Throughput: 0: 41561.6. Samples: 197566320. Policy #0 lag: (min: 0.0, avg: 23.3, max: 42.0) [2024-03-29 14:16:18,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:16:21,701][00497] Updated weights for policy 0, policy_version 19259 (0.0019) [2024-03-29 14:16:23,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41233.1, 300 sec: 41987.5). Total num frames: 315637760. Throughput: 0: 42163.1. Samples: 197840740. Policy #0 lag: (min: 0.0, avg: 23.3, max: 42.0) [2024-03-29 14:16:23,842][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 14:16:24,982][00497] Updated weights for policy 0, policy_version 19269 (0.0020) [2024-03-29 14:16:28,839][00126] Fps is (10 sec: 44237.6, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 315850752. Throughput: 0: 41829.8. Samples: 198082540. Policy #0 lag: (min: 0.0, avg: 21.9, max: 44.0) [2024-03-29 14:16:28,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 14:16:28,916][00497] Updated weights for policy 0, policy_version 19279 (0.0021) [2024-03-29 14:16:33,230][00497] Updated weights for policy 0, policy_version 19289 (0.0027) [2024-03-29 14:16:33,552][00476] Signal inference workers to stop experience collection... (7050 times) [2024-03-29 14:16:33,552][00476] Signal inference workers to resume experience collection... (7050 times) [2024-03-29 14:16:33,594][00497] InferenceWorker_p0-w0: stopping experience collection (7050 times) [2024-03-29 14:16:33,594][00497] InferenceWorker_p0-w0: resuming experience collection (7050 times) [2024-03-29 14:16:33,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 316063744. Throughput: 0: 41496.5. Samples: 198195960. Policy #0 lag: (min: 0.0, avg: 21.9, max: 44.0) [2024-03-29 14:16:33,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:16:37,096][00497] Updated weights for policy 0, policy_version 19299 (0.0027) [2024-03-29 14:16:38,839][00126] Fps is (10 sec: 40959.2, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 316260352. Throughput: 0: 42044.7. Samples: 198467980. Policy #0 lag: (min: 1.0, avg: 20.5, max: 43.0) [2024-03-29 14:16:38,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:16:40,752][00497] Updated weights for policy 0, policy_version 19309 (0.0031) [2024-03-29 14:16:43,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.1, 300 sec: 42043.0). Total num frames: 316489728. Throughput: 0: 41849.8. Samples: 198704940. Policy #0 lag: (min: 1.0, avg: 20.5, max: 43.0) [2024-03-29 14:16:43,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 14:16:44,520][00497] Updated weights for policy 0, policy_version 19319 (0.0021) [2024-03-29 14:16:48,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 316669952. Throughput: 0: 41981.3. Samples: 198837340. Policy #0 lag: (min: 1.0, avg: 20.5, max: 43.0) [2024-03-29 14:16:48,841][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 14:16:48,989][00497] Updated weights for policy 0, policy_version 19329 (0.0021) [2024-03-29 14:16:52,814][00497] Updated weights for policy 0, policy_version 19339 (0.0018) [2024-03-29 14:16:53,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 316899328. Throughput: 0: 42264.5. Samples: 199103140. 
Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 14:16:53,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:16:56,145][00497] Updated weights for policy 0, policy_version 19349 (0.0025) [2024-03-29 14:16:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 317112320. Throughput: 0: 41797.4. Samples: 199333280. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 14:16:58,840][00126] Avg episode reward: [(0, '0.380')] [2024-03-29 14:17:00,078][00497] Updated weights for policy 0, policy_version 19359 (0.0021) [2024-03-29 14:17:03,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 317308928. Throughput: 0: 42445.9. Samples: 199476380. Policy #0 lag: (min: 1.0, avg: 18.4, max: 41.0) [2024-03-29 14:17:03,840][00126] Avg episode reward: [(0, '0.269')] [2024-03-29 14:17:03,858][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019367_317308928.pth... [2024-03-29 14:17:04,161][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000018755_307281920.pth [2024-03-29 14:17:04,762][00497] Updated weights for policy 0, policy_version 19369 (0.0022) [2024-03-29 14:17:05,054][00476] Signal inference workers to stop experience collection... (7100 times) [2024-03-29 14:17:05,054][00476] Signal inference workers to resume experience collection... (7100 times) [2024-03-29 14:17:05,092][00497] InferenceWorker_p0-w0: stopping experience collection (7100 times) [2024-03-29 14:17:05,092][00497] InferenceWorker_p0-w0: resuming experience collection (7100 times) [2024-03-29 14:17:08,303][00497] Updated weights for policy 0, policy_version 19379 (0.0022) [2024-03-29 14:17:08,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 317521920. Throughput: 0: 42073.0. Samples: 199734020. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 14:17:08,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 14:17:11,793][00497] Updated weights for policy 0, policy_version 19389 (0.0024) [2024-03-29 14:17:13,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 317751296. Throughput: 0: 41470.6. Samples: 199948720. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 14:17:13,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 14:17:16,094][00497] Updated weights for policy 0, policy_version 19399 (0.0028) [2024-03-29 14:17:18,839][00126] Fps is (10 sec: 44235.9, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 317964288. Throughput: 0: 42089.1. Samples: 200089980. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 14:17:18,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 14:17:20,498][00497] Updated weights for policy 0, policy_version 19409 (0.0024) [2024-03-29 14:17:23,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 318128128. Throughput: 0: 41936.5. Samples: 200355120. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:17:23,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 14:17:24,193][00497] Updated weights for policy 0, policy_version 19419 (0.0021) [2024-03-29 14:17:27,630][00497] Updated weights for policy 0, policy_version 19429 (0.0028) [2024-03-29 14:17:28,839][00126] Fps is (10 sec: 42599.2, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 318390272. Throughput: 0: 41660.9. Samples: 200579680. 
Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:17:28,840][00126] Avg episode reward: [(0, '0.318')] [2024-03-29 14:17:31,743][00497] Updated weights for policy 0, policy_version 19439 (0.0022) [2024-03-29 14:17:33,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 318586880. Throughput: 0: 41604.0. Samples: 200709520. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:17:33,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 14:17:36,135][00497] Updated weights for policy 0, policy_version 19449 (0.0022) [2024-03-29 14:17:37,406][00476] Signal inference workers to stop experience collection... (7150 times) [2024-03-29 14:17:37,406][00476] Signal inference workers to resume experience collection... (7150 times) [2024-03-29 14:17:37,450][00497] InferenceWorker_p0-w0: stopping experience collection (7150 times) [2024-03-29 14:17:37,450][00497] InferenceWorker_p0-w0: resuming experience collection (7150 times) [2024-03-29 14:17:38,839][00126] Fps is (10 sec: 36044.4, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 318750720. Throughput: 0: 41609.6. Samples: 200975580. Policy #0 lag: (min: 0.0, avg: 22.8, max: 42.0) [2024-03-29 14:17:38,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 14:17:39,902][00497] Updated weights for policy 0, policy_version 19459 (0.0023) [2024-03-29 14:17:43,441][00497] Updated weights for policy 0, policy_version 19469 (0.0020) [2024-03-29 14:17:43,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 318996480. Throughput: 0: 41663.2. Samples: 201208120. Policy #0 lag: (min: 0.0, avg: 22.8, max: 42.0) [2024-03-29 14:17:43,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 14:17:47,301][00497] Updated weights for policy 0, policy_version 19479 (0.0025) [2024-03-29 14:17:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 319193088. Throughput: 0: 41231.5. Samples: 201331800. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 14:17:48,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 14:17:52,080][00497] Updated weights for policy 0, policy_version 19489 (0.0027) [2024-03-29 14:17:53,839][00126] Fps is (10 sec: 39320.9, 60 sec: 41506.0, 300 sec: 41931.9). Total num frames: 319389696. Throughput: 0: 41562.5. Samples: 201604340. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 14:17:53,840][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 14:17:55,622][00497] Updated weights for policy 0, policy_version 19499 (0.0027) [2024-03-29 14:17:58,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 319619072. Throughput: 0: 42018.2. Samples: 201839540. Policy #0 lag: (min: 0.0, avg: 21.1, max: 42.0) [2024-03-29 14:17:58,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:17:59,001][00497] Updated weights for policy 0, policy_version 19509 (0.0024) [2024-03-29 14:18:03,161][00497] Updated weights for policy 0, policy_version 19519 (0.0028) [2024-03-29 14:18:03,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 319815680. Throughput: 0: 41345.9. Samples: 201950540. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 14:18:03,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 14:18:05,934][00476] Signal inference workers to stop experience collection... 
(7200 times) [2024-03-29 14:18:06,010][00497] InferenceWorker_p0-w0: stopping experience collection (7200 times) [2024-03-29 14:18:06,011][00476] Signal inference workers to resume experience collection... (7200 times) [2024-03-29 14:18:06,033][00497] InferenceWorker_p0-w0: resuming experience collection (7200 times) [2024-03-29 14:18:08,149][00497] Updated weights for policy 0, policy_version 19529 (0.0024) [2024-03-29 14:18:08,839][00126] Fps is (10 sec: 37682.6, 60 sec: 41232.9, 300 sec: 41876.4). Total num frames: 319995904. Throughput: 0: 41270.6. Samples: 202212300. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 14:18:08,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:18:11,712][00497] Updated weights for policy 0, policy_version 19539 (0.0022) [2024-03-29 14:18:13,839][00126] Fps is (10 sec: 39321.8, 60 sec: 40960.0, 300 sec: 41765.3). Total num frames: 320208896. Throughput: 0: 41707.1. Samples: 202456500. Policy #0 lag: (min: 0.0, avg: 21.6, max: 41.0) [2024-03-29 14:18:13,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 14:18:14,874][00497] Updated weights for policy 0, policy_version 19549 (0.0030) [2024-03-29 14:18:18,839][00126] Fps is (10 sec: 44237.1, 60 sec: 41233.1, 300 sec: 41820.8). Total num frames: 320438272. Throughput: 0: 41204.9. Samples: 202563740. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 14:18:18,841][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:18:18,903][00497] Updated weights for policy 0, policy_version 19559 (0.0020) [2024-03-29 14:18:23,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41233.1, 300 sec: 41820.8). Total num frames: 320602112. Throughput: 0: 41059.6. Samples: 202823260. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 14:18:23,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 14:18:23,896][00497] Updated weights for policy 0, policy_version 19569 (0.0023) [2024-03-29 14:18:27,386][00497] Updated weights for policy 0, policy_version 19579 (0.0029) [2024-03-29 14:18:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 40959.9, 300 sec: 41820.8). Total num frames: 320847872. Throughput: 0: 41744.4. Samples: 203086620. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 14:18:28,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 14:18:30,739][00497] Updated weights for policy 0, policy_version 19589 (0.0032) [2024-03-29 14:18:33,839][00126] Fps is (10 sec: 45875.3, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 321060864. Throughput: 0: 41345.8. Samples: 203192360. Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 14:18:33,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 14:18:34,593][00497] Updated weights for policy 0, policy_version 19599 (0.0020) [2024-03-29 14:18:36,354][00476] Signal inference workers to stop experience collection... (7250 times) [2024-03-29 14:18:36,399][00497] InferenceWorker_p0-w0: stopping experience collection (7250 times) [2024-03-29 14:18:36,577][00476] Signal inference workers to resume experience collection... (7250 times) [2024-03-29 14:18:36,578][00497] InferenceWorker_p0-w0: resuming experience collection (7250 times) [2024-03-29 14:18:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 41820.8). Total num frames: 321241088. Throughput: 0: 41049.9. Samples: 203451580. 
Policy #0 lag: (min: 2.0, avg: 20.7, max: 42.0) [2024-03-29 14:18:38,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:18:39,400][00497] Updated weights for policy 0, policy_version 19609 (0.0026) [2024-03-29 14:18:43,086][00497] Updated weights for policy 0, policy_version 19619 (0.0021) [2024-03-29 14:18:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 321470464. Throughput: 0: 41790.7. Samples: 203720120. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 14:18:43,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 14:18:46,420][00497] Updated weights for policy 0, policy_version 19629 (0.0027) [2024-03-29 14:18:48,839][00126] Fps is (10 sec: 45875.4, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 321699840. Throughput: 0: 41576.4. Samples: 203821480. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 14:18:48,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 14:18:50,491][00497] Updated weights for policy 0, policy_version 19639 (0.0026) [2024-03-29 14:18:53,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 321896448. Throughput: 0: 41749.9. Samples: 204091040. Policy #0 lag: (min: 1.0, avg: 20.0, max: 41.0) [2024-03-29 14:18:53,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:18:55,397][00497] Updated weights for policy 0, policy_version 19649 (0.0021) [2024-03-29 14:18:58,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40960.0, 300 sec: 41709.8). Total num frames: 322076672. Throughput: 0: 41797.3. Samples: 204337380. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 14:18:58,840][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 14:18:58,939][00497] Updated weights for policy 0, policy_version 19659 (0.0030) [2024-03-29 14:19:02,040][00497] Updated weights for policy 0, policy_version 19669 (0.0022) [2024-03-29 14:19:03,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.1, 300 sec: 41820.8). Total num frames: 322322432. Throughput: 0: 42181.3. Samples: 204461900. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 14:19:03,841][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 14:19:03,948][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019674_322338816.pth... [2024-03-29 14:19:04,269][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019062_312311808.pth [2024-03-29 14:19:06,191][00497] Updated weights for policy 0, policy_version 19679 (0.0021) [2024-03-29 14:19:08,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 322519040. Throughput: 0: 41859.0. Samples: 204706920. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 14:19:08,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 14:19:11,050][00476] Signal inference workers to stop experience collection... (7300 times) [2024-03-29 14:19:11,093][00497] InferenceWorker_p0-w0: stopping experience collection (7300 times) [2024-03-29 14:19:11,276][00476] Signal inference workers to resume experience collection... (7300 times) [2024-03-29 14:19:11,277][00497] InferenceWorker_p0-w0: resuming experience collection (7300 times) [2024-03-29 14:19:11,280][00497] Updated weights for policy 0, policy_version 19689 (0.0029) [2024-03-29 14:19:13,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.0, 300 sec: 41709.8). Total num frames: 322699264. Throughput: 0: 41810.2. Samples: 204968080. 
Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 14:19:13,842][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 14:19:14,692][00497] Updated weights for policy 0, policy_version 19699 (0.0025) [2024-03-29 14:19:17,861][00497] Updated weights for policy 0, policy_version 19709 (0.0024) [2024-03-29 14:19:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 322961408. Throughput: 0: 42003.5. Samples: 205082520. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 14:19:18,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 14:19:21,953][00497] Updated weights for policy 0, policy_version 19719 (0.0020) [2024-03-29 14:19:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 323141632. Throughput: 0: 41538.2. Samples: 205320800. Policy #0 lag: (min: 0.0, avg: 19.0, max: 41.0) [2024-03-29 14:19:23,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 14:19:26,817][00497] Updated weights for policy 0, policy_version 19729 (0.0017) [2024-03-29 14:19:28,839][00126] Fps is (10 sec: 36045.2, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 323321856. Throughput: 0: 41592.0. Samples: 205591760. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 14:19:28,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 14:19:30,333][00497] Updated weights for policy 0, policy_version 19739 (0.0017) [2024-03-29 14:19:33,556][00497] Updated weights for policy 0, policy_version 19749 (0.0020) [2024-03-29 14:19:33,839][00126] Fps is (10 sec: 44237.6, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 323584000. Throughput: 0: 41849.0. Samples: 205704680. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 14:19:33,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 14:19:37,595][00497] Updated weights for policy 0, policy_version 19759 (0.0024) [2024-03-29 14:19:38,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 323764224. Throughput: 0: 41284.0. Samples: 205948820. Policy #0 lag: (min: 0.0, avg: 22.0, max: 43.0) [2024-03-29 14:19:38,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 14:19:42,157][00497] Updated weights for policy 0, policy_version 19769 (0.0021) [2024-03-29 14:19:43,839][00126] Fps is (10 sec: 36044.5, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 323944448. Throughput: 0: 41817.7. Samples: 206219180. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:19:43,840][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 14:19:45,155][00476] Signal inference workers to stop experience collection... (7350 times) [2024-03-29 14:19:45,203][00497] InferenceWorker_p0-w0: stopping experience collection (7350 times) [2024-03-29 14:19:45,240][00476] Signal inference workers to resume experience collection... (7350 times) [2024-03-29 14:19:45,243][00497] InferenceWorker_p0-w0: resuming experience collection (7350 times) [2024-03-29 14:19:45,799][00497] Updated weights for policy 0, policy_version 19779 (0.0025) [2024-03-29 14:19:48,839][00126] Fps is (10 sec: 44236.4, 60 sec: 41779.2, 300 sec: 41765.5). Total num frames: 324206592. Throughput: 0: 41804.9. Samples: 206343120. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:19:48,842][00126] Avg episode reward: [(0, '0.330')] [2024-03-29 14:19:49,045][00497] Updated weights for policy 0, policy_version 19789 (0.0024) [2024-03-29 14:19:53,162][00497] Updated weights for policy 0, policy_version 19799 (0.0026) [2024-03-29 14:19:53,839][00126] Fps is (10 sec: 44237.3, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 324386816. Throughput: 0: 41645.5. Samples: 206580960. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 14:19:53,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 14:19:58,052][00497] Updated weights for policy 0, policy_version 19809 (0.0020) [2024-03-29 14:19:58,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 324583424. Throughput: 0: 41817.4. Samples: 206849860. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 14:19:58,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 14:20:01,537][00497] Updated weights for policy 0, policy_version 19819 (0.0026) [2024-03-29 14:20:03,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 324812800. Throughput: 0: 41977.5. Samples: 206971500. Policy #0 lag: (min: 0.0, avg: 22.3, max: 42.0) [2024-03-29 14:20:03,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 14:20:04,751][00497] Updated weights for policy 0, policy_version 19829 (0.0022) [2024-03-29 14:20:08,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 325025792. Throughput: 0: 41720.0. Samples: 207198200. Policy #0 lag: (min: 0.0, avg: 23.5, max: 41.0) [2024-03-29 14:20:08,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 14:20:09,028][00497] Updated weights for policy 0, policy_version 19839 (0.0019) [2024-03-29 14:20:13,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 325189632. Throughput: 0: 41665.3. Samples: 207466700. Policy #0 lag: (min: 0.0, avg: 23.5, max: 41.0) [2024-03-29 14:20:13,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 14:20:13,968][00497] Updated weights for policy 0, policy_version 19849 (0.0029) [2024-03-29 14:20:17,314][00497] Updated weights for policy 0, policy_version 19859 (0.0024) [2024-03-29 14:20:18,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 325435392. Throughput: 0: 42122.6. Samples: 207600200. Policy #0 lag: (min: 0.0, avg: 23.5, max: 41.0) [2024-03-29 14:20:18,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:20:19,350][00476] Signal inference workers to stop experience collection... (7400 times) [2024-03-29 14:20:19,391][00497] InferenceWorker_p0-w0: stopping experience collection (7400 times) [2024-03-29 14:20:19,580][00476] Signal inference workers to resume experience collection... (7400 times) [2024-03-29 14:20:19,580][00497] InferenceWorker_p0-w0: resuming experience collection (7400 times) [2024-03-29 14:20:20,427][00497] Updated weights for policy 0, policy_version 19869 (0.0025) [2024-03-29 14:20:23,839][00126] Fps is (10 sec: 47513.9, 60 sec: 42052.4, 300 sec: 41709.8). Total num frames: 325664768. Throughput: 0: 41729.3. Samples: 207826640. Policy #0 lag: (min: 1.0, avg: 24.0, max: 43.0) [2024-03-29 14:20:23,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 14:20:24,378][00497] Updated weights for policy 0, policy_version 19879 (0.0022) [2024-03-29 14:20:28,839][00126] Fps is (10 sec: 40959.5, 60 sec: 42052.2, 300 sec: 41765.3). 
Total num frames: 325844992. Throughput: 0: 41674.6. Samples: 208094540. Policy #0 lag: (min: 1.0, avg: 24.0, max: 43.0) [2024-03-29 14:20:28,841][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 14:20:29,311][00497] Updated weights for policy 0, policy_version 19889 (0.0022) [2024-03-29 14:20:32,947][00497] Updated weights for policy 0, policy_version 19899 (0.0024) [2024-03-29 14:20:33,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 326057984. Throughput: 0: 42112.0. Samples: 208238160. Policy #0 lag: (min: 1.0, avg: 24.0, max: 43.0) [2024-03-29 14:20:33,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 14:20:35,983][00497] Updated weights for policy 0, policy_version 19909 (0.0022) [2024-03-29 14:20:38,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 326303744. Throughput: 0: 41571.0. Samples: 208451660. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:20:38,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 14:20:40,036][00497] Updated weights for policy 0, policy_version 19919 (0.0028) [2024-03-29 14:20:43,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 326483968. Throughput: 0: 41439.6. Samples: 208714640. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:20:43,840][00126] Avg episode reward: [(0, '0.348')] [2024-03-29 14:20:44,791][00497] Updated weights for policy 0, policy_version 19929 (0.0020) [2024-03-29 14:20:48,563][00497] Updated weights for policy 0, policy_version 19939 (0.0023) [2024-03-29 14:20:48,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41233.0, 300 sec: 41654.2). Total num frames: 326680576. Throughput: 0: 41830.5. Samples: 208853880. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:20:48,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 14:20:51,216][00476] Signal inference workers to stop experience collection... (7450 times) [2024-03-29 14:20:51,251][00497] InferenceWorker_p0-w0: stopping experience collection (7450 times) [2024-03-29 14:20:51,432][00476] Signal inference workers to resume experience collection... (7450 times) [2024-03-29 14:20:51,432][00497] InferenceWorker_p0-w0: resuming experience collection (7450 times) [2024-03-29 14:20:51,690][00497] Updated weights for policy 0, policy_version 19949 (0.0028) [2024-03-29 14:20:53,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42598.3, 300 sec: 41820.9). Total num frames: 326942720. Throughput: 0: 41884.1. Samples: 209082980. Policy #0 lag: (min: 2.0, avg: 19.5, max: 41.0) [2024-03-29 14:20:53,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 14:20:55,684][00497] Updated weights for policy 0, policy_version 19959 (0.0018) [2024-03-29 14:20:58,839][00126] Fps is (10 sec: 45875.9, 60 sec: 42598.5, 300 sec: 41876.4). Total num frames: 327139328. Throughput: 0: 41972.9. Samples: 209355480. Policy #0 lag: (min: 2.0, avg: 19.5, max: 41.0) [2024-03-29 14:20:58,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 14:21:00,396][00497] Updated weights for policy 0, policy_version 19969 (0.0030) [2024-03-29 14:21:03,839][00126] Fps is (10 sec: 37682.7, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 327319552. Throughput: 0: 42019.0. Samples: 209491060. 
Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 14:21:03,841][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 14:21:04,044][00497] Updated weights for policy 0, policy_version 19979 (0.0022) [2024-03-29 14:21:04,331][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019980_327352320.pth... [2024-03-29 14:21:04,669][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019367_317308928.pth [2024-03-29 14:21:07,247][00497] Updated weights for policy 0, policy_version 19989 (0.0022) [2024-03-29 14:21:08,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 327565312. Throughput: 0: 42077.2. Samples: 209720120. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 14:21:08,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 14:21:11,496][00497] Updated weights for policy 0, policy_version 19999 (0.0027) [2024-03-29 14:21:13,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42598.4, 300 sec: 41820.9). Total num frames: 327745536. Throughput: 0: 41854.7. Samples: 209978000. Policy #0 lag: (min: 1.0, avg: 18.2, max: 41.0) [2024-03-29 14:21:13,840][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 14:21:16,219][00497] Updated weights for policy 0, policy_version 20009 (0.0031) [2024-03-29 14:21:18,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 327925760. Throughput: 0: 41684.5. Samples: 210113960. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 14:21:18,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 14:21:19,918][00497] Updated weights for policy 0, policy_version 20019 (0.0019) [2024-03-29 14:21:23,077][00476] Signal inference workers to stop experience collection... (7500 times) [2024-03-29 14:21:23,131][00497] InferenceWorker_p0-w0: stopping experience collection (7500 times) [2024-03-29 14:21:23,165][00476] Signal inference workers to resume experience collection... (7500 times) [2024-03-29 14:21:23,167][00497] InferenceWorker_p0-w0: resuming experience collection (7500 times) [2024-03-29 14:21:23,172][00497] Updated weights for policy 0, policy_version 20029 (0.0034) [2024-03-29 14:21:23,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 328187904. Throughput: 0: 42279.1. Samples: 210354220. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 14:21:23,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 14:21:27,422][00497] Updated weights for policy 0, policy_version 20039 (0.0022) [2024-03-29 14:21:28,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.4, 300 sec: 41709.8). Total num frames: 328368128. Throughput: 0: 41807.1. Samples: 210595960. Policy #0 lag: (min: 0.0, avg: 19.4, max: 42.0) [2024-03-29 14:21:28,841][00126] Avg episode reward: [(0, '0.459')] [2024-03-29 14:21:31,912][00497] Updated weights for policy 0, policy_version 20049 (0.0019) [2024-03-29 14:21:33,839][00126] Fps is (10 sec: 34406.1, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 328531968. Throughput: 0: 41507.5. Samples: 210721720. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 14:21:33,840][00126] Avg episode reward: [(0, '0.401')] [2024-03-29 14:21:35,687][00497] Updated weights for policy 0, policy_version 20059 (0.0025) [2024-03-29 14:21:38,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 328794112. Throughput: 0: 41975.9. Samples: 210971900. 
Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 14:21:38,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:21:39,039][00497] Updated weights for policy 0, policy_version 20069 (0.0019) [2024-03-29 14:21:43,282][00497] Updated weights for policy 0, policy_version 20079 (0.0026) [2024-03-29 14:21:43,839][00126] Fps is (10 sec: 45875.9, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 328990720. Throughput: 0: 41283.1. Samples: 211213220. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 14:21:43,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:21:47,382][00497] Updated weights for policy 0, policy_version 20089 (0.0026) [2024-03-29 14:21:48,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 329187328. Throughput: 0: 40999.2. Samples: 211336020. Policy #0 lag: (min: 2.0, avg: 22.3, max: 43.0) [2024-03-29 14:21:48,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 14:21:51,437][00497] Updated weights for policy 0, policy_version 20099 (0.0023) [2024-03-29 14:21:53,839][00126] Fps is (10 sec: 40959.9, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 329400320. Throughput: 0: 41796.9. Samples: 211600980. Policy #0 lag: (min: 2.0, avg: 22.3, max: 43.0) [2024-03-29 14:21:53,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:21:54,695][00497] Updated weights for policy 0, policy_version 20109 (0.0031) [2024-03-29 14:21:58,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 329613312. Throughput: 0: 41374.2. Samples: 211839840. Policy #0 lag: (min: 2.0, avg: 22.3, max: 43.0) [2024-03-29 14:21:58,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 14:21:58,950][00497] Updated weights for policy 0, policy_version 20119 (0.0019) [2024-03-29 14:21:59,467][00476] Signal inference workers to stop experience collection... (7550 times) [2024-03-29 14:21:59,510][00497] InferenceWorker_p0-w0: stopping experience collection (7550 times) [2024-03-29 14:21:59,547][00476] Signal inference workers to resume experience collection... (7550 times) [2024-03-29 14:21:59,575][00497] InferenceWorker_p0-w0: resuming experience collection (7550 times) [2024-03-29 14:22:03,151][00497] Updated weights for policy 0, policy_version 20129 (0.0026) [2024-03-29 14:22:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 329809920. Throughput: 0: 41197.7. Samples: 211967860. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 14:22:03,841][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:22:07,258][00497] Updated weights for policy 0, policy_version 20139 (0.0029) [2024-03-29 14:22:08,839][00126] Fps is (10 sec: 40960.3, 60 sec: 40960.1, 300 sec: 41598.7). Total num frames: 330022912. Throughput: 0: 41737.9. Samples: 212232420. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 14:22:08,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 14:22:10,454][00497] Updated weights for policy 0, policy_version 20149 (0.0030) [2024-03-29 14:22:13,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 41654.3). Total num frames: 330252288. Throughput: 0: 41320.4. Samples: 212455380. Policy #0 lag: (min: 1.0, avg: 21.5, max: 42.0) [2024-03-29 14:22:13,840][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 14:22:14,797][00497] Updated weights for policy 0, policy_version 20159 (0.0021) [2024-03-29 14:22:18,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41709.8). 
Total num frames: 330432512. Throughput: 0: 41494.0. Samples: 212588940. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 14:22:18,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 14:22:19,339][00497] Updated weights for policy 0, policy_version 20169 (0.0022) [2024-03-29 14:22:23,115][00497] Updated weights for policy 0, policy_version 20179 (0.0020) [2024-03-29 14:22:23,839][00126] Fps is (10 sec: 39321.5, 60 sec: 40960.0, 300 sec: 41543.1). Total num frames: 330645504. Throughput: 0: 41900.5. Samples: 212857420. Policy #0 lag: (min: 0.0, avg: 21.5, max: 41.0) [2024-03-29 14:22:23,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 14:22:26,083][00497] Updated weights for policy 0, policy_version 20189 (0.0028) [2024-03-29 14:22:28,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 330891264. Throughput: 0: 41447.5. Samples: 213078360. Policy #0 lag: (min: 0.0, avg: 22.9, max: 43.0) [2024-03-29 14:22:28,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 14:22:30,367][00497] Updated weights for policy 0, policy_version 20199 (0.0018) [2024-03-29 14:22:32,053][00476] Signal inference workers to stop experience collection... (7600 times) [2024-03-29 14:22:32,095][00497] InferenceWorker_p0-w0: stopping experience collection (7600 times) [2024-03-29 14:22:32,133][00476] Signal inference workers to resume experience collection... (7600 times) [2024-03-29 14:22:32,135][00497] InferenceWorker_p0-w0: resuming experience collection (7600 times) [2024-03-29 14:22:33,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 331071488. Throughput: 0: 41879.5. Samples: 213220600. Policy #0 lag: (min: 0.0, avg: 22.9, max: 43.0) [2024-03-29 14:22:33,840][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 14:22:34,652][00497] Updated weights for policy 0, policy_version 20209 (0.0032) [2024-03-29 14:22:38,594][00497] Updated weights for policy 0, policy_version 20219 (0.0019) [2024-03-29 14:22:38,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 331268096. Throughput: 0: 41940.9. Samples: 213488320. Policy #0 lag: (min: 0.0, avg: 22.9, max: 43.0) [2024-03-29 14:22:38,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 14:22:41,992][00497] Updated weights for policy 0, policy_version 20229 (0.0026) [2024-03-29 14:22:43,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.3, 300 sec: 41820.8). Total num frames: 331530240. Throughput: 0: 41526.2. Samples: 213708520. Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 14:22:43,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 14:22:46,173][00497] Updated weights for policy 0, policy_version 20239 (0.0036) [2024-03-29 14:22:48,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.4, 300 sec: 41820.9). Total num frames: 331726848. Throughput: 0: 41863.6. Samples: 213851720. Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 14:22:48,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:22:50,061][00497] Updated weights for policy 0, policy_version 20249 (0.0018) [2024-03-29 14:22:53,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 331907072. Throughput: 0: 42114.2. Samples: 214127560. 
Policy #0 lag: (min: 1.0, avg: 22.7, max: 41.0) [2024-03-29 14:22:53,840][00126] Avg episode reward: [(0, '0.319')] [2024-03-29 14:22:54,025][00497] Updated weights for policy 0, policy_version 20259 (0.0022) [2024-03-29 14:22:57,279][00497] Updated weights for policy 0, policy_version 20269 (0.0019) [2024-03-29 14:22:58,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 332169216. Throughput: 0: 42249.7. Samples: 214356620. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 14:22:58,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:23:01,564][00497] Updated weights for policy 0, policy_version 20279 (0.0024) [2024-03-29 14:23:01,895][00476] Signal inference workers to stop experience collection... (7650 times) [2024-03-29 14:23:01,920][00497] InferenceWorker_p0-w0: stopping experience collection (7650 times) [2024-03-29 14:23:02,115][00476] Signal inference workers to resume experience collection... (7650 times) [2024-03-29 14:23:02,115][00497] InferenceWorker_p0-w0: resuming experience collection (7650 times) [2024-03-29 14:23:03,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 41932.0). Total num frames: 332365824. Throughput: 0: 42195.1. Samples: 214487720. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 14:23:03,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:23:04,087][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020287_332382208.pth... [2024-03-29 14:23:04,387][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019674_322338816.pth [2024-03-29 14:23:05,552][00497] Updated weights for policy 0, policy_version 20289 (0.0024) [2024-03-29 14:23:08,839][00126] Fps is (10 sec: 36045.2, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 332529664. Throughput: 0: 42188.5. Samples: 214755900. Policy #0 lag: (min: 2.0, avg: 21.5, max: 43.0) [2024-03-29 14:23:08,840][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 14:23:09,649][00497] Updated weights for policy 0, policy_version 20299 (0.0017) [2024-03-29 14:23:12,687][00497] Updated weights for policy 0, policy_version 20309 (0.0027) [2024-03-29 14:23:13,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 332791808. Throughput: 0: 42538.2. Samples: 214992580. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 14:23:13,840][00126] Avg episode reward: [(0, '0.286')] [2024-03-29 14:23:16,712][00497] Updated weights for policy 0, policy_version 20319 (0.0021) [2024-03-29 14:23:18,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 332988416. Throughput: 0: 42314.8. Samples: 215124760. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 14:23:18,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:23:21,259][00497] Updated weights for policy 0, policy_version 20329 (0.0023) [2024-03-29 14:23:23,839][00126] Fps is (10 sec: 36045.3, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 333152256. Throughput: 0: 42008.5. Samples: 215378700. Policy #0 lag: (min: 1.0, avg: 20.3, max: 42.0) [2024-03-29 14:23:23,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 14:23:25,444][00497] Updated weights for policy 0, policy_version 20339 (0.0028) [2024-03-29 14:23:28,383][00497] Updated weights for policy 0, policy_version 20349 (0.0033) [2024-03-29 14:23:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 41876.4). 
Total num frames: 333414400. Throughput: 0: 42308.5. Samples: 215612400. Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 14:23:28,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 14:23:32,107][00476] Signal inference workers to stop experience collection... (7700 times) [2024-03-29 14:23:32,185][00497] InferenceWorker_p0-w0: stopping experience collection (7700 times) [2024-03-29 14:23:32,273][00476] Signal inference workers to resume experience collection... (7700 times) [2024-03-29 14:23:32,274][00497] InferenceWorker_p0-w0: resuming experience collection (7700 times) [2024-03-29 14:23:32,582][00497] Updated weights for policy 0, policy_version 20359 (0.0025) [2024-03-29 14:23:33,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 333611008. Throughput: 0: 41860.4. Samples: 215735440. Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 14:23:33,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 14:23:37,007][00497] Updated weights for policy 0, policy_version 20369 (0.0023) [2024-03-29 14:23:38,839][00126] Fps is (10 sec: 36044.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 333774848. Throughput: 0: 41559.5. Samples: 215997740. Policy #0 lag: (min: 0.0, avg: 18.5, max: 42.0) [2024-03-29 14:23:38,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 14:23:41,116][00497] Updated weights for policy 0, policy_version 20379 (0.0027) [2024-03-29 14:23:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 334020608. Throughput: 0: 42114.8. Samples: 216251780. Policy #0 lag: (min: 1.0, avg: 18.1, max: 42.0) [2024-03-29 14:23:43,840][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 14:23:44,182][00497] Updated weights for policy 0, policy_version 20389 (0.0029) [2024-03-29 14:23:48,255][00497] Updated weights for policy 0, policy_version 20399 (0.0031) [2024-03-29 14:23:48,839][00126] Fps is (10 sec: 45875.6, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 334233600. Throughput: 0: 41635.6. Samples: 216361320. Policy #0 lag: (min: 1.0, avg: 18.1, max: 42.0) [2024-03-29 14:23:48,840][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 14:23:52,447][00497] Updated weights for policy 0, policy_version 20409 (0.0023) [2024-03-29 14:23:53,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 334413824. Throughput: 0: 41678.2. Samples: 216631420. Policy #0 lag: (min: 1.0, avg: 18.1, max: 42.0) [2024-03-29 14:23:53,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 14:23:56,878][00497] Updated weights for policy 0, policy_version 20419 (0.0021) [2024-03-29 14:23:58,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41233.1, 300 sec: 41765.3). Total num frames: 334643200. Throughput: 0: 42037.8. Samples: 216884280. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 14:23:58,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 14:23:59,899][00497] Updated weights for policy 0, policy_version 20429 (0.0030) [2024-03-29 14:24:03,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 334856192. Throughput: 0: 41167.9. Samples: 216977320. Policy #0 lag: (min: 0.0, avg: 19.2, max: 42.0) [2024-03-29 14:24:03,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 14:24:04,018][00497] Updated weights for policy 0, policy_version 20439 (0.0024) [2024-03-29 14:24:04,442][00476] Signal inference workers to stop experience collection... 
(7750 times) [2024-03-29 14:24:04,521][00497] InferenceWorker_p0-w0: stopping experience collection (7750 times) [2024-03-29 14:24:04,522][00476] Signal inference workers to resume experience collection... (7750 times) [2024-03-29 14:24:04,547][00497] InferenceWorker_p0-w0: resuming experience collection (7750 times) [2024-03-29 14:24:08,207][00497] Updated weights for policy 0, policy_version 20449 (0.0030) [2024-03-29 14:24:08,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.3, 300 sec: 41932.0). Total num frames: 335069184. Throughput: 0: 41523.1. Samples: 217247240. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:24:08,840][00126] Avg episode reward: [(0, '0.450')] [2024-03-29 14:24:12,453][00497] Updated weights for policy 0, policy_version 20459 (0.0023) [2024-03-29 14:24:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 335265792. Throughput: 0: 42301.2. Samples: 217515960. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:24:13,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:24:15,430][00497] Updated weights for policy 0, policy_version 20469 (0.0020) [2024-03-29 14:24:18,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 335495168. Throughput: 0: 41870.7. Samples: 217619620. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:24:18,840][00126] Avg episode reward: [(0, '0.333')] [2024-03-29 14:24:19,574][00497] Updated weights for policy 0, policy_version 20479 (0.0025) [2024-03-29 14:24:23,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 335675392. Throughput: 0: 41617.4. Samples: 217870520. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:24:23,841][00126] Avg episode reward: [(0, '0.306')] [2024-03-29 14:24:23,936][00497] Updated weights for policy 0, policy_version 20489 (0.0021) [2024-03-29 14:24:27,981][00497] Updated weights for policy 0, policy_version 20499 (0.0019) [2024-03-29 14:24:28,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 335888384. Throughput: 0: 42203.1. Samples: 218150920. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:24:28,840][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 14:24:30,900][00497] Updated weights for policy 0, policy_version 20509 (0.0023) [2024-03-29 14:24:31,592][00476] Signal inference workers to stop experience collection... (7800 times) [2024-03-29 14:24:31,656][00497] InferenceWorker_p0-w0: stopping experience collection (7800 times) [2024-03-29 14:24:31,754][00476] Signal inference workers to resume experience collection... (7800 times) [2024-03-29 14:24:31,755][00497] InferenceWorker_p0-w0: resuming experience collection (7800 times) [2024-03-29 14:24:33,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 336134144. Throughput: 0: 41990.6. Samples: 218250900. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:24:33,840][00126] Avg episode reward: [(0, '0.327')] [2024-03-29 14:24:35,078][00497] Updated weights for policy 0, policy_version 20519 (0.0020) [2024-03-29 14:24:38,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 336314368. Throughput: 0: 41696.8. Samples: 218507780. 
Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 14:24:38,840][00126] Avg episode reward: [(0, '0.349')] [2024-03-29 14:24:39,300][00497] Updated weights for policy 0, policy_version 20529 (0.0034) [2024-03-29 14:24:43,658][00497] Updated weights for policy 0, policy_version 20539 (0.0029) [2024-03-29 14:24:43,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 336510976. Throughput: 0: 42188.0. Samples: 218782740. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 14:24:43,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 14:24:46,605][00497] Updated weights for policy 0, policy_version 20549 (0.0026) [2024-03-29 14:24:48,839][00126] Fps is (10 sec: 45875.9, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 336773120. Throughput: 0: 42475.7. Samples: 218888720. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 14:24:48,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 14:24:50,864][00497] Updated weights for policy 0, policy_version 20559 (0.0026) [2024-03-29 14:24:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 336953344. Throughput: 0: 41881.2. Samples: 219131900. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 14:24:53,841][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 14:24:55,064][00497] Updated weights for policy 0, policy_version 20569 (0.0023) [2024-03-29 14:24:58,839][00126] Fps is (10 sec: 36044.4, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 337133568. Throughput: 0: 42065.8. Samples: 219408920. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 14:24:58,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 14:24:59,377][00497] Updated weights for policy 0, policy_version 20579 (0.0022) [2024-03-29 14:25:02,275][00497] Updated weights for policy 0, policy_version 20589 (0.0030) [2024-03-29 14:25:03,704][00476] Signal inference workers to stop experience collection... (7850 times) [2024-03-29 14:25:03,704][00476] Signal inference workers to resume experience collection... (7850 times) [2024-03-29 14:25:03,738][00497] InferenceWorker_p0-w0: stopping experience collection (7850 times) [2024-03-29 14:25:03,738][00497] InferenceWorker_p0-w0: resuming experience collection (7850 times) [2024-03-29 14:25:03,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 337412096. Throughput: 0: 42192.4. Samples: 219518280. Policy #0 lag: (min: 1.0, avg: 21.3, max: 41.0) [2024-03-29 14:25:03,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 14:25:04,003][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020595_337428480.pth... [2024-03-29 14:25:04,316][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000019980_327352320.pth [2024-03-29 14:25:06,440][00497] Updated weights for policy 0, policy_version 20599 (0.0028) [2024-03-29 14:25:08,839][00126] Fps is (10 sec: 47513.2, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 337608704. Throughput: 0: 42177.6. Samples: 219768520. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 14:25:08,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 14:25:10,571][00497] Updated weights for policy 0, policy_version 20609 (0.0024) [2024-03-29 14:25:13,839][00126] Fps is (10 sec: 34406.6, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 337756160. Throughput: 0: 41976.9. Samples: 220039880. 
Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 14:25:13,840][00126] Avg episode reward: [(0, '0.294')] [2024-03-29 14:25:15,019][00497] Updated weights for policy 0, policy_version 20619 (0.0033) [2024-03-29 14:25:18,102][00497] Updated weights for policy 0, policy_version 20629 (0.0021) [2024-03-29 14:25:18,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 338018304. Throughput: 0: 42191.1. Samples: 220149500. Policy #0 lag: (min: 0.0, avg: 20.7, max: 40.0) [2024-03-29 14:25:18,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:25:22,316][00497] Updated weights for policy 0, policy_version 20639 (0.0019) [2024-03-29 14:25:23,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42325.4, 300 sec: 41932.0). Total num frames: 338214912. Throughput: 0: 41765.9. Samples: 220387240. Policy #0 lag: (min: 0.0, avg: 23.4, max: 41.0) [2024-03-29 14:25:23,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 14:25:26,696][00497] Updated weights for policy 0, policy_version 20649 (0.0019) [2024-03-29 14:25:28,839][00126] Fps is (10 sec: 36045.1, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 338378752. Throughput: 0: 41630.3. Samples: 220656100. Policy #0 lag: (min: 0.0, avg: 23.4, max: 41.0) [2024-03-29 14:25:28,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 14:25:30,690][00497] Updated weights for policy 0, policy_version 20659 (0.0020) [2024-03-29 14:25:33,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 338624512. Throughput: 0: 41952.0. Samples: 220776560. Policy #0 lag: (min: 0.0, avg: 23.4, max: 41.0) [2024-03-29 14:25:33,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:25:33,872][00497] Updated weights for policy 0, policy_version 20669 (0.0038) [2024-03-29 14:25:38,191][00497] Updated weights for policy 0, policy_version 20679 (0.0023) [2024-03-29 14:25:38,492][00476] Signal inference workers to stop experience collection... (7900 times) [2024-03-29 14:25:38,494][00476] Signal inference workers to resume experience collection... (7900 times) [2024-03-29 14:25:38,538][00497] InferenceWorker_p0-w0: stopping experience collection (7900 times) [2024-03-29 14:25:38,538][00497] InferenceWorker_p0-w0: resuming experience collection (7900 times) [2024-03-29 14:25:38,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 338837504. Throughput: 0: 41585.8. Samples: 221003260. Policy #0 lag: (min: 0.0, avg: 24.0, max: 43.0) [2024-03-29 14:25:38,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 14:25:42,360][00497] Updated weights for policy 0, policy_version 20689 (0.0022) [2024-03-29 14:25:43,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 339001344. Throughput: 0: 41388.5. Samples: 221271400. Policy #0 lag: (min: 0.0, avg: 24.0, max: 43.0) [2024-03-29 14:25:43,840][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 14:25:46,402][00497] Updated weights for policy 0, policy_version 20699 (0.0021) [2024-03-29 14:25:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.0, 300 sec: 41709.8). Total num frames: 339247104. Throughput: 0: 41951.1. Samples: 221406080. Policy #0 lag: (min: 0.0, avg: 24.0, max: 43.0) [2024-03-29 14:25:48,840][00126] Avg episode reward: [(0, '0.235')] [2024-03-29 14:25:49,758][00497] Updated weights for policy 0, policy_version 20709 (0.0024) [2024-03-29 14:25:53,839][00126] Fps is (10 sec: 44236.3, 60 sec: 41506.1, 300 sec: 41709.8). 
Total num frames: 339443712. Throughput: 0: 41040.0. Samples: 221615320. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 14:25:53,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:25:54,014][00497] Updated weights for policy 0, policy_version 20719 (0.0019) [2024-03-29 14:25:58,231][00497] Updated weights for policy 0, policy_version 20729 (0.0020) [2024-03-29 14:25:58,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 339640320. Throughput: 0: 41046.7. Samples: 221886980. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 14:25:58,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 14:26:02,377][00497] Updated weights for policy 0, policy_version 20739 (0.0022) [2024-03-29 14:26:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 40686.9, 300 sec: 41654.2). Total num frames: 339853312. Throughput: 0: 41697.3. Samples: 222025880. Policy #0 lag: (min: 1.0, avg: 21.7, max: 41.0) [2024-03-29 14:26:03,840][00126] Avg episode reward: [(0, '0.406')] [2024-03-29 14:26:05,646][00497] Updated weights for policy 0, policy_version 20749 (0.0024) [2024-03-29 14:26:08,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 340082688. Throughput: 0: 41017.7. Samples: 222233040. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 14:26:08,841][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:26:09,673][00497] Updated weights for policy 0, policy_version 20759 (0.0031) [2024-03-29 14:26:13,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 340246528. Throughput: 0: 40903.5. Samples: 222496760. Policy #0 lag: (min: 2.0, avg: 20.1, max: 43.0) [2024-03-29 14:26:13,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:26:14,182][00497] Updated weights for policy 0, policy_version 20769 (0.0021) [2024-03-29 14:26:17,270][00476] Signal inference workers to stop experience collection... (7950 times) [2024-03-29 14:26:17,315][00497] InferenceWorker_p0-w0: stopping experience collection (7950 times) [2024-03-29 14:26:17,468][00476] Signal inference workers to resume experience collection... (7950 times) [2024-03-29 14:26:17,469][00497] InferenceWorker_p0-w0: resuming experience collection (7950 times) [2024-03-29 14:26:18,247][00497] Updated weights for policy 0, policy_version 20779 (0.0023) [2024-03-29 14:26:18,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40687.0, 300 sec: 41598.7). Total num frames: 340459520. Throughput: 0: 41474.2. Samples: 222642900. Policy #0 lag: (min: 2.0, avg: 19.4, max: 42.0) [2024-03-29 14:26:18,840][00126] Avg episode reward: [(0, '0.286')] [2024-03-29 14:26:21,403][00497] Updated weights for policy 0, policy_version 20789 (0.0025) [2024-03-29 14:26:23,839][00126] Fps is (10 sec: 45875.5, 60 sec: 41506.1, 300 sec: 41820.9). Total num frames: 340705280. Throughput: 0: 40993.9. Samples: 222847980. Policy #0 lag: (min: 2.0, avg: 19.4, max: 42.0) [2024-03-29 14:26:23,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 14:26:25,627][00497] Updated weights for policy 0, policy_version 20799 (0.0033) [2024-03-29 14:26:28,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 340885504. Throughput: 0: 40682.1. Samples: 223102100. 
Policy #0 lag: (min: 2.0, avg: 19.4, max: 42.0) [2024-03-29 14:26:28,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:26:30,078][00497] Updated weights for policy 0, policy_version 20809 (0.0021) [2024-03-29 14:26:33,839][00126] Fps is (10 sec: 36044.8, 60 sec: 40686.9, 300 sec: 41598.7). Total num frames: 341065728. Throughput: 0: 41004.1. Samples: 223251260. Policy #0 lag: (min: 2.0, avg: 17.9, max: 42.0) [2024-03-29 14:26:33,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 14:26:34,252][00497] Updated weights for policy 0, policy_version 20819 (0.0020) [2024-03-29 14:26:37,372][00497] Updated weights for policy 0, policy_version 20829 (0.0029) [2024-03-29 14:26:38,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 341327872. Throughput: 0: 41366.3. Samples: 223476800. Policy #0 lag: (min: 2.0, avg: 17.9, max: 42.0) [2024-03-29 14:26:38,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 14:26:41,351][00497] Updated weights for policy 0, policy_version 20839 (0.0025) [2024-03-29 14:26:43,839][00126] Fps is (10 sec: 45874.6, 60 sec: 42052.2, 300 sec: 41820.8). Total num frames: 341524480. Throughput: 0: 41074.6. Samples: 223735340. Policy #0 lag: (min: 2.0, avg: 17.9, max: 42.0) [2024-03-29 14:26:43,841][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:26:45,780][00497] Updated weights for policy 0, policy_version 20849 (0.0018) [2024-03-29 14:26:48,839][00126] Fps is (10 sec: 36044.9, 60 sec: 40686.9, 300 sec: 41654.2). Total num frames: 341688320. Throughput: 0: 40872.0. Samples: 223865120. Policy #0 lag: (min: 0.0, avg: 16.6, max: 40.0) [2024-03-29 14:26:48,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:26:49,506][00476] Signal inference workers to stop experience collection... (8000 times) [2024-03-29 14:26:49,528][00497] InferenceWorker_p0-w0: stopping experience collection (8000 times) [2024-03-29 14:26:49,727][00476] Signal inference workers to resume experience collection... (8000 times) [2024-03-29 14:26:49,727][00497] InferenceWorker_p0-w0: resuming experience collection (8000 times) [2024-03-29 14:26:50,039][00497] Updated weights for policy 0, policy_version 20859 (0.0020) [2024-03-29 14:26:53,039][00497] Updated weights for policy 0, policy_version 20869 (0.0031) [2024-03-29 14:26:53,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 341950464. Throughput: 0: 41906.2. Samples: 224118820. Policy #0 lag: (min: 0.0, avg: 16.6, max: 40.0) [2024-03-29 14:26:53,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:26:57,147][00497] Updated weights for policy 0, policy_version 20879 (0.0019) [2024-03-29 14:26:58,839][00126] Fps is (10 sec: 45875.1, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 342147072. Throughput: 0: 41720.4. Samples: 224374180. Policy #0 lag: (min: 0.0, avg: 16.6, max: 40.0) [2024-03-29 14:26:58,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 14:27:01,068][00497] Updated weights for policy 0, policy_version 20889 (0.0017) [2024-03-29 14:27:03,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41233.1, 300 sec: 41709.8). Total num frames: 342327296. Throughput: 0: 41334.2. Samples: 224502940. Policy #0 lag: (min: 0.0, avg: 17.3, max: 40.0) [2024-03-29 14:27:03,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:27:03,979][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020895_342343680.pth... 
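
The checkpoint records above show the trainer's rotation scheme: each save writes a file named checkpoint_<policy_version>_<env_frames>.pth and the following record removes the oldest one, and the two name fields match the policy version and total frame count reported nearby in the log. Below is a minimal sketch, not part of the training code, for recovering those two fields from such a path, assuming only the naming convention observed in this log; the directory layout and helper names are illustrative.

```python
# Minimal sketch (not from the training code): parse checkpoint file names of the
# form checkpoint_<policy_version>_<env_frames>.pth as they appear in this log.
import re
from pathlib import Path

CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

def parse_checkpoint(path: str) -> tuple[int, int]:
    """Return (policy_version, env_frames) extracted from a checkpoint path."""
    m = CKPT_RE.search(path)
    if m is None:
        raise ValueError(f"not a checkpoint path: {path}")
    return int(m.group(1)), int(m.group(2))

def latest_checkpoint(ckpt_dir: str) -> Path | None:
    """Pick the checkpoint with the highest policy version in a directory."""
    candidates = [(parse_checkpoint(p.name), p) for p in Path(ckpt_dir).glob("checkpoint_*.pth")]
    return max(candidates, key=lambda c: c[0])[1] if candidates else None

# Example against a path taken verbatim from the log above:
version, frames = parse_checkpoint(
    "/workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020895_342343680.pth"
)
assert (version, frames) == (20895, 342343680)
```

The same helper could, for instance, be pointed at the checkpoint_p0/ directory to pick the newest checkpoint when resuming or evaluating a run.
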
[2024-03-29 14:27:04,437][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020287_332382208.pth [2024-03-29 14:27:05,589][00497] Updated weights for policy 0, policy_version 20899 (0.0023) [2024-03-29 14:27:08,662][00497] Updated weights for policy 0, policy_version 20909 (0.0026) [2024-03-29 14:27:08,839][00126] Fps is (10 sec: 42599.1, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 342573056. Throughput: 0: 42357.0. Samples: 224754040. Policy #0 lag: (min: 0.0, avg: 17.3, max: 40.0) [2024-03-29 14:27:08,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:27:12,823][00497] Updated weights for policy 0, policy_version 20919 (0.0025) [2024-03-29 14:27:13,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 342753280. Throughput: 0: 41964.4. Samples: 224990500. Policy #0 lag: (min: 0.0, avg: 17.3, max: 40.0) [2024-03-29 14:27:13,840][00126] Avg episode reward: [(0, '0.370')] [2024-03-29 14:27:16,832][00497] Updated weights for policy 0, policy_version 20929 (0.0028) [2024-03-29 14:27:18,839][00126] Fps is (10 sec: 37682.7, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 342949888. Throughput: 0: 41668.4. Samples: 225126340. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 14:27:18,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 14:27:19,716][00476] Signal inference workers to stop experience collection... (8050 times) [2024-03-29 14:27:19,794][00497] InferenceWorker_p0-w0: stopping experience collection (8050 times) [2024-03-29 14:27:19,885][00476] Signal inference workers to resume experience collection... (8050 times) [2024-03-29 14:27:19,885][00497] InferenceWorker_p0-w0: resuming experience collection (8050 times) [2024-03-29 14:27:21,230][00497] Updated weights for policy 0, policy_version 20939 (0.0021) [2024-03-29 14:27:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.0, 300 sec: 41709.8). Total num frames: 343195648. Throughput: 0: 42462.2. Samples: 225387600. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 14:27:23,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:27:24,347][00497] Updated weights for policy 0, policy_version 20949 (0.0020) [2024-03-29 14:27:28,455][00497] Updated weights for policy 0, policy_version 20959 (0.0026) [2024-03-29 14:27:28,839][00126] Fps is (10 sec: 44237.0, 60 sec: 41779.3, 300 sec: 41765.3). Total num frames: 343392256. Throughput: 0: 41739.7. Samples: 225613620. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 14:27:28,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:27:32,634][00497] Updated weights for policy 0, policy_version 20969 (0.0025) [2024-03-29 14:27:33,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.1, 300 sec: 41765.3). Total num frames: 343588864. Throughput: 0: 41745.6. Samples: 225743680. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 14:27:33,840][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 14:27:36,818][00497] Updated weights for policy 0, policy_version 20979 (0.0022) [2024-03-29 14:27:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 343801856. Throughput: 0: 42113.4. Samples: 226013920. 
Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 14:27:38,841][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:27:39,931][00497] Updated weights for policy 0, policy_version 20989 (0.0024) [2024-03-29 14:27:43,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 344014848. Throughput: 0: 41291.5. Samples: 226232300. Policy #0 lag: (min: 1.0, avg: 19.9, max: 41.0) [2024-03-29 14:27:43,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 14:27:44,149][00497] Updated weights for policy 0, policy_version 20999 (0.0021) [2024-03-29 14:27:48,117][00476] Signal inference workers to stop experience collection... (8100 times) [2024-03-29 14:27:48,150][00497] InferenceWorker_p0-w0: stopping experience collection (8100 times) [2024-03-29 14:27:48,326][00476] Signal inference workers to resume experience collection... (8100 times) [2024-03-29 14:27:48,326][00497] InferenceWorker_p0-w0: resuming experience collection (8100 times) [2024-03-29 14:27:48,329][00497] Updated weights for policy 0, policy_version 21009 (0.0029) [2024-03-29 14:27:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 344227840. Throughput: 0: 41570.7. Samples: 226373620. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 14:27:48,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 14:27:52,633][00497] Updated weights for policy 0, policy_version 21019 (0.0021) [2024-03-29 14:27:53,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40686.9, 300 sec: 41432.1). Total num frames: 344391680. Throughput: 0: 41698.5. Samples: 226630480. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 14:27:53,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 14:27:55,919][00497] Updated weights for policy 0, policy_version 21029 (0.0023) [2024-03-29 14:27:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 344653824. Throughput: 0: 41306.3. Samples: 226849280. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 14:27:58,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 14:28:00,020][00497] Updated weights for policy 0, policy_version 21039 (0.0025) [2024-03-29 14:28:03,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 344850432. Throughput: 0: 41578.3. Samples: 226997360. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:28:03,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:28:04,032][00497] Updated weights for policy 0, policy_version 21049 (0.0019) [2024-03-29 14:28:08,190][00497] Updated weights for policy 0, policy_version 21059 (0.0023) [2024-03-29 14:28:08,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.0, 300 sec: 41543.2). Total num frames: 345047040. Throughput: 0: 41650.3. Samples: 227261860. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:28:08,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:28:11,633][00497] Updated weights for policy 0, policy_version 21069 (0.0022) [2024-03-29 14:28:13,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42325.4, 300 sec: 41709.8). Total num frames: 345292800. Throughput: 0: 41390.6. Samples: 227476200. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:28:13,841][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:28:15,793][00497] Updated weights for policy 0, policy_version 21079 (0.0024) [2024-03-29 14:28:18,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 41765.3). 
Total num frames: 345473024. Throughput: 0: 41714.8. Samples: 227620840. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 14:28:18,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 14:28:19,383][00476] Signal inference workers to stop experience collection... (8150 times) [2024-03-29 14:28:19,435][00497] InferenceWorker_p0-w0: stopping experience collection (8150 times) [2024-03-29 14:28:19,469][00476] Signal inference workers to resume experience collection... (8150 times) [2024-03-29 14:28:19,471][00497] InferenceWorker_p0-w0: resuming experience collection (8150 times) [2024-03-29 14:28:19,740][00497] Updated weights for policy 0, policy_version 21089 (0.0024) [2024-03-29 14:28:23,839][00126] Fps is (10 sec: 37682.8, 60 sec: 41233.1, 300 sec: 41543.1). Total num frames: 345669632. Throughput: 0: 41767.5. Samples: 227893460. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 14:28:23,840][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 14:28:24,012][00497] Updated weights for policy 0, policy_version 21099 (0.0018) [2024-03-29 14:28:27,198][00497] Updated weights for policy 0, policy_version 21109 (0.0026) [2024-03-29 14:28:28,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 345915392. Throughput: 0: 41476.0. Samples: 228098720. Policy #0 lag: (min: 0.0, avg: 20.6, max: 42.0) [2024-03-29 14:28:28,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 14:28:31,608][00497] Updated weights for policy 0, policy_version 21119 (0.0023) [2024-03-29 14:28:33,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 346095616. Throughput: 0: 41607.4. Samples: 228245960. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:28:33,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 14:28:35,417][00497] Updated weights for policy 0, policy_version 21129 (0.0023) [2024-03-29 14:28:38,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 346292224. Throughput: 0: 41903.2. Samples: 228516120. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:28:38,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 14:28:39,609][00497] Updated weights for policy 0, policy_version 21139 (0.0019) [2024-03-29 14:28:42,813][00497] Updated weights for policy 0, policy_version 21149 (0.0022) [2024-03-29 14:28:43,839][00126] Fps is (10 sec: 45875.5, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 346554368. Throughput: 0: 41901.2. Samples: 228734840. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:28:43,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:28:47,130][00497] Updated weights for policy 0, policy_version 21159 (0.0022) [2024-03-29 14:28:48,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 346734592. Throughput: 0: 41514.7. Samples: 228865520. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 14:28:48,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 14:28:50,972][00497] Updated weights for policy 0, policy_version 21169 (0.0026) [2024-03-29 14:28:53,839][00126] Fps is (10 sec: 37683.4, 60 sec: 42325.4, 300 sec: 41654.2). Total num frames: 346931200. Throughput: 0: 41828.4. Samples: 229144140. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 14:28:53,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 14:28:54,021][00476] Signal inference workers to stop experience collection... 
(8200 times) [2024-03-29 14:28:54,062][00497] InferenceWorker_p0-w0: stopping experience collection (8200 times) [2024-03-29 14:28:54,101][00476] Signal inference workers to resume experience collection... (8200 times) [2024-03-29 14:28:54,102][00497] InferenceWorker_p0-w0: resuming experience collection (8200 times) [2024-03-29 14:28:55,092][00497] Updated weights for policy 0, policy_version 21179 (0.0028) [2024-03-29 14:28:58,463][00497] Updated weights for policy 0, policy_version 21189 (0.0025) [2024-03-29 14:28:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 347176960. Throughput: 0: 42205.8. Samples: 229375460. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 14:28:58,840][00126] Avg episode reward: [(0, '0.280')] [2024-03-29 14:29:02,688][00497] Updated weights for policy 0, policy_version 21199 (0.0026) [2024-03-29 14:29:03,839][00126] Fps is (10 sec: 42598.1, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 347357184. Throughput: 0: 41746.6. Samples: 229499440. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:29:03,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:29:04,172][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000021203_347389952.pth... [2024-03-29 14:29:04,483][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020595_337428480.pth [2024-03-29 14:29:06,633][00497] Updated weights for policy 0, policy_version 21209 (0.0023) [2024-03-29 14:29:08,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41779.2, 300 sec: 41654.3). Total num frames: 347553792. Throughput: 0: 41618.8. Samples: 229766300. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:29:08,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 14:29:10,756][00497] Updated weights for policy 0, policy_version 21219 (0.0024) [2024-03-29 14:29:13,839][00126] Fps is (10 sec: 44237.3, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 347799552. Throughput: 0: 42649.0. Samples: 230017920. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:29:13,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:29:14,028][00497] Updated weights for policy 0, policy_version 21229 (0.0026) [2024-03-29 14:29:18,312][00497] Updated weights for policy 0, policy_version 21239 (0.0027) [2024-03-29 14:29:18,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 347996160. Throughput: 0: 41706.4. Samples: 230122740. Policy #0 lag: (min: 0.0, avg: 23.5, max: 42.0) [2024-03-29 14:29:18,840][00126] Avg episode reward: [(0, '0.453')] [2024-03-29 14:29:22,458][00497] Updated weights for policy 0, policy_version 21249 (0.0022) [2024-03-29 14:29:23,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 348176384. Throughput: 0: 41780.4. Samples: 230396240. Policy #0 lag: (min: 0.0, avg: 23.5, max: 42.0) [2024-03-29 14:29:23,840][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 14:29:26,450][00497] Updated weights for policy 0, policy_version 21259 (0.0020) [2024-03-29 14:29:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 348422144. Throughput: 0: 42707.7. Samples: 230656680. Policy #0 lag: (min: 0.0, avg: 23.5, max: 42.0) [2024-03-29 14:29:28,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 14:29:29,057][00476] Signal inference workers to stop experience collection... 
(8250 times) [2024-03-29 14:29:29,102][00497] InferenceWorker_p0-w0: stopping experience collection (8250 times) [2024-03-29 14:29:29,137][00476] Signal inference workers to resume experience collection... (8250 times) [2024-03-29 14:29:29,140][00497] InferenceWorker_p0-w0: resuming experience collection (8250 times) [2024-03-29 14:29:29,758][00497] Updated weights for policy 0, policy_version 21269 (0.0031) [2024-03-29 14:29:33,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 348618752. Throughput: 0: 41871.0. Samples: 230749720. Policy #0 lag: (min: 0.0, avg: 23.1, max: 40.0) [2024-03-29 14:29:33,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 14:29:34,241][00497] Updated weights for policy 0, policy_version 21279 (0.0021) [2024-03-29 14:29:38,133][00497] Updated weights for policy 0, policy_version 21289 (0.0018) [2024-03-29 14:29:38,839][00126] Fps is (10 sec: 39321.1, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 348815360. Throughput: 0: 41587.5. Samples: 231015580. Policy #0 lag: (min: 0.0, avg: 23.1, max: 40.0) [2024-03-29 14:29:38,841][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:29:42,164][00497] Updated weights for policy 0, policy_version 21299 (0.0021) [2024-03-29 14:29:43,839][00126] Fps is (10 sec: 39321.4, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 349011968. Throughput: 0: 42096.8. Samples: 231269820. Policy #0 lag: (min: 0.0, avg: 23.1, max: 40.0) [2024-03-29 14:29:43,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:29:45,666][00497] Updated weights for policy 0, policy_version 21309 (0.0027) [2024-03-29 14:29:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 41709.8). Total num frames: 349257728. Throughput: 0: 41391.1. Samples: 231362040. Policy #0 lag: (min: 2.0, avg: 23.4, max: 43.0) [2024-03-29 14:29:48,842][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:29:50,075][00497] Updated weights for policy 0, policy_version 21319 (0.0019) [2024-03-29 14:29:53,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 349421568. Throughput: 0: 41540.8. Samples: 231635640. Policy #0 lag: (min: 2.0, avg: 23.4, max: 43.0) [2024-03-29 14:29:53,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 14:29:54,160][00497] Updated weights for policy 0, policy_version 21329 (0.0019) [2024-03-29 14:29:58,110][00497] Updated weights for policy 0, policy_version 21339 (0.0034) [2024-03-29 14:29:58,839][00126] Fps is (10 sec: 37683.4, 60 sec: 40960.0, 300 sec: 41432.1). Total num frames: 349634560. Throughput: 0: 41473.7. Samples: 231884240. Policy #0 lag: (min: 2.0, avg: 23.4, max: 43.0) [2024-03-29 14:29:58,840][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 14:30:01,486][00476] Signal inference workers to stop experience collection... (8300 times) [2024-03-29 14:30:01,525][00497] InferenceWorker_p0-w0: stopping experience collection (8300 times) [2024-03-29 14:30:01,708][00476] Signal inference workers to resume experience collection... (8300 times) [2024-03-29 14:30:01,708][00497] InferenceWorker_p0-w0: resuming experience collection (8300 times) [2024-03-29 14:30:01,713][00497] Updated weights for policy 0, policy_version 21349 (0.0025) [2024-03-29 14:30:03,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 349863936. Throughput: 0: 41477.2. Samples: 231989220. 
Policy #0 lag: (min: 2.0, avg: 21.5, max: 44.0) [2024-03-29 14:30:03,840][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 14:30:06,146][00497] Updated weights for policy 0, policy_version 21359 (0.0022) [2024-03-29 14:30:08,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 350060544. Throughput: 0: 41268.9. Samples: 232253340. Policy #0 lag: (min: 2.0, avg: 21.5, max: 44.0) [2024-03-29 14:30:08,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 14:30:09,979][00497] Updated weights for policy 0, policy_version 21369 (0.0022) [2024-03-29 14:30:13,839][00126] Fps is (10 sec: 39322.3, 60 sec: 40960.0, 300 sec: 41487.6). Total num frames: 350257152. Throughput: 0: 41052.0. Samples: 232504020. Policy #0 lag: (min: 2.0, avg: 21.5, max: 44.0) [2024-03-29 14:30:13,840][00126] Avg episode reward: [(0, '0.477')] [2024-03-29 14:30:13,910][00497] Updated weights for policy 0, policy_version 21379 (0.0024) [2024-03-29 14:30:17,431][00497] Updated weights for policy 0, policy_version 21389 (0.0022) [2024-03-29 14:30:18,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41506.0, 300 sec: 41598.7). Total num frames: 350486528. Throughput: 0: 41565.7. Samples: 232620180. Policy #0 lag: (min: 2.0, avg: 19.2, max: 42.0) [2024-03-29 14:30:18,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:30:21,939][00497] Updated weights for policy 0, policy_version 21399 (0.0025) [2024-03-29 14:30:23,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 350666752. Throughput: 0: 41293.3. Samples: 232873780. Policy #0 lag: (min: 2.0, avg: 19.2, max: 42.0) [2024-03-29 14:30:23,842][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:30:25,908][00497] Updated weights for policy 0, policy_version 21409 (0.0032) [2024-03-29 14:30:28,839][00126] Fps is (10 sec: 39321.8, 60 sec: 40959.9, 300 sec: 41543.1). Total num frames: 350879744. Throughput: 0: 41476.5. Samples: 233136260. Policy #0 lag: (min: 2.0, avg: 19.2, max: 42.0) [2024-03-29 14:30:28,840][00126] Avg episode reward: [(0, '0.340')] [2024-03-29 14:30:29,660][00497] Updated weights for policy 0, policy_version 21419 (0.0030) [2024-03-29 14:30:32,834][00476] Signal inference workers to stop experience collection... (8350 times) [2024-03-29 14:30:32,868][00497] InferenceWorker_p0-w0: stopping experience collection (8350 times) [2024-03-29 14:30:33,049][00476] Signal inference workers to resume experience collection... (8350 times) [2024-03-29 14:30:33,050][00497] InferenceWorker_p0-w0: resuming experience collection (8350 times) [2024-03-29 14:30:33,053][00497] Updated weights for policy 0, policy_version 21429 (0.0029) [2024-03-29 14:30:33,839][00126] Fps is (10 sec: 45875.1, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 351125504. Throughput: 0: 42154.2. Samples: 233258980. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 14:30:33,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 14:30:37,371][00497] Updated weights for policy 0, policy_version 21439 (0.0023) [2024-03-29 14:30:38,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 351305728. Throughput: 0: 41530.7. Samples: 233504520. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 14:30:38,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 14:30:41,560][00497] Updated weights for policy 0, policy_version 21449 (0.0018) [2024-03-29 14:30:43,839][00126] Fps is (10 sec: 37683.6, 60 sec: 41506.2, 300 sec: 41543.2). 
Total num frames: 351502336. Throughput: 0: 41779.5. Samples: 233764320. Policy #0 lag: (min: 0.0, avg: 20.8, max: 42.0) [2024-03-29 14:30:43,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 14:30:45,366][00497] Updated weights for policy 0, policy_version 21459 (0.0019) [2024-03-29 14:30:48,816][00497] Updated weights for policy 0, policy_version 21469 (0.0024) [2024-03-29 14:30:48,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 351748096. Throughput: 0: 42125.0. Samples: 233884840. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 14:30:48,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:30:53,044][00497] Updated weights for policy 0, policy_version 21479 (0.0020) [2024-03-29 14:30:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 351928320. Throughput: 0: 41312.8. Samples: 234112420. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 14:30:53,840][00126] Avg episode reward: [(0, '0.322')] [2024-03-29 14:30:57,263][00497] Updated weights for policy 0, policy_version 21489 (0.0018) [2024-03-29 14:30:58,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 352124928. Throughput: 0: 41860.8. Samples: 234387760. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 14:30:58,841][00126] Avg episode reward: [(0, '0.444')] [2024-03-29 14:31:01,052][00497] Updated weights for policy 0, policy_version 21499 (0.0024) [2024-03-29 14:31:03,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 352354304. Throughput: 0: 42032.5. Samples: 234511640. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 14:31:03,840][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 14:31:04,026][00476] Signal inference workers to stop experience collection... (8400 times) [2024-03-29 14:31:04,076][00497] InferenceWorker_p0-w0: stopping experience collection (8400 times) [2024-03-29 14:31:04,209][00476] Signal inference workers to resume experience collection... (8400 times) [2024-03-29 14:31:04,210][00497] InferenceWorker_p0-w0: resuming experience collection (8400 times) [2024-03-29 14:31:04,211][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000021508_352387072.pth... [2024-03-29 14:31:04,520][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000020895_342343680.pth [2024-03-29 14:31:04,787][00497] Updated weights for policy 0, policy_version 21509 (0.0040) [2024-03-29 14:31:08,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 352550912. Throughput: 0: 41303.6. Samples: 234732440. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 14:31:08,840][00126] Avg episode reward: [(0, '0.310')] [2024-03-29 14:31:09,008][00497] Updated weights for policy 0, policy_version 21519 (0.0020) [2024-03-29 14:31:13,357][00497] Updated weights for policy 0, policy_version 21529 (0.0020) [2024-03-29 14:31:13,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 352747520. Throughput: 0: 41618.7. Samples: 235009100. Policy #0 lag: (min: 1.0, avg: 20.8, max: 42.0) [2024-03-29 14:31:13,840][00126] Avg episode reward: [(0, '0.280')] [2024-03-29 14:31:17,009][00497] Updated weights for policy 0, policy_version 21539 (0.0028) [2024-03-29 14:31:18,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 41598.7). Total num frames: 352976896. 
Throughput: 0: 41255.3. Samples: 235115460. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 14:31:18,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 14:31:20,317][00497] Updated weights for policy 0, policy_version 21549 (0.0025) [2024-03-29 14:31:23,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 353189888. Throughput: 0: 41035.0. Samples: 235351100. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 14:31:23,841][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 14:31:24,680][00497] Updated weights for policy 0, policy_version 21559 (0.0021) [2024-03-29 14:31:28,839][00126] Fps is (10 sec: 37682.9, 60 sec: 41233.1, 300 sec: 41654.2). Total num frames: 353353728. Throughput: 0: 41361.3. Samples: 235625580. Policy #0 lag: (min: 1.0, avg: 19.4, max: 41.0) [2024-03-29 14:31:28,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 14:31:29,337][00497] Updated weights for policy 0, policy_version 21569 (0.0029) [2024-03-29 14:31:32,793][00497] Updated weights for policy 0, policy_version 21579 (0.0026) [2024-03-29 14:31:33,839][00126] Fps is (10 sec: 39322.1, 60 sec: 40960.1, 300 sec: 41543.2). Total num frames: 353583104. Throughput: 0: 41250.7. Samples: 235741120. Policy #0 lag: (min: 0.0, avg: 18.7, max: 42.0) [2024-03-29 14:31:33,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 14:31:36,187][00497] Updated weights for policy 0, policy_version 21589 (0.0024) [2024-03-29 14:31:36,522][00476] Signal inference workers to stop experience collection... (8450 times) [2024-03-29 14:31:36,559][00497] InferenceWorker_p0-w0: stopping experience collection (8450 times) [2024-03-29 14:31:36,738][00476] Signal inference workers to resume experience collection... (8450 times) [2024-03-29 14:31:36,738][00497] InferenceWorker_p0-w0: resuming experience collection (8450 times) [2024-03-29 14:31:38,839][00126] Fps is (10 sec: 45875.0, 60 sec: 41779.1, 300 sec: 41654.2). Total num frames: 353812480. Throughput: 0: 41245.3. Samples: 235968460. Policy #0 lag: (min: 0.0, avg: 18.7, max: 42.0) [2024-03-29 14:31:38,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 14:31:40,489][00497] Updated weights for policy 0, policy_version 21599 (0.0031) [2024-03-29 14:31:43,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 353992704. Throughput: 0: 41236.8. Samples: 236243420. Policy #0 lag: (min: 0.0, avg: 18.7, max: 42.0) [2024-03-29 14:31:43,841][00126] Avg episode reward: [(0, '0.459')] [2024-03-29 14:31:45,131][00497] Updated weights for policy 0, policy_version 21609 (0.0019) [2024-03-29 14:31:48,716][00497] Updated weights for policy 0, policy_version 21619 (0.0030) [2024-03-29 14:31:48,839][00126] Fps is (10 sec: 39322.1, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 354205696. Throughput: 0: 41526.7. Samples: 236380340. Policy #0 lag: (min: 1.0, avg: 19.1, max: 43.0) [2024-03-29 14:31:48,840][00126] Avg episode reward: [(0, '0.465')] [2024-03-29 14:31:52,100][00497] Updated weights for policy 0, policy_version 21629 (0.0027) [2024-03-29 14:31:53,839][00126] Fps is (10 sec: 44237.3, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 354435072. Throughput: 0: 41678.2. Samples: 236607960. 
Policy #0 lag: (min: 1.0, avg: 19.1, max: 43.0) [2024-03-29 14:31:53,840][00126] Avg episode reward: [(0, '0.469')] [2024-03-29 14:31:56,390][00497] Updated weights for policy 0, policy_version 21639 (0.0028) [2024-03-29 14:31:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.2, 300 sec: 41654.2). Total num frames: 354615296. Throughput: 0: 41347.6. Samples: 236869740. Policy #0 lag: (min: 1.0, avg: 19.1, max: 43.0) [2024-03-29 14:31:58,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:32:00,845][00497] Updated weights for policy 0, policy_version 21649 (0.0027) [2024-03-29 14:32:03,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 354844672. Throughput: 0: 42018.1. Samples: 237006280. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 14:32:03,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:32:04,317][00497] Updated weights for policy 0, policy_version 21659 (0.0028) [2024-03-29 14:32:07,572][00497] Updated weights for policy 0, policy_version 21669 (0.0022) [2024-03-29 14:32:08,839][00126] Fps is (10 sec: 47513.5, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 355090432. Throughput: 0: 42189.9. Samples: 237249640. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 14:32:08,840][00126] Avg episode reward: [(0, '0.366')] [2024-03-29 14:32:11,821][00476] Signal inference workers to stop experience collection... (8500 times) [2024-03-29 14:32:11,878][00497] InferenceWorker_p0-w0: stopping experience collection (8500 times) [2024-03-29 14:32:11,985][00476] Signal inference workers to resume experience collection... (8500 times) [2024-03-29 14:32:11,986][00497] InferenceWorker_p0-w0: resuming experience collection (8500 times) [2024-03-29 14:32:11,989][00497] Updated weights for policy 0, policy_version 21679 (0.0028) [2024-03-29 14:32:13,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 355237888. Throughput: 0: 41600.1. Samples: 237497580. Policy #0 lag: (min: 1.0, avg: 18.9, max: 42.0) [2024-03-29 14:32:13,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 14:32:16,471][00497] Updated weights for policy 0, policy_version 21689 (0.0022) [2024-03-29 14:32:18,839][00126] Fps is (10 sec: 37683.2, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 355467264. Throughput: 0: 42201.8. Samples: 237640200. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 14:32:18,840][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 14:32:19,802][00497] Updated weights for policy 0, policy_version 21699 (0.0021) [2024-03-29 14:32:23,304][00497] Updated weights for policy 0, policy_version 21709 (0.0020) [2024-03-29 14:32:23,839][00126] Fps is (10 sec: 45875.4, 60 sec: 41779.3, 300 sec: 41709.8). Total num frames: 355696640. Throughput: 0: 42233.5. Samples: 237868960. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 14:32:23,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:32:27,687][00497] Updated weights for policy 0, policy_version 21719 (0.0022) [2024-03-29 14:32:28,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 355893248. Throughput: 0: 41728.9. Samples: 238121220. Policy #0 lag: (min: 0.0, avg: 18.1, max: 41.0) [2024-03-29 14:32:28,840][00126] Avg episode reward: [(0, '0.289')] [2024-03-29 14:32:32,179][00497] Updated weights for policy 0, policy_version 21729 (0.0029) [2024-03-29 14:32:33,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41779.1, 300 sec: 41654.2). 
Total num frames: 356089856. Throughput: 0: 41839.0. Samples: 238263100. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 14:32:33,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:32:35,744][00497] Updated weights for policy 0, policy_version 21739 (0.0021) [2024-03-29 14:32:38,839][00126] Fps is (10 sec: 40960.8, 60 sec: 41506.3, 300 sec: 41654.3). Total num frames: 356302848. Throughput: 0: 41921.4. Samples: 238494420. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 14:32:38,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:32:39,140][00497] Updated weights for policy 0, policy_version 21749 (0.0024) [2024-03-29 14:32:43,613][00497] Updated weights for policy 0, policy_version 21759 (0.0025) [2024-03-29 14:32:43,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.3, 300 sec: 41598.7). Total num frames: 356499456. Throughput: 0: 41279.1. Samples: 238727300. Policy #0 lag: (min: 0.0, avg: 19.0, max: 42.0) [2024-03-29 14:32:43,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 14:32:47,717][00476] Signal inference workers to stop experience collection... (8550 times) [2024-03-29 14:32:47,760][00497] InferenceWorker_p0-w0: stopping experience collection (8550 times) [2024-03-29 14:32:47,797][00476] Signal inference workers to resume experience collection... (8550 times) [2024-03-29 14:32:47,799][00497] InferenceWorker_p0-w0: resuming experience collection (8550 times) [2024-03-29 14:32:48,049][00497] Updated weights for policy 0, policy_version 21769 (0.0026) [2024-03-29 14:32:48,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 356696064. Throughput: 0: 41401.0. Samples: 238869320. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 14:32:48,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:32:51,602][00497] Updated weights for policy 0, policy_version 21779 (0.0029) [2024-03-29 14:32:53,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 356925440. Throughput: 0: 41444.4. Samples: 239114640. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 14:32:53,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:32:54,961][00497] Updated weights for policy 0, policy_version 21789 (0.0029) [2024-03-29 14:32:58,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 357138432. Throughput: 0: 41255.9. Samples: 239354100. Policy #0 lag: (min: 1.0, avg: 19.8, max: 42.0) [2024-03-29 14:32:58,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 14:32:59,257][00497] Updated weights for policy 0, policy_version 21799 (0.0027) [2024-03-29 14:33:03,759][00497] Updated weights for policy 0, policy_version 21809 (0.0026) [2024-03-29 14:33:03,840][00126] Fps is (10 sec: 39318.3, 60 sec: 41232.5, 300 sec: 41598.6). Total num frames: 357318656. Throughput: 0: 41188.5. Samples: 239493720. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 14:33:03,841][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 14:33:04,282][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000021811_357351424.pth... [2024-03-29 14:33:04,616][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000021203_347389952.pth [2024-03-29 14:33:07,452][00497] Updated weights for policy 0, policy_version 21819 (0.0019) [2024-03-29 14:33:08,839][00126] Fps is (10 sec: 40960.3, 60 sec: 40960.0, 300 sec: 41543.2). Total num frames: 357548032. 
Throughput: 0: 41400.4. Samples: 239731980. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 14:33:08,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:33:10,734][00497] Updated weights for policy 0, policy_version 21829 (0.0021) [2024-03-29 14:33:13,839][00126] Fps is (10 sec: 45879.1, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 357777408. Throughput: 0: 41182.3. Samples: 239974420. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 14:33:13,842][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 14:33:15,061][00497] Updated weights for policy 0, policy_version 21839 (0.0021) [2024-03-29 14:33:18,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 357941248. Throughput: 0: 41110.7. Samples: 240113080. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 14:33:18,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:33:19,570][00497] Updated weights for policy 0, policy_version 21849 (0.0025) [2024-03-29 14:33:21,167][00476] Signal inference workers to stop experience collection... (8600 times) [2024-03-29 14:33:21,169][00476] Signal inference workers to resume experience collection... (8600 times) [2024-03-29 14:33:21,214][00497] InferenceWorker_p0-w0: stopping experience collection (8600 times) [2024-03-29 14:33:21,214][00497] InferenceWorker_p0-w0: resuming experience collection (8600 times) [2024-03-29 14:33:22,715][00497] Updated weights for policy 0, policy_version 21859 (0.0022) [2024-03-29 14:33:23,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.1, 300 sec: 41598.7). Total num frames: 358187008. Throughput: 0: 41619.5. Samples: 240367300. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 14:33:23,840][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 14:33:26,358][00497] Updated weights for policy 0, policy_version 21869 (0.0024) [2024-03-29 14:33:28,839][00126] Fps is (10 sec: 47513.7, 60 sec: 42052.4, 300 sec: 41765.3). Total num frames: 358416384. Throughput: 0: 41694.6. Samples: 240603560. Policy #0 lag: (min: 0.0, avg: 20.7, max: 41.0) [2024-03-29 14:33:28,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 14:33:30,598][00497] Updated weights for policy 0, policy_version 21879 (0.0030) [2024-03-29 14:33:33,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 358563840. Throughput: 0: 41710.7. Samples: 240746300. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 14:33:33,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 14:33:35,095][00497] Updated weights for policy 0, policy_version 21889 (0.0017) [2024-03-29 14:33:38,262][00497] Updated weights for policy 0, policy_version 21899 (0.0025) [2024-03-29 14:33:38,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 358809600. Throughput: 0: 41857.9. Samples: 240998240. Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 14:33:38,840][00126] Avg episode reward: [(0, '0.396')] [2024-03-29 14:33:41,998][00497] Updated weights for policy 0, policy_version 21909 (0.0031) [2024-03-29 14:33:43,839][00126] Fps is (10 sec: 47512.9, 60 sec: 42325.2, 300 sec: 41709.8). Total num frames: 359038976. Throughput: 0: 41728.0. Samples: 241231860. 
Policy #0 lag: (min: 0.0, avg: 21.0, max: 41.0) [2024-03-29 14:33:43,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 14:33:46,161][00497] Updated weights for policy 0, policy_version 21919 (0.0023) [2024-03-29 14:33:48,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 359219200. Throughput: 0: 41841.3. Samples: 241376540. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 14:33:48,841][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 14:33:50,745][00497] Updated weights for policy 0, policy_version 21929 (0.0019) [2024-03-29 14:33:53,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41543.1). Total num frames: 359432192. Throughput: 0: 42141.7. Samples: 241628360. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 14:33:53,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:33:54,131][00476] Signal inference workers to stop experience collection... (8650 times) [2024-03-29 14:33:54,132][00476] Signal inference workers to resume experience collection... (8650 times) [2024-03-29 14:33:54,152][00497] Updated weights for policy 0, policy_version 21939 (0.0024) [2024-03-29 14:33:54,175][00497] InferenceWorker_p0-w0: stopping experience collection (8650 times) [2024-03-29 14:33:54,176][00497] InferenceWorker_p0-w0: resuming experience collection (8650 times) [2024-03-29 14:33:57,651][00497] Updated weights for policy 0, policy_version 21949 (0.0034) [2024-03-29 14:33:58,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 359661568. Throughput: 0: 41950.7. Samples: 241862200. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 14:33:58,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:34:01,590][00497] Updated weights for policy 0, policy_version 21959 (0.0029) [2024-03-29 14:34:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 42052.8, 300 sec: 41654.2). Total num frames: 359841792. Throughput: 0: 41810.1. Samples: 241994540. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 14:34:03,840][00126] Avg episode reward: [(0, '0.315')] [2024-03-29 14:34:06,483][00497] Updated weights for policy 0, policy_version 21969 (0.0027) [2024-03-29 14:34:08,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.2, 300 sec: 41543.2). Total num frames: 360054784. Throughput: 0: 42097.4. Samples: 242261680. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 14:34:08,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:34:09,763][00497] Updated weights for policy 0, policy_version 21979 (0.0023) [2024-03-29 14:34:13,431][00497] Updated weights for policy 0, policy_version 21989 (0.0023) [2024-03-29 14:34:13,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41779.3, 300 sec: 41654.2). Total num frames: 360284160. Throughput: 0: 42125.8. Samples: 242499220. Policy #0 lag: (min: 0.0, avg: 21.5, max: 42.0) [2024-03-29 14:34:13,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 14:34:17,519][00497] Updated weights for policy 0, policy_version 21999 (0.0021) [2024-03-29 14:34:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 41709.8). Total num frames: 360480768. Throughput: 0: 41287.5. Samples: 242604240. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 14:34:18,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:34:22,333][00497] Updated weights for policy 0, policy_version 22009 (0.0019) [2024-03-29 14:34:23,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41543.1). 
Total num frames: 360677376. Throughput: 0: 42155.0. Samples: 242895220. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 14:34:23,843][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 14:34:25,416][00497] Updated weights for policy 0, policy_version 22019 (0.0018) [2024-03-29 14:34:26,643][00476] Signal inference workers to stop experience collection... (8700 times) [2024-03-29 14:34:26,671][00497] InferenceWorker_p0-w0: stopping experience collection (8700 times) [2024-03-29 14:34:26,835][00476] Signal inference workers to resume experience collection... (8700 times) [2024-03-29 14:34:26,836][00497] InferenceWorker_p0-w0: resuming experience collection (8700 times) [2024-03-29 14:34:28,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41506.1, 300 sec: 41654.2). Total num frames: 360906752. Throughput: 0: 42211.1. Samples: 243131360. Policy #0 lag: (min: 0.0, avg: 21.3, max: 41.0) [2024-03-29 14:34:28,840][00126] Avg episode reward: [(0, '0.412')] [2024-03-29 14:34:28,990][00497] Updated weights for policy 0, policy_version 22029 (0.0032) [2024-03-29 14:34:33,075][00497] Updated weights for policy 0, policy_version 22039 (0.0019) [2024-03-29 14:34:33,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.3, 300 sec: 41709.8). Total num frames: 361119744. Throughput: 0: 41503.5. Samples: 243244200. Policy #0 lag: (min: 0.0, avg: 22.2, max: 41.0) [2024-03-29 14:34:33,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:34:37,945][00497] Updated weights for policy 0, policy_version 22049 (0.0023) [2024-03-29 14:34:38,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 41654.3). Total num frames: 361299968. Throughput: 0: 42054.3. Samples: 243520800. Policy #0 lag: (min: 0.0, avg: 22.2, max: 41.0) [2024-03-29 14:34:38,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 14:34:41,200][00497] Updated weights for policy 0, policy_version 22059 (0.0031) [2024-03-29 14:34:43,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41233.1, 300 sec: 41543.2). Total num frames: 361512960. Throughput: 0: 41918.7. Samples: 243748540. Policy #0 lag: (min: 0.0, avg: 22.2, max: 41.0) [2024-03-29 14:34:43,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 14:34:44,800][00497] Updated weights for policy 0, policy_version 22069 (0.0026) [2024-03-29 14:34:48,824][00497] Updated weights for policy 0, policy_version 22079 (0.0019) [2024-03-29 14:34:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 361742336. Throughput: 0: 41604.6. Samples: 243866740. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:34:48,840][00126] Avg episode reward: [(0, '0.297')] [2024-03-29 14:34:53,720][00497] Updated weights for policy 0, policy_version 22089 (0.0020) [2024-03-29 14:34:53,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41598.7). Total num frames: 361906176. Throughput: 0: 41450.7. Samples: 244126960. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:34:53,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 14:34:57,024][00497] Updated weights for policy 0, policy_version 22099 (0.0020) [2024-03-29 14:34:58,575][00476] Signal inference workers to stop experience collection... (8750 times) [2024-03-29 14:34:58,626][00497] InferenceWorker_p0-w0: stopping experience collection (8750 times) [2024-03-29 14:34:58,738][00476] Signal inference workers to resume experience collection... 
(8750 times) [2024-03-29 14:34:58,739][00497] InferenceWorker_p0-w0: resuming experience collection (8750 times) [2024-03-29 14:34:58,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 362135552. Throughput: 0: 41627.0. Samples: 244372440. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:34:58,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 14:35:00,653][00497] Updated weights for policy 0, policy_version 22109 (0.0018) [2024-03-29 14:35:03,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42052.3, 300 sec: 41709.8). Total num frames: 362364928. Throughput: 0: 41884.5. Samples: 244489040. Policy #0 lag: (min: 0.0, avg: 23.1, max: 41.0) [2024-03-29 14:35:03,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 14:35:03,905][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000022118_362381312.pth... [2024-03-29 14:35:04,238][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000021508_352387072.pth [2024-03-29 14:35:04,523][00497] Updated weights for policy 0, policy_version 22119 (0.0031) [2024-03-29 14:35:08,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41233.0, 300 sec: 41598.7). Total num frames: 362528768. Throughput: 0: 41259.6. Samples: 244751900. Policy #0 lag: (min: 0.0, avg: 23.4, max: 44.0) [2024-03-29 14:35:08,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 14:35:09,402][00497] Updated weights for policy 0, policy_version 22129 (0.0027) [2024-03-29 14:35:12,502][00497] Updated weights for policy 0, policy_version 22139 (0.0021) [2024-03-29 14:35:13,839][00126] Fps is (10 sec: 40959.5, 60 sec: 41506.0, 300 sec: 41654.2). Total num frames: 362774528. Throughput: 0: 41433.3. Samples: 244995860. Policy #0 lag: (min: 0.0, avg: 23.4, max: 44.0) [2024-03-29 14:35:13,840][00126] Avg episode reward: [(0, '0.312')] [2024-03-29 14:35:16,295][00497] Updated weights for policy 0, policy_version 22149 (0.0039) [2024-03-29 14:35:18,839][00126] Fps is (10 sec: 49151.8, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 363020288. Throughput: 0: 41801.4. Samples: 245125260. Policy #0 lag: (min: 0.0, avg: 23.4, max: 44.0) [2024-03-29 14:35:18,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:35:20,202][00497] Updated weights for policy 0, policy_version 22159 (0.0026) [2024-03-29 14:35:23,839][00126] Fps is (10 sec: 39322.5, 60 sec: 41506.2, 300 sec: 41654.3). Total num frames: 363167744. Throughput: 0: 41305.4. Samples: 245379540. Policy #0 lag: (min: 0.0, avg: 23.2, max: 43.0) [2024-03-29 14:35:23,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 14:35:24,809][00497] Updated weights for policy 0, policy_version 22169 (0.0018) [2024-03-29 14:35:28,004][00497] Updated weights for policy 0, policy_version 22179 (0.0022) [2024-03-29 14:35:28,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41654.2). Total num frames: 363413504. Throughput: 0: 41932.8. Samples: 245635520. Policy #0 lag: (min: 0.0, avg: 23.2, max: 43.0) [2024-03-29 14:35:28,840][00126] Avg episode reward: [(0, '0.466')] [2024-03-29 14:35:31,698][00497] Updated weights for policy 0, policy_version 22189 (0.0019) [2024-03-29 14:35:33,361][00476] Signal inference workers to stop experience collection... (8800 times) [2024-03-29 14:35:33,417][00497] InferenceWorker_p0-w0: stopping experience collection (8800 times) [2024-03-29 14:35:33,452][00476] Signal inference workers to resume experience collection... 
(8800 times) [2024-03-29 14:35:33,455][00497] InferenceWorker_p0-w0: resuming experience collection (8800 times) [2024-03-29 14:35:33,839][00126] Fps is (10 sec: 47513.5, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 363642880. Throughput: 0: 42126.7. Samples: 245762440. Policy #0 lag: (min: 0.0, avg: 23.2, max: 43.0) [2024-03-29 14:35:33,840][00126] Avg episode reward: [(0, '0.486')] [2024-03-29 14:35:35,563][00497] Updated weights for policy 0, policy_version 22199 (0.0033) [2024-03-29 14:35:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.1, 300 sec: 41709.8). Total num frames: 363806720. Throughput: 0: 41918.6. Samples: 246013300. Policy #0 lag: (min: 0.0, avg: 22.7, max: 40.0) [2024-03-29 14:35:38,840][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 14:35:40,335][00497] Updated weights for policy 0, policy_version 22209 (0.0019) [2024-03-29 14:35:43,551][00497] Updated weights for policy 0, policy_version 22219 (0.0021) [2024-03-29 14:35:43,839][00126] Fps is (10 sec: 39320.9, 60 sec: 42052.2, 300 sec: 41654.2). Total num frames: 364036096. Throughput: 0: 42059.1. Samples: 246265100. Policy #0 lag: (min: 0.0, avg: 22.7, max: 40.0) [2024-03-29 14:35:43,840][00126] Avg episode reward: [(0, '0.336')] [2024-03-29 14:35:47,423][00497] Updated weights for policy 0, policy_version 22229 (0.0025) [2024-03-29 14:35:48,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 364265472. Throughput: 0: 42184.0. Samples: 246387320. Policy #0 lag: (min: 0.0, avg: 22.7, max: 40.0) [2024-03-29 14:35:48,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 14:35:51,031][00497] Updated weights for policy 0, policy_version 22239 (0.0024) [2024-03-29 14:35:53,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 364445696. Throughput: 0: 42191.6. Samples: 246650520. Policy #0 lag: (min: 0.0, avg: 22.6, max: 40.0) [2024-03-29 14:35:53,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 14:35:55,880][00497] Updated weights for policy 0, policy_version 22249 (0.0019) [2024-03-29 14:35:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42325.4, 300 sec: 41765.3). Total num frames: 364675072. Throughput: 0: 42385.9. Samples: 246903220. Policy #0 lag: (min: 0.0, avg: 22.6, max: 40.0) [2024-03-29 14:35:58,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 14:35:59,101][00497] Updated weights for policy 0, policy_version 22259 (0.0020) [2024-03-29 14:36:02,878][00497] Updated weights for policy 0, policy_version 22269 (0.0022) [2024-03-29 14:36:03,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 41820.8). Total num frames: 364888064. Throughput: 0: 42320.9. Samples: 247029700. Policy #0 lag: (min: 0.0, avg: 22.6, max: 40.0) [2024-03-29 14:36:03,840][00126] Avg episode reward: [(0, '0.345')] [2024-03-29 14:36:06,703][00497] Updated weights for policy 0, policy_version 22279 (0.0019) [2024-03-29 14:36:08,841][00126] Fps is (10 sec: 40954.1, 60 sec: 42597.4, 300 sec: 41820.7). Total num frames: 365084672. Throughput: 0: 42153.3. Samples: 247276500. Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 14:36:08,841][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 14:36:11,398][00497] Updated weights for policy 0, policy_version 22289 (0.0024) [2024-03-29 14:36:13,558][00476] Signal inference workers to stop experience collection... 
(8850 times) [2024-03-29 14:36:13,598][00497] InferenceWorker_p0-w0: stopping experience collection (8850 times) [2024-03-29 14:36:13,783][00476] Signal inference workers to resume experience collection... (8850 times) [2024-03-29 14:36:13,784][00497] InferenceWorker_p0-w0: resuming experience collection (8850 times) [2024-03-29 14:36:13,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 365297664. Throughput: 0: 42117.8. Samples: 247530820. Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 14:36:13,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 14:36:14,847][00497] Updated weights for policy 0, policy_version 22299 (0.0025) [2024-03-29 14:36:18,676][00497] Updated weights for policy 0, policy_version 22309 (0.0020) [2024-03-29 14:36:18,839][00126] Fps is (10 sec: 42604.4, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 365510656. Throughput: 0: 41810.1. Samples: 247643900. Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 14:36:18,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 14:36:22,472][00497] Updated weights for policy 0, policy_version 22319 (0.0017) [2024-03-29 14:36:23,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42598.2, 300 sec: 41931.9). Total num frames: 365723648. Throughput: 0: 41666.1. Samples: 247888280. Policy #0 lag: (min: 0.0, avg: 23.6, max: 42.0) [2024-03-29 14:36:23,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 14:36:27,299][00497] Updated weights for policy 0, policy_version 22329 (0.0027) [2024-03-29 14:36:28,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 365903872. Throughput: 0: 42131.7. Samples: 248161020. Policy #0 lag: (min: 0.0, avg: 23.6, max: 42.0) [2024-03-29 14:36:28,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:36:30,438][00497] Updated weights for policy 0, policy_version 22339 (0.0020) [2024-03-29 14:36:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.0, 300 sec: 41765.3). Total num frames: 366133248. Throughput: 0: 41781.6. Samples: 248267500. Policy #0 lag: (min: 0.0, avg: 23.6, max: 42.0) [2024-03-29 14:36:33,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:36:34,343][00497] Updated weights for policy 0, policy_version 22349 (0.0020) [2024-03-29 14:36:38,335][00497] Updated weights for policy 0, policy_version 22359 (0.0033) [2024-03-29 14:36:38,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 366346240. Throughput: 0: 41311.0. Samples: 248509520. Policy #0 lag: (min: 0.0, avg: 23.5, max: 43.0) [2024-03-29 14:36:38,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 14:36:42,725][00497] Updated weights for policy 0, policy_version 22369 (0.0029) [2024-03-29 14:36:43,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 366526464. Throughput: 0: 42002.6. Samples: 248793340. Policy #0 lag: (min: 0.0, avg: 23.5, max: 43.0) [2024-03-29 14:36:43,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 14:36:45,993][00497] Updated weights for policy 0, policy_version 22379 (0.0029) [2024-03-29 14:36:46,325][00476] Signal inference workers to stop experience collection... (8900 times) [2024-03-29 14:36:46,363][00497] InferenceWorker_p0-w0: stopping experience collection (8900 times) [2024-03-29 14:36:46,545][00476] Signal inference workers to resume experience collection... 
(8900 times) [2024-03-29 14:36:46,546][00497] InferenceWorker_p0-w0: resuming experience collection (8900 times) [2024-03-29 14:36:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 366755840. Throughput: 0: 41689.3. Samples: 248905720. Policy #0 lag: (min: 0.0, avg: 23.5, max: 43.0) [2024-03-29 14:36:48,841][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:36:49,943][00497] Updated weights for policy 0, policy_version 22389 (0.0029) [2024-03-29 14:36:53,809][00497] Updated weights for policy 0, policy_version 22399 (0.0028) [2024-03-29 14:36:53,839][00126] Fps is (10 sec: 45875.3, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 366985216. Throughput: 0: 41756.4. Samples: 249155480. Policy #0 lag: (min: 0.0, avg: 23.2, max: 42.0) [2024-03-29 14:36:53,840][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 14:36:57,992][00497] Updated weights for policy 0, policy_version 22409 (0.0025) [2024-03-29 14:36:58,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 367181824. Throughput: 0: 42223.7. Samples: 249430880. Policy #0 lag: (min: 0.0, avg: 23.2, max: 42.0) [2024-03-29 14:36:58,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 14:37:01,504][00497] Updated weights for policy 0, policy_version 22419 (0.0028) [2024-03-29 14:37:03,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 367394816. Throughput: 0: 42269.8. Samples: 249546040. Policy #0 lag: (min: 0.0, avg: 23.2, max: 42.0) [2024-03-29 14:37:03,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:37:04,021][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000022425_367411200.pth... [2024-03-29 14:37:04,357][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000021811_357351424.pth [2024-03-29 14:37:05,595][00497] Updated weights for policy 0, policy_version 22429 (0.0020) [2024-03-29 14:37:08,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42326.3, 300 sec: 41987.5). Total num frames: 367624192. Throughput: 0: 42254.8. Samples: 249789740. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 14:37:08,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 14:37:09,252][00497] Updated weights for policy 0, policy_version 22439 (0.0032) [2024-03-29 14:37:13,648][00497] Updated weights for policy 0, policy_version 22449 (0.0020) [2024-03-29 14:37:13,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 367804416. Throughput: 0: 42241.6. Samples: 250061900. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 14:37:13,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 14:37:16,939][00497] Updated weights for policy 0, policy_version 22459 (0.0027) [2024-03-29 14:37:18,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 368033792. Throughput: 0: 42429.0. Samples: 250176800. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 14:37:18,841][00126] Avg episode reward: [(0, '0.285')] [2024-03-29 14:37:20,456][00476] Signal inference workers to stop experience collection... (8950 times) [2024-03-29 14:37:20,478][00497] InferenceWorker_p0-w0: stopping experience collection (8950 times) [2024-03-29 14:37:20,678][00476] Signal inference workers to resume experience collection... 
(8950 times) [2024-03-29 14:37:20,679][00497] InferenceWorker_p0-w0: resuming experience collection (8950 times) [2024-03-29 14:37:20,971][00497] Updated weights for policy 0, policy_version 22469 (0.0022) [2024-03-29 14:37:23,839][00126] Fps is (10 sec: 45875.2, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 368263168. Throughput: 0: 42719.1. Samples: 250431880. Policy #0 lag: (min: 0.0, avg: 22.7, max: 42.0) [2024-03-29 14:37:23,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 14:37:24,631][00497] Updated weights for policy 0, policy_version 22479 (0.0027) [2024-03-29 14:37:28,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 41820.9). Total num frames: 368427008. Throughput: 0: 42269.7. Samples: 250695480. Policy #0 lag: (min: 0.0, avg: 22.7, max: 41.0) [2024-03-29 14:37:28,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 14:37:29,240][00497] Updated weights for policy 0, policy_version 22489 (0.0026) [2024-03-29 14:37:32,529][00497] Updated weights for policy 0, policy_version 22499 (0.0026) [2024-03-29 14:37:33,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 368672768. Throughput: 0: 42412.5. Samples: 250814280. Policy #0 lag: (min: 0.0, avg: 22.7, max: 41.0) [2024-03-29 14:37:33,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:37:36,706][00497] Updated weights for policy 0, policy_version 22509 (0.0024) [2024-03-29 14:37:38,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 368885760. Throughput: 0: 42281.8. Samples: 251058160. Policy #0 lag: (min: 0.0, avg: 22.7, max: 41.0) [2024-03-29 14:37:38,840][00126] Avg episode reward: [(0, '0.440')] [2024-03-29 14:37:40,005][00497] Updated weights for policy 0, policy_version 22519 (0.0022) [2024-03-29 14:37:43,839][00126] Fps is (10 sec: 37683.1, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 369049600. Throughput: 0: 42036.5. Samples: 251322520. Policy #0 lag: (min: 0.0, avg: 22.8, max: 43.0) [2024-03-29 14:37:43,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 14:37:44,928][00497] Updated weights for policy 0, policy_version 22529 (0.0028) [2024-03-29 14:37:48,348][00497] Updated weights for policy 0, policy_version 22539 (0.0029) [2024-03-29 14:37:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 369311744. Throughput: 0: 42182.2. Samples: 251444240. Policy #0 lag: (min: 0.0, avg: 22.8, max: 43.0) [2024-03-29 14:37:48,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 14:37:52,333][00497] Updated weights for policy 0, policy_version 22549 (0.0027) [2024-03-29 14:37:53,816][00476] Signal inference workers to stop experience collection... (9000 times) [2024-03-29 14:37:53,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 369508352. Throughput: 0: 42396.9. Samples: 251697600. Policy #0 lag: (min: 0.0, avg: 22.8, max: 43.0) [2024-03-29 14:37:53,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 14:37:53,861][00497] InferenceWorker_p0-w0: stopping experience collection (9000 times) [2024-03-29 14:37:54,014][00476] Signal inference workers to resume experience collection... (9000 times) [2024-03-29 14:37:54,015][00497] InferenceWorker_p0-w0: resuming experience collection (9000 times) [2024-03-29 14:37:55,739][00497] Updated weights for policy 0, policy_version 22559 (0.0024) [2024-03-29 14:37:58,839][00126] Fps is (10 sec: 37683.1, 60 sec: 41779.2, 300 sec: 41932.1). 
Total num frames: 369688576. Throughput: 0: 41761.4. Samples: 251941160. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 14:37:58,840][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 14:38:00,447][00497] Updated weights for policy 0, policy_version 22569 (0.0029) [2024-03-29 14:38:03,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 369917952. Throughput: 0: 42160.8. Samples: 252074040. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 14:38:03,840][00126] Avg episode reward: [(0, '0.427')] [2024-03-29 14:38:03,933][00497] Updated weights for policy 0, policy_version 22579 (0.0033) [2024-03-29 14:38:08,021][00497] Updated weights for policy 0, policy_version 22589 (0.0028) [2024-03-29 14:38:08,839][00126] Fps is (10 sec: 44236.9, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 370130944. Throughput: 0: 42178.3. Samples: 252329900. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 14:38:08,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 14:38:11,238][00497] Updated weights for policy 0, policy_version 22599 (0.0024) [2024-03-29 14:38:13,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 370327552. Throughput: 0: 41653.0. Samples: 252569860. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 14:38:13,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:38:16,145][00497] Updated weights for policy 0, policy_version 22609 (0.0032) [2024-03-29 14:38:18,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 370556928. Throughput: 0: 41997.7. Samples: 252704180. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 14:38:18,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 14:38:19,673][00497] Updated weights for policy 0, policy_version 22619 (0.0024) [2024-03-29 14:38:23,791][00497] Updated weights for policy 0, policy_version 22629 (0.0025) [2024-03-29 14:38:23,839][00126] Fps is (10 sec: 42597.9, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 370753536. Throughput: 0: 41905.7. Samples: 252943920. Policy #0 lag: (min: 1.0, avg: 22.4, max: 42.0) [2024-03-29 14:38:23,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 14:38:23,889][00476] Signal inference workers to stop experience collection... (9050 times) [2024-03-29 14:38:23,955][00497] InferenceWorker_p0-w0: stopping experience collection (9050 times) [2024-03-29 14:38:24,057][00476] Signal inference workers to resume experience collection... (9050 times) [2024-03-29 14:38:24,058][00497] InferenceWorker_p0-w0: resuming experience collection (9050 times) [2024-03-29 14:38:26,897][00497] Updated weights for policy 0, policy_version 22639 (0.0023) [2024-03-29 14:38:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 370966528. Throughput: 0: 41644.8. Samples: 253196540. Policy #0 lag: (min: 0.0, avg: 23.0, max: 44.0) [2024-03-29 14:38:28,840][00126] Avg episode reward: [(0, '0.452')] [2024-03-29 14:38:31,728][00497] Updated weights for policy 0, policy_version 22649 (0.0019) [2024-03-29 14:38:33,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 371179520. Throughput: 0: 42010.1. Samples: 253334700. 
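The "Fps is (10 sec: …, 60 sec: …, 300 sec: …)" entries above report throughput averaged over three sliding windows alongside the cumulative frame count. The sketch below shows one way such windowed averages could be derived from (timestamp, total_frames) samples; it is illustrative only, and the class name and window handling are assumptions rather than Sample Factory's actual bookkeeping.

from collections import deque
import time

class WindowedFps:
    # Track the cumulative frame count and report FPS over several sliding windows.
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.history = deque()  # (timestamp, total_frames) pairs

    def record(self, total_frames, now=None):
        now = time.time() if now is None else now
        self.history.append((now, total_frames))
        # Drop samples older than the largest window.
        while now - self.history[0][0] > max(self.windows):
            self.history.popleft()

    def fps(self):
        now, frames_now = self.history[-1]
        stats = {}
        for window in self.windows:
            # Oldest retained sample that still falls inside this window.
            old_t, old_frames = next(
                (t, f) for t, f in self.history if now - t <= window
            )
            elapsed = max(now - old_t, 1e-9)
            stats[window] = (frames_now - old_frames) / elapsed
        return stats

Calling record(total_num_frames) every few seconds and reading fps()[10], fps()[60] and fps()[300] would produce figures comparable to the three printed in the entries above.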
Policy #0 lag: (min: 0.0, avg: 23.0, max: 44.0) [2024-03-29 14:38:33,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:38:35,177][00497] Updated weights for policy 0, policy_version 22659 (0.0035) [2024-03-29 14:38:38,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.0, 300 sec: 41765.3). Total num frames: 371359744. Throughput: 0: 41756.9. Samples: 253576660. Policy #0 lag: (min: 0.0, avg: 23.0, max: 44.0) [2024-03-29 14:38:38,841][00126] Avg episode reward: [(0, '0.350')] [2024-03-29 14:38:39,389][00497] Updated weights for policy 0, policy_version 22669 (0.0021) [2024-03-29 14:38:42,707][00497] Updated weights for policy 0, policy_version 22679 (0.0021) [2024-03-29 14:38:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 371605504. Throughput: 0: 41535.9. Samples: 253810280. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 14:38:43,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:38:47,508][00497] Updated weights for policy 0, policy_version 22689 (0.0025) [2024-03-29 14:38:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 371785728. Throughput: 0: 41894.7. Samples: 253959300. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 14:38:48,840][00126] Avg episode reward: [(0, '0.329')] [2024-03-29 14:38:50,941][00497] Updated weights for policy 0, policy_version 22699 (0.0020) [2024-03-29 14:38:53,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 41820.8). Total num frames: 371998720. Throughput: 0: 41437.6. Samples: 254194600. Policy #0 lag: (min: 0.0, avg: 22.4, max: 41.0) [2024-03-29 14:38:53,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 14:38:55,140][00497] Updated weights for policy 0, policy_version 22709 (0.0022) [2024-03-29 14:38:57,088][00476] Signal inference workers to stop experience collection... (9100 times) [2024-03-29 14:38:57,117][00497] InferenceWorker_p0-w0: stopping experience collection (9100 times) [2024-03-29 14:38:57,265][00476] Signal inference workers to resume experience collection... (9100 times) [2024-03-29 14:38:57,266][00497] InferenceWorker_p0-w0: resuming experience collection (9100 times) [2024-03-29 14:38:58,213][00497] Updated weights for policy 0, policy_version 22719 (0.0026) [2024-03-29 14:38:58,839][00126] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 372244480. Throughput: 0: 41572.4. Samples: 254440620. Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 14:38:58,840][00126] Avg episode reward: [(0, '0.386')] [2024-03-29 14:39:02,953][00497] Updated weights for policy 0, policy_version 22729 (0.0022) [2024-03-29 14:39:03,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 372424704. Throughput: 0: 41750.2. Samples: 254582940. Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 14:39:03,840][00126] Avg episode reward: [(0, '0.402')] [2024-03-29 14:39:03,881][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000022732_372441088.pth... [2024-03-29 14:39:04,197][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000022118_362381312.pth [2024-03-29 14:39:06,585][00497] Updated weights for policy 0, policy_version 22739 (0.0026) [2024-03-29 14:39:08,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 372637696. Throughput: 0: 41855.2. Samples: 254827400. 
Policy #0 lag: (min: 1.0, avg: 22.8, max: 42.0) [2024-03-29 14:39:08,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 14:39:10,657][00497] Updated weights for policy 0, policy_version 22749 (0.0023) [2024-03-29 14:39:13,756][00497] Updated weights for policy 0, policy_version 22759 (0.0022) [2024-03-29 14:39:13,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 372883456. Throughput: 0: 41732.4. Samples: 255074500. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:39:13,840][00126] Avg episode reward: [(0, '0.302')] [2024-03-29 14:39:18,701][00497] Updated weights for policy 0, policy_version 22769 (0.0031) [2024-03-29 14:39:18,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 373047296. Throughput: 0: 41728.0. Samples: 255212460. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:39:18,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:39:21,985][00497] Updated weights for policy 0, policy_version 22779 (0.0023) [2024-03-29 14:39:23,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 373276672. Throughput: 0: 41886.3. Samples: 255461540. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:39:23,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 14:39:26,182][00497] Updated weights for policy 0, policy_version 22789 (0.0021) [2024-03-29 14:39:28,776][00476] Signal inference workers to stop experience collection... (9150 times) [2024-03-29 14:39:28,783][00476] Signal inference workers to resume experience collection... (9150 times) [2024-03-29 14:39:28,806][00497] InferenceWorker_p0-w0: stopping experience collection (9150 times) [2024-03-29 14:39:28,826][00497] InferenceWorker_p0-w0: resuming experience collection (9150 times) [2024-03-29 14:39:28,839][00126] Fps is (10 sec: 45875.7, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 373506048. Throughput: 0: 42159.2. Samples: 255707440. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:39:28,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 14:39:29,431][00497] Updated weights for policy 0, policy_version 22799 (0.0017) [2024-03-29 14:39:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 373686272. Throughput: 0: 41887.5. Samples: 255844240. Policy #0 lag: (min: 1.0, avg: 22.4, max: 41.0) [2024-03-29 14:39:33,841][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 14:39:34,352][00497] Updated weights for policy 0, policy_version 22809 (0.0018) [2024-03-29 14:39:37,497][00497] Updated weights for policy 0, policy_version 22819 (0.0026) [2024-03-29 14:39:38,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 373915648. Throughput: 0: 42223.2. Samples: 256094640. Policy #0 lag: (min: 1.0, avg: 22.4, max: 41.0) [2024-03-29 14:39:38,840][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 14:39:41,843][00497] Updated weights for policy 0, policy_version 22829 (0.0017) [2024-03-29 14:39:43,839][00126] Fps is (10 sec: 44236.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 374128640. Throughput: 0: 42222.7. Samples: 256340640. Policy #0 lag: (min: 1.0, avg: 22.4, max: 41.0) [2024-03-29 14:39:43,840][00126] Avg episode reward: [(0, '0.291')] [2024-03-29 14:39:45,119][00497] Updated weights for policy 0, policy_version 22839 (0.0029) [2024-03-29 14:39:48,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 42043.0). 
Total num frames: 374308864. Throughput: 0: 41753.4. Samples: 256461840. Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 14:39:48,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 14:39:49,930][00497] Updated weights for policy 0, policy_version 22849 (0.0023) [2024-03-29 14:39:53,086][00497] Updated weights for policy 0, policy_version 22859 (0.0023) [2024-03-29 14:39:53,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.5, 300 sec: 42043.0). Total num frames: 374538240. Throughput: 0: 42159.2. Samples: 256724560. Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 14:39:53,840][00126] Avg episode reward: [(0, '0.455')] [2024-03-29 14:39:57,467][00497] Updated weights for policy 0, policy_version 22869 (0.0022) [2024-03-29 14:39:58,839][00126] Fps is (10 sec: 45875.8, 60 sec: 42052.4, 300 sec: 42043.0). Total num frames: 374767616. Throughput: 0: 42145.1. Samples: 256971020. Policy #0 lag: (min: 1.0, avg: 22.5, max: 41.0) [2024-03-29 14:39:58,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 14:40:00,639][00497] Updated weights for policy 0, policy_version 22879 (0.0029) [2024-03-29 14:40:02,626][00476] Signal inference workers to stop experience collection... (9200 times) [2024-03-29 14:40:02,659][00497] InferenceWorker_p0-w0: stopping experience collection (9200 times) [2024-03-29 14:40:02,809][00476] Signal inference workers to resume experience collection... (9200 times) [2024-03-29 14:40:02,810][00497] InferenceWorker_p0-w0: resuming experience collection (9200 times) [2024-03-29 14:40:03,839][00126] Fps is (10 sec: 40959.4, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 374947840. Throughput: 0: 41763.6. Samples: 257091820. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 14:40:03,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 14:40:05,455][00497] Updated weights for policy 0, policy_version 22889 (0.0026) [2024-03-29 14:40:08,760][00497] Updated weights for policy 0, policy_version 22899 (0.0026) [2024-03-29 14:40:08,839][00126] Fps is (10 sec: 40959.1, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 375177216. Throughput: 0: 42144.0. Samples: 257358020. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 14:40:08,840][00126] Avg episode reward: [(0, '0.296')] [2024-03-29 14:40:13,396][00497] Updated weights for policy 0, policy_version 22909 (0.0029) [2024-03-29 14:40:13,839][00126] Fps is (10 sec: 40960.7, 60 sec: 41233.2, 300 sec: 41820.9). Total num frames: 375357440. Throughput: 0: 42104.9. Samples: 257602160. Policy #0 lag: (min: 1.0, avg: 23.4, max: 42.0) [2024-03-29 14:40:13,840][00126] Avg episode reward: [(0, '0.334')] [2024-03-29 14:40:16,548][00497] Updated weights for policy 0, policy_version 22919 (0.0022) [2024-03-29 14:40:18,839][00126] Fps is (10 sec: 39321.9, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 375570432. Throughput: 0: 41460.5. Samples: 257709960. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 14:40:18,840][00126] Avg episode reward: [(0, '0.303')] [2024-03-29 14:40:21,327][00497] Updated weights for policy 0, policy_version 22929 (0.0023) [2024-03-29 14:40:23,839][00126] Fps is (10 sec: 42597.5, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 375783424. Throughput: 0: 41913.7. Samples: 257980760. 
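The "Policy #0 lag" statistics summarize how stale the collected samples are relative to the learner's newest weights: a lag of 0 means a sample was generated with the current policy, while max values around 40 correspond to rollouts collected roughly 40 policy versions earlier. A minimal sketch of that summary, with invented argument names and under the assumption that lag is simply the version difference:

def policy_lag_stats(current_version, sample_versions):
    # current_version: the learner's newest policy_version (e.g. 22929 above).
    # sample_versions: the policy_version recorded with each rollout sample.
    # Both argument names are illustrative; the trainer tracks this internally.
    lags = [current_version - v for v in sample_versions]
    return {
        "min": float(min(lags)),
        "avg": sum(lags) / len(lags),
        "max": float(max(lags)),
    }

# Example: policy_lag_stats(22599, [22599, 22580, 22558])
#          -> {'min': 0.0, 'avg': 20.0, 'max': 41.0}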
Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 14:40:23,840][00126] Avg episode reward: [(0, '0.361')] [2024-03-29 14:40:24,672][00497] Updated weights for policy 0, policy_version 22939 (0.0029) [2024-03-29 14:40:28,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41820.8). Total num frames: 375980032. Throughput: 0: 41937.4. Samples: 258227820. Policy #0 lag: (min: 0.0, avg: 22.4, max: 42.0) [2024-03-29 14:40:28,840][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 14:40:29,054][00497] Updated weights for policy 0, policy_version 22949 (0.0031) [2024-03-29 14:40:32,257][00497] Updated weights for policy 0, policy_version 22959 (0.0022) [2024-03-29 14:40:33,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 376193024. Throughput: 0: 41696.8. Samples: 258338200. Policy #0 lag: (min: 1.0, avg: 22.1, max: 41.0) [2024-03-29 14:40:33,840][00126] Avg episode reward: [(0, '0.290')] [2024-03-29 14:40:37,065][00497] Updated weights for policy 0, policy_version 22969 (0.0024) [2024-03-29 14:40:38,588][00476] Signal inference workers to stop experience collection... (9250 times) [2024-03-29 14:40:38,617][00497] InferenceWorker_p0-w0: stopping experience collection (9250 times) [2024-03-29 14:40:38,805][00476] Signal inference workers to resume experience collection... (9250 times) [2024-03-29 14:40:38,806][00497] InferenceWorker_p0-w0: resuming experience collection (9250 times) [2024-03-29 14:40:38,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41506.2, 300 sec: 41932.0). Total num frames: 376406016. Throughput: 0: 42085.8. Samples: 258618420. Policy #0 lag: (min: 1.0, avg: 22.1, max: 41.0) [2024-03-29 14:40:38,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 14:40:40,295][00497] Updated weights for policy 0, policy_version 22979 (0.0030) [2024-03-29 14:40:43,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 376619008. Throughput: 0: 41871.4. Samples: 258855240. Policy #0 lag: (min: 1.0, avg: 22.1, max: 41.0) [2024-03-29 14:40:43,840][00126] Avg episode reward: [(0, '0.265')] [2024-03-29 14:40:44,658][00497] Updated weights for policy 0, policy_version 22989 (0.0031) [2024-03-29 14:40:47,827][00497] Updated weights for policy 0, policy_version 22999 (0.0020) [2024-03-29 14:40:48,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 376832000. Throughput: 0: 41753.9. Samples: 258970740. Policy #0 lag: (min: 0.0, avg: 23.1, max: 42.0) [2024-03-29 14:40:48,840][00126] Avg episode reward: [(0, '0.468')] [2024-03-29 14:40:52,640][00497] Updated weights for policy 0, policy_version 23009 (0.0032) [2024-03-29 14:40:53,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 377028608. Throughput: 0: 41848.9. Samples: 259241220. Policy #0 lag: (min: 0.0, avg: 23.1, max: 42.0) [2024-03-29 14:40:53,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 14:40:55,951][00497] Updated weights for policy 0, policy_version 23019 (0.0021) [2024-03-29 14:40:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 377241600. Throughput: 0: 41792.0. Samples: 259482800. 
Policy #0 lag: (min: 0.0, avg: 23.1, max: 42.0) [2024-03-29 14:40:58,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:41:00,163][00497] Updated weights for policy 0, policy_version 23029 (0.0020) [2024-03-29 14:41:03,306][00497] Updated weights for policy 0, policy_version 23039 (0.0024) [2024-03-29 14:41:03,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42325.3, 300 sec: 42043.2). Total num frames: 377487360. Throughput: 0: 42117.7. Samples: 259605260. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 14:41:03,840][00126] Avg episode reward: [(0, '0.301')] [2024-03-29 14:41:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023040_377487360.pth... [2024-03-29 14:41:04,175][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000022425_367411200.pth [2024-03-29 14:41:08,220][00497] Updated weights for policy 0, policy_version 23049 (0.0032) [2024-03-29 14:41:08,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 377651200. Throughput: 0: 42138.8. Samples: 259877000. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 14:41:08,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 14:41:09,952][00476] Signal inference workers to stop experience collection... (9300 times) [2024-03-29 14:41:09,959][00476] Signal inference workers to resume experience collection... (9300 times) [2024-03-29 14:41:09,978][00497] InferenceWorker_p0-w0: stopping experience collection (9300 times) [2024-03-29 14:41:10,001][00497] InferenceWorker_p0-w0: resuming experience collection (9300 times) [2024-03-29 14:41:11,455][00497] Updated weights for policy 0, policy_version 23059 (0.0025) [2024-03-29 14:41:13,839][00126] Fps is (10 sec: 37683.5, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 377864192. Throughput: 0: 41819.5. Samples: 260109700. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 14:41:13,841][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 14:41:15,651][00497] Updated weights for policy 0, policy_version 23069 (0.0018) [2024-03-29 14:41:18,839][00126] Fps is (10 sec: 45875.0, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 378109952. Throughput: 0: 42375.6. Samples: 260245100. Policy #0 lag: (min: 1.0, avg: 22.9, max: 41.0) [2024-03-29 14:41:18,840][00126] Avg episode reward: [(0, '0.309')] [2024-03-29 14:41:18,865][00497] Updated weights for policy 0, policy_version 23079 (0.0026) [2024-03-29 14:41:23,774][00497] Updated weights for policy 0, policy_version 23089 (0.0026) [2024-03-29 14:41:23,839][00126] Fps is (10 sec: 42598.0, 60 sec: 41779.2, 300 sec: 41987.4). Total num frames: 378290176. Throughput: 0: 41967.3. Samples: 260506960. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 14:41:23,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 14:41:26,696][00497] Updated weights for policy 0, policy_version 23099 (0.0032) [2024-03-29 14:41:28,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 378519552. Throughput: 0: 41947.6. Samples: 260742880. Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 14:41:28,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 14:41:31,056][00497] Updated weights for policy 0, policy_version 23109 (0.0019) [2024-03-29 14:41:33,839][00126] Fps is (10 sec: 45875.9, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 378748928. Throughput: 0: 42553.8. Samples: 260885660. 
Policy #0 lag: (min: 0.0, avg: 22.5, max: 42.0) [2024-03-29 14:41:33,840][00126] Avg episode reward: [(0, '0.479')] [2024-03-29 14:41:34,373][00497] Updated weights for policy 0, policy_version 23119 (0.0029) [2024-03-29 14:41:38,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 378912768. Throughput: 0: 42128.1. Samples: 261136980. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 14:41:38,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:41:39,296][00497] Updated weights for policy 0, policy_version 23129 (0.0022) [2024-03-29 14:41:42,135][00476] Signal inference workers to stop experience collection... (9350 times) [2024-03-29 14:41:42,200][00497] InferenceWorker_p0-w0: stopping experience collection (9350 times) [2024-03-29 14:41:42,232][00476] Signal inference workers to resume experience collection... (9350 times) [2024-03-29 14:41:42,234][00497] InferenceWorker_p0-w0: resuming experience collection (9350 times) [2024-03-29 14:41:42,493][00497] Updated weights for policy 0, policy_version 23139 (0.0020) [2024-03-29 14:41:43,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 379174912. Throughput: 0: 42068.4. Samples: 261375880. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 14:41:43,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 14:41:46,643][00497] Updated weights for policy 0, policy_version 23149 (0.0026) [2024-03-29 14:41:48,839][00126] Fps is (10 sec: 45874.7, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 379371520. Throughput: 0: 42428.9. Samples: 261514560. Policy #0 lag: (min: 0.0, avg: 20.4, max: 41.0) [2024-03-29 14:41:48,840][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 14:41:49,914][00497] Updated weights for policy 0, policy_version 23159 (0.0020) [2024-03-29 14:41:53,839][00126] Fps is (10 sec: 39321.6, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 379568128. Throughput: 0: 42204.5. Samples: 261776200. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 14:41:53,840][00126] Avg episode reward: [(0, '0.447')] [2024-03-29 14:41:54,601][00497] Updated weights for policy 0, policy_version 23169 (0.0018) [2024-03-29 14:41:57,802][00497] Updated weights for policy 0, policy_version 23179 (0.0019) [2024-03-29 14:41:58,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42871.4, 300 sec: 42098.5). Total num frames: 379813888. Throughput: 0: 42479.2. Samples: 262021260. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 14:41:58,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:42:02,133][00497] Updated weights for policy 0, policy_version 23189 (0.0024) [2024-03-29 14:42:03,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 380010496. Throughput: 0: 42480.5. Samples: 262156720. Policy #0 lag: (min: 0.0, avg: 19.7, max: 40.0) [2024-03-29 14:42:03,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:42:05,341][00497] Updated weights for policy 0, policy_version 23199 (0.0035) [2024-03-29 14:42:08,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 380207104. Throughput: 0: 42217.9. Samples: 262406760. 
Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:42:08,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 14:42:10,271][00497] Updated weights for policy 0, policy_version 23209 (0.0023) [2024-03-29 14:42:13,310][00497] Updated weights for policy 0, policy_version 23219 (0.0017) [2024-03-29 14:42:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42871.5, 300 sec: 42043.0). Total num frames: 380436480. Throughput: 0: 42510.2. Samples: 262655840. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:42:13,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 14:42:17,784][00497] Updated weights for policy 0, policy_version 23229 (0.0023) [2024-03-29 14:42:18,058][00476] Signal inference workers to stop experience collection... (9400 times) [2024-03-29 14:42:18,095][00497] InferenceWorker_p0-w0: stopping experience collection (9400 times) [2024-03-29 14:42:18,284][00476] Signal inference workers to resume experience collection... (9400 times) [2024-03-29 14:42:18,285][00497] InferenceWorker_p0-w0: resuming experience collection (9400 times) [2024-03-29 14:42:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 380633088. Throughput: 0: 42164.5. Samples: 262783060. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:42:18,840][00126] Avg episode reward: [(0, '0.304')] [2024-03-29 14:42:21,285][00497] Updated weights for policy 0, policy_version 23239 (0.0023) [2024-03-29 14:42:23,839][00126] Fps is (10 sec: 39321.7, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 380829696. Throughput: 0: 41893.3. Samples: 263022180. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 14:42:23,840][00126] Avg episode reward: [(0, '0.295')] [2024-03-29 14:42:26,270][00497] Updated weights for policy 0, policy_version 23249 (0.0031) [2024-03-29 14:42:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 381042688. Throughput: 0: 42206.2. Samples: 263275160. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 14:42:28,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:42:29,103][00497] Updated weights for policy 0, policy_version 23259 (0.0030) [2024-03-29 14:42:33,671][00497] Updated weights for policy 0, policy_version 23269 (0.0023) [2024-03-29 14:42:33,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 381239296. Throughput: 0: 41760.5. Samples: 263393780. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 14:42:33,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 14:42:36,961][00497] Updated weights for policy 0, policy_version 23279 (0.0026) [2024-03-29 14:42:38,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 381468672. Throughput: 0: 41561.2. Samples: 263646460. Policy #0 lag: (min: 0.0, avg: 20.6, max: 41.0) [2024-03-29 14:42:38,840][00126] Avg episode reward: [(0, '0.403')] [2024-03-29 14:42:41,979][00497] Updated weights for policy 0, policy_version 23289 (0.0021) [2024-03-29 14:42:43,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 381665280. Throughput: 0: 41911.6. Samples: 263907280. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:42:43,840][00126] Avg episode reward: [(0, '0.276')] [2024-03-29 14:42:44,926][00497] Updated weights for policy 0, policy_version 23299 (0.0027) [2024-03-29 14:42:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 41931.9). 
Total num frames: 381878272. Throughput: 0: 41251.5. Samples: 264013040. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:42:48,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:42:49,175][00476] Signal inference workers to stop experience collection... (9450 times) [2024-03-29 14:42:49,207][00497] InferenceWorker_p0-w0: stopping experience collection (9450 times) [2024-03-29 14:42:49,379][00476] Signal inference workers to resume experience collection... (9450 times) [2024-03-29 14:42:49,379][00497] InferenceWorker_p0-w0: resuming experience collection (9450 times) [2024-03-29 14:42:49,382][00497] Updated weights for policy 0, policy_version 23309 (0.0019) [2024-03-29 14:42:52,764][00497] Updated weights for policy 0, policy_version 23319 (0.0019) [2024-03-29 14:42:53,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 382091264. Throughput: 0: 41334.7. Samples: 264266820. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:42:53,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:42:57,545][00497] Updated weights for policy 0, policy_version 23329 (0.0023) [2024-03-29 14:42:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41233.0, 300 sec: 41931.9). Total num frames: 382287872. Throughput: 0: 41913.3. Samples: 264541940. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 14:42:58,840][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 14:43:00,619][00497] Updated weights for policy 0, policy_version 23339 (0.0027) [2024-03-29 14:43:03,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 382484480. Throughput: 0: 41317.3. Samples: 264642340. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 14:43:03,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 14:43:04,009][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023346_382500864.pth... [2024-03-29 14:43:04,331][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000022732_372441088.pth [2024-03-29 14:43:05,195][00497] Updated weights for policy 0, policy_version 23349 (0.0020) [2024-03-29 14:43:08,774][00497] Updated weights for policy 0, policy_version 23359 (0.0021) [2024-03-29 14:43:08,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 382713856. Throughput: 0: 41538.6. Samples: 264891420. Policy #0 lag: (min: 0.0, avg: 19.7, max: 42.0) [2024-03-29 14:43:08,840][00126] Avg episode reward: [(0, '0.500')] [2024-03-29 14:43:13,432][00497] Updated weights for policy 0, policy_version 23369 (0.0021) [2024-03-29 14:43:13,839][00126] Fps is (10 sec: 40959.5, 60 sec: 40959.9, 300 sec: 41820.8). Total num frames: 382894080. Throughput: 0: 41785.2. Samples: 265155500. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 14:43:13,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 14:43:16,511][00497] Updated weights for policy 0, policy_version 23379 (0.0025) [2024-03-29 14:43:18,839][00126] Fps is (10 sec: 39321.5, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 383107072. Throughput: 0: 41452.3. Samples: 265259140. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 14:43:18,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:43:21,346][00497] Updated weights for policy 0, policy_version 23389 (0.0020) [2024-03-29 14:43:21,906][00476] Signal inference workers to stop experience collection... 
(9500 times) [2024-03-29 14:43:21,911][00476] Signal inference workers to resume experience collection... (9500 times) [2024-03-29 14:43:21,958][00497] InferenceWorker_p0-w0: stopping experience collection (9500 times) [2024-03-29 14:43:21,958][00497] InferenceWorker_p0-w0: resuming experience collection (9500 times) [2024-03-29 14:43:23,839][00126] Fps is (10 sec: 44237.4, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 383336448. Throughput: 0: 41535.7. Samples: 265515560. Policy #0 lag: (min: 1.0, avg: 19.6, max: 41.0) [2024-03-29 14:43:23,840][00126] Avg episode reward: [(0, '0.407')] [2024-03-29 14:43:24,878][00497] Updated weights for policy 0, policy_version 23399 (0.0021) [2024-03-29 14:43:28,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 383516672. Throughput: 0: 41463.1. Samples: 265773120. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 14:43:28,841][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 14:43:29,084][00497] Updated weights for policy 0, policy_version 23409 (0.0016) [2024-03-29 14:43:31,996][00497] Updated weights for policy 0, policy_version 23419 (0.0038) [2024-03-29 14:43:33,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 383762432. Throughput: 0: 41803.1. Samples: 265894180. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 14:43:33,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:43:36,850][00497] Updated weights for policy 0, policy_version 23429 (0.0020) [2024-03-29 14:43:38,839][00126] Fps is (10 sec: 44236.5, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 383959040. Throughput: 0: 42038.7. Samples: 266158560. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 14:43:38,840][00126] Avg episode reward: [(0, '0.351')] [2024-03-29 14:43:40,238][00497] Updated weights for policy 0, policy_version 23439 (0.0018) [2024-03-29 14:43:43,839][00126] Fps is (10 sec: 37683.7, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 384139264. Throughput: 0: 41261.0. Samples: 266398680. Policy #0 lag: (min: 0.0, avg: 19.8, max: 41.0) [2024-03-29 14:43:43,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 14:43:44,552][00497] Updated weights for policy 0, policy_version 23449 (0.0018) [2024-03-29 14:43:47,658][00497] Updated weights for policy 0, policy_version 23459 (0.0025) [2024-03-29 14:43:48,839][00126] Fps is (10 sec: 42598.7, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 384385024. Throughput: 0: 41892.0. Samples: 266527480. Policy #0 lag: (min: 1.0, avg: 18.6, max: 40.0) [2024-03-29 14:43:48,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 14:43:52,422][00497] Updated weights for policy 0, policy_version 23469 (0.0023) [2024-03-29 14:43:53,731][00476] Signal inference workers to stop experience collection... (9550 times) [2024-03-29 14:43:53,806][00476] Signal inference workers to resume experience collection... (9550 times) [2024-03-29 14:43:53,809][00497] InferenceWorker_p0-w0: stopping experience collection (9550 times) [2024-03-29 14:43:53,834][00497] InferenceWorker_p0-w0: resuming experience collection (9550 times) [2024-03-29 14:43:53,839][00126] Fps is (10 sec: 45874.7, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 384598016. Throughput: 0: 42065.8. Samples: 266784380. 
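The paired "Signal inference workers to stop experience collection… / …resume experience collection…" messages, with their running "(N times)" counters (9500 and climbing here), show the learner briefly pausing rollout collection and resuming it moments later. The sketch below illustrates one way such a back-pressure handshake could be wired with a multiprocessing event; the trigger condition, names, and threshold are assumptions, not the trainer's real logic.

import multiprocessing as mp

class CollectionThrottle:
    # Learner-side back-pressure: pause rollout collection while the learner catches up.
    def __init__(self):
        self.collect_enabled = mp.Event()
        self.collect_enabled.set()      # collection allowed by default
        self.stop_resume_cycles = 0     # the "(N times)" counter printed in the log

    def maybe_throttle(self, queued_batches, high_watermark=10):
        # Hypothetical trigger: too many unprocessed batches waiting for the learner.
        if queued_batches >= high_watermark and self.collect_enabled.is_set():
            self.collect_enabled.clear()   # "Signal inference workers to stop experience collection..."
            self.stop_resume_cycles += 1
        elif queued_batches < high_watermark and not self.collect_enabled.is_set():
            self.collect_enabled.set()     # "Signal inference workers to resume experience collection..."

# An inference worker would call throttle.collect_enabled.wait() before stepping
# its environments, producing the matching stopping/resuming messages above.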
Policy #0 lag: (min: 1.0, avg: 18.6, max: 40.0) [2024-03-29 14:43:53,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 14:43:55,727][00497] Updated weights for policy 0, policy_version 23479 (0.0030) [2024-03-29 14:43:58,839][00126] Fps is (10 sec: 40959.3, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 384794624. Throughput: 0: 41714.6. Samples: 267032660. Policy #0 lag: (min: 1.0, avg: 18.6, max: 40.0) [2024-03-29 14:43:58,842][00126] Avg episode reward: [(0, '0.454')] [2024-03-29 14:43:59,978][00497] Updated weights for policy 0, policy_version 23489 (0.0021) [2024-03-29 14:44:03,211][00497] Updated weights for policy 0, policy_version 23499 (0.0019) [2024-03-29 14:44:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 385024000. Throughput: 0: 42329.3. Samples: 267163960. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 14:44:03,840][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:44:07,957][00497] Updated weights for policy 0, policy_version 23509 (0.0023) [2024-03-29 14:44:08,839][00126] Fps is (10 sec: 42599.0, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 385220608. Throughput: 0: 42200.4. Samples: 267414580. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 14:44:08,840][00126] Avg episode reward: [(0, '0.337')] [2024-03-29 14:44:11,507][00497] Updated weights for policy 0, policy_version 23519 (0.0028) [2024-03-29 14:44:13,839][00126] Fps is (10 sec: 40960.6, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 385433600. Throughput: 0: 42066.7. Samples: 267666120. Policy #0 lag: (min: 0.0, avg: 19.5, max: 42.0) [2024-03-29 14:44:13,840][00126] Avg episode reward: [(0, '0.416')] [2024-03-29 14:44:15,782][00497] Updated weights for policy 0, policy_version 23529 (0.0018) [2024-03-29 14:44:18,781][00497] Updated weights for policy 0, policy_version 23539 (0.0024) [2024-03-29 14:44:18,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 385662976. Throughput: 0: 42397.9. Samples: 267802080. Policy #0 lag: (min: 2.0, avg: 20.2, max: 42.0) [2024-03-29 14:44:18,840][00126] Avg episode reward: [(0, '0.379')] [2024-03-29 14:44:23,472][00497] Updated weights for policy 0, policy_version 23549 (0.0034) [2024-03-29 14:44:23,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 385843200. Throughput: 0: 41880.0. Samples: 268043160. Policy #0 lag: (min: 2.0, avg: 20.2, max: 42.0) [2024-03-29 14:44:23,840][00126] Avg episode reward: [(0, '0.272')] [2024-03-29 14:44:27,327][00497] Updated weights for policy 0, policy_version 23559 (0.0020) [2024-03-29 14:44:28,839][00126] Fps is (10 sec: 37683.2, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 386039808. Throughput: 0: 41825.8. Samples: 268280840. Policy #0 lag: (min: 2.0, avg: 20.2, max: 42.0) [2024-03-29 14:44:28,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 14:44:29,095][00476] Signal inference workers to stop experience collection... (9600 times) [2024-03-29 14:44:29,147][00497] InferenceWorker_p0-w0: stopping experience collection (9600 times) [2024-03-29 14:44:29,182][00476] Signal inference workers to resume experience collection... (9600 times) [2024-03-29 14:44:29,183][00497] InferenceWorker_p0-w0: resuming experience collection (9600 times) [2024-03-29 14:44:31,446][00497] Updated weights for policy 0, policy_version 23569 (0.0022) [2024-03-29 14:44:33,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41876.4). 
Total num frames: 386269184. Throughput: 0: 42176.8. Samples: 268425440. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 14:44:33,841][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:44:34,515][00497] Updated weights for policy 0, policy_version 23579 (0.0025) [2024-03-29 14:44:38,839][00126] Fps is (10 sec: 42598.2, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 386465792. Throughput: 0: 41873.0. Samples: 268668660. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 14:44:38,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 14:44:39,082][00497] Updated weights for policy 0, policy_version 23589 (0.0027) [2024-03-29 14:44:42,665][00497] Updated weights for policy 0, policy_version 23599 (0.0019) [2024-03-29 14:44:43,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 386695168. Throughput: 0: 41981.8. Samples: 268921840. Policy #0 lag: (min: 1.0, avg: 19.7, max: 41.0) [2024-03-29 14:44:43,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:44:46,706][00497] Updated weights for policy 0, policy_version 23609 (0.0023) [2024-03-29 14:44:48,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 386908160. Throughput: 0: 42097.8. Samples: 269058360. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:44:48,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 14:44:49,964][00497] Updated weights for policy 0, policy_version 23619 (0.0023) [2024-03-29 14:44:53,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.3, 300 sec: 41820.8). Total num frames: 387104768. Throughput: 0: 41981.4. Samples: 269303740. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:44:53,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:44:54,564][00497] Updated weights for policy 0, policy_version 23629 (0.0023) [2024-03-29 14:44:58,085][00497] Updated weights for policy 0, policy_version 23639 (0.0026) [2024-03-29 14:44:58,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 387317760. Throughput: 0: 41952.4. Samples: 269553980. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:44:58,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:45:02,417][00497] Updated weights for policy 0, policy_version 23649 (0.0032) [2024-03-29 14:45:03,840][00126] Fps is (10 sec: 42596.3, 60 sec: 41778.9, 300 sec: 41876.3). Total num frames: 387530752. Throughput: 0: 41886.6. Samples: 269687000. Policy #0 lag: (min: 1.0, avg: 20.5, max: 41.0) [2024-03-29 14:45:03,840][00126] Avg episode reward: [(0, '0.392')] [2024-03-29 14:45:04,367][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023655_387563520.pth... [2024-03-29 14:45:04,736][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023040_377487360.pth [2024-03-29 14:45:05,032][00476] Signal inference workers to stop experience collection... (9650 times) [2024-03-29 14:45:05,118][00497] InferenceWorker_p0-w0: stopping experience collection (9650 times) [2024-03-29 14:45:05,265][00476] Signal inference workers to resume experience collection... (9650 times) [2024-03-29 14:45:05,266][00497] InferenceWorker_p0-w0: resuming experience collection (9650 times) [2024-03-29 14:45:05,855][00497] Updated weights for policy 0, policy_version 23659 (0.0030) [2024-03-29 14:45:08,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 387727360. 
Throughput: 0: 41775.0. Samples: 269923040. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 14:45:08,840][00126] Avg episode reward: [(0, '0.426')] [2024-03-29 14:45:10,191][00497] Updated weights for policy 0, policy_version 23669 (0.0019) [2024-03-29 14:45:13,802][00497] Updated weights for policy 0, policy_version 23679 (0.0023) [2024-03-29 14:45:13,839][00126] Fps is (10 sec: 42600.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 387956736. Throughput: 0: 41934.7. Samples: 270167900. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 14:45:13,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 14:45:18,270][00497] Updated weights for policy 0, policy_version 23689 (0.0027) [2024-03-29 14:45:18,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41233.1, 300 sec: 41876.4). Total num frames: 388136960. Throughput: 0: 41816.1. Samples: 270307160. Policy #0 lag: (min: 1.0, avg: 21.2, max: 43.0) [2024-03-29 14:45:18,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 14:45:21,234][00497] Updated weights for policy 0, policy_version 23699 (0.0021) [2024-03-29 14:45:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 388366336. Throughput: 0: 41804.4. Samples: 270549860. Policy #0 lag: (min: 3.0, avg: 23.2, max: 44.0) [2024-03-29 14:45:23,840][00126] Avg episode reward: [(0, '0.354')] [2024-03-29 14:45:25,806][00497] Updated weights for policy 0, policy_version 23709 (0.0019) [2024-03-29 14:45:28,839][00126] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 388595712. Throughput: 0: 41950.3. Samples: 270809600. Policy #0 lag: (min: 3.0, avg: 23.2, max: 44.0) [2024-03-29 14:45:28,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 14:45:29,327][00497] Updated weights for policy 0, policy_version 23719 (0.0019) [2024-03-29 14:45:33,733][00497] Updated weights for policy 0, policy_version 23729 (0.0026) [2024-03-29 14:45:33,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 388775936. Throughput: 0: 41700.6. Samples: 270934880. Policy #0 lag: (min: 3.0, avg: 23.2, max: 44.0) [2024-03-29 14:45:33,840][00126] Avg episode reward: [(0, '0.277')] [2024-03-29 14:45:36,809][00497] Updated weights for policy 0, policy_version 23739 (0.0023) [2024-03-29 14:45:38,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 389005312. Throughput: 0: 41537.7. Samples: 271172940. Policy #0 lag: (min: 0.0, avg: 23.0, max: 45.0) [2024-03-29 14:45:38,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 14:45:41,459][00497] Updated weights for policy 0, policy_version 23749 (0.0017) [2024-03-29 14:45:41,811][00476] Signal inference workers to stop experience collection... (9700 times) [2024-03-29 14:45:41,850][00497] InferenceWorker_p0-w0: stopping experience collection (9700 times) [2024-03-29 14:45:42,011][00476] Signal inference workers to resume experience collection... (9700 times) [2024-03-29 14:45:42,012][00497] InferenceWorker_p0-w0: resuming experience collection (9700 times) [2024-03-29 14:45:43,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42052.4, 300 sec: 41987.5). Total num frames: 389218304. Throughput: 0: 41992.9. Samples: 271443660. 
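The frequent "Updated weights for policy 0, policy_version N (0.00xx)" entries come from the inference worker picking up newly published learner weights; the number in parentheses is presumably the time the update took, though the log does not say so explicitly. A hedged sketch of that refresh step, where shared_store and its methods are placeholders for whatever parameter store the trainer actually uses:

import time

def refresh_policy_weights(local_policy, shared_store, local_version):
    # shared_store, latest_version() and get_state_dict() are invented names here;
    # local_policy is assumed to be a torch.nn.Module.
    latest = shared_store.latest_version()
    if latest > local_version:
        start = time.time()
        local_policy.load_state_dict(shared_store.get_state_dict(latest))
        elapsed = time.time() - start
        print(f"Updated weights for policy 0, policy_version {latest} ({elapsed:.4f})")
        return latest
    return local_version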
Policy #0 lag: (min: 0.0, avg: 23.0, max: 45.0) [2024-03-29 14:45:43,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:45:44,858][00497] Updated weights for policy 0, policy_version 23759 (0.0031) [2024-03-29 14:45:48,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 389414912. Throughput: 0: 41827.5. Samples: 271569220. Policy #0 lag: (min: 0.0, avg: 23.0, max: 45.0) [2024-03-29 14:45:48,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 14:45:49,019][00497] Updated weights for policy 0, policy_version 23769 (0.0022) [2024-03-29 14:45:52,275][00497] Updated weights for policy 0, policy_version 23779 (0.0030) [2024-03-29 14:45:53,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 389644288. Throughput: 0: 41963.6. Samples: 271811400. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 14:45:53,840][00126] Avg episode reward: [(0, '0.458')] [2024-03-29 14:45:56,807][00497] Updated weights for policy 0, policy_version 23789 (0.0023) [2024-03-29 14:45:58,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 389840896. Throughput: 0: 42446.3. Samples: 272077980. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 14:45:58,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:46:00,334][00497] Updated weights for policy 0, policy_version 23799 (0.0017) [2024-03-29 14:46:03,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.6, 300 sec: 42043.0). Total num frames: 390053888. Throughput: 0: 42147.5. Samples: 272203800. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 14:46:03,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:46:04,550][00497] Updated weights for policy 0, policy_version 23809 (0.0025) [2024-03-29 14:46:07,780][00497] Updated weights for policy 0, policy_version 23819 (0.0029) [2024-03-29 14:46:08,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.5, 300 sec: 42098.6). Total num frames: 390283264. Throughput: 0: 42133.3. Samples: 272445860. Policy #0 lag: (min: 0.0, avg: 21.9, max: 41.0) [2024-03-29 14:46:08,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:46:12,416][00497] Updated weights for policy 0, policy_version 23829 (0.0024) [2024-03-29 14:46:13,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 390479872. Throughput: 0: 42164.8. Samples: 272707020. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:46:13,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:46:15,971][00497] Updated weights for policy 0, policy_version 23839 (0.0018) [2024-03-29 14:46:18,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42598.3, 300 sec: 42043.0). Total num frames: 390692864. Throughput: 0: 42147.8. Samples: 272831540. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:46:18,840][00126] Avg episode reward: [(0, '0.287')] [2024-03-29 14:46:20,217][00497] Updated weights for policy 0, policy_version 23849 (0.0023) [2024-03-29 14:46:21,454][00476] Signal inference workers to stop experience collection... (9750 times) [2024-03-29 14:46:21,569][00497] InferenceWorker_p0-w0: stopping experience collection (9750 times) [2024-03-29 14:46:21,646][00476] Signal inference workers to resume experience collection... 
(9750 times) [2024-03-29 14:46:21,646][00497] InferenceWorker_p0-w0: resuming experience collection (9750 times) [2024-03-29 14:46:23,362][00497] Updated weights for policy 0, policy_version 23859 (0.0023) [2024-03-29 14:46:23,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 390922240. Throughput: 0: 42606.7. Samples: 273090240. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:46:23,840][00126] Avg episode reward: [(0, '0.430')] [2024-03-29 14:46:27,870][00497] Updated weights for policy 0, policy_version 23869 (0.0026) [2024-03-29 14:46:28,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 391118848. Throughput: 0: 42336.8. Samples: 273348820. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 14:46:28,840][00126] Avg episode reward: [(0, '0.268')] [2024-03-29 14:46:31,147][00497] Updated weights for policy 0, policy_version 23879 (0.0028) [2024-03-29 14:46:33,839][00126] Fps is (10 sec: 40960.3, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 391331840. Throughput: 0: 42246.8. Samples: 273470320. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 14:46:33,840][00126] Avg episode reward: [(0, '0.326')] [2024-03-29 14:46:35,627][00497] Updated weights for policy 0, policy_version 23889 (0.0024) [2024-03-29 14:46:38,838][00497] Updated weights for policy 0, policy_version 23899 (0.0029) [2024-03-29 14:46:38,839][00126] Fps is (10 sec: 44237.5, 60 sec: 42598.5, 300 sec: 41987.5). Total num frames: 391561216. Throughput: 0: 42723.7. Samples: 273733960. Policy #0 lag: (min: 0.0, avg: 21.0, max: 42.0) [2024-03-29 14:46:38,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 14:46:43,399][00497] Updated weights for policy 0, policy_version 23909 (0.0022) [2024-03-29 14:46:43,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 391741440. Throughput: 0: 42172.0. Samples: 273975720. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 14:46:43,840][00126] Avg episode reward: [(0, '0.324')] [2024-03-29 14:46:46,849][00497] Updated weights for policy 0, policy_version 23919 (0.0025) [2024-03-29 14:46:48,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 391954432. Throughput: 0: 41922.2. Samples: 274090300. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 14:46:48,841][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 14:46:51,341][00497] Updated weights for policy 0, policy_version 23929 (0.0029) [2024-03-29 14:46:53,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 392167424. Throughput: 0: 42522.6. Samples: 274359380. Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 14:46:53,840][00126] Avg episode reward: [(0, '0.315')] [2024-03-29 14:46:54,570][00497] Updated weights for policy 0, policy_version 23939 (0.0024) [2024-03-29 14:46:57,195][00476] Signal inference workers to stop experience collection... (9800 times) [2024-03-29 14:46:57,196][00476] Signal inference workers to resume experience collection... (9800 times) [2024-03-29 14:46:57,233][00497] InferenceWorker_p0-w0: stopping experience collection (9800 times) [2024-03-29 14:46:57,233][00497] InferenceWorker_p0-w0: resuming experience collection (9800 times) [2024-03-29 14:46:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 392364032. Throughput: 0: 42117.9. Samples: 274602320. 
Policy #0 lag: (min: 0.0, avg: 20.5, max: 42.0) [2024-03-29 14:46:58,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 14:46:59,034][00497] Updated weights for policy 0, policy_version 23949 (0.0019) [2024-03-29 14:47:02,468][00497] Updated weights for policy 0, policy_version 23959 (0.0023) [2024-03-29 14:47:03,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 392577024. Throughput: 0: 42171.2. Samples: 274729240. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:47:03,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:47:03,952][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023962_392593408.pth... [2024-03-29 14:47:04,291][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023346_382500864.pth [2024-03-29 14:47:06,897][00497] Updated weights for policy 0, policy_version 23969 (0.0017) [2024-03-29 14:47:08,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41506.2, 300 sec: 41820.9). Total num frames: 392773632. Throughput: 0: 42101.9. Samples: 274984820. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:47:08,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:47:10,120][00497] Updated weights for policy 0, policy_version 23979 (0.0025) [2024-03-29 14:47:13,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 392986624. Throughput: 0: 41717.4. Samples: 275226100. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:47:13,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 14:47:14,721][00497] Updated weights for policy 0, policy_version 23989 (0.0024) [2024-03-29 14:47:18,152][00497] Updated weights for policy 0, policy_version 23999 (0.0026) [2024-03-29 14:47:18,839][00126] Fps is (10 sec: 44236.1, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 393216000. Throughput: 0: 41865.7. Samples: 275354280. Policy #0 lag: (min: 1.0, avg: 20.5, max: 43.0) [2024-03-29 14:47:18,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:47:22,626][00497] Updated weights for policy 0, policy_version 24009 (0.0030) [2024-03-29 14:47:23,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41233.2, 300 sec: 41876.4). Total num frames: 393396224. Throughput: 0: 41845.3. Samples: 275617000. Policy #0 lag: (min: 1.0, avg: 20.5, max: 43.0) [2024-03-29 14:47:23,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 14:47:25,768][00497] Updated weights for policy 0, policy_version 24019 (0.0023) [2024-03-29 14:47:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 393625600. Throughput: 0: 41853.2. Samples: 275859120. Policy #0 lag: (min: 1.0, avg: 20.5, max: 43.0) [2024-03-29 14:47:28,841][00126] Avg episode reward: [(0, '0.357')] [2024-03-29 14:47:30,272][00497] Updated weights for policy 0, policy_version 24029 (0.0026) [2024-03-29 14:47:30,781][00476] Signal inference workers to stop experience collection... (9850 times) [2024-03-29 14:47:30,858][00476] Signal inference workers to resume experience collection... (9850 times) [2024-03-29 14:47:30,859][00497] InferenceWorker_p0-w0: stopping experience collection (9850 times) [2024-03-29 14:47:30,883][00497] InferenceWorker_p0-w0: resuming experience collection (9850 times) [2024-03-29 14:47:33,666][00497] Updated weights for policy 0, policy_version 24039 (0.0026) [2024-03-29 14:47:33,839][00126] Fps is (10 sec: 45874.2, 60 sec: 42052.1, 300 sec: 41987.5). 
Total num frames: 393854976. Throughput: 0: 42190.5. Samples: 275988880. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 14:47:33,840][00126] Avg episode reward: [(0, '0.483')] [2024-03-29 14:47:38,133][00497] Updated weights for policy 0, policy_version 24049 (0.0019) [2024-03-29 14:47:38,839][00126] Fps is (10 sec: 40960.6, 60 sec: 41233.0, 300 sec: 41931.9). Total num frames: 394035200. Throughput: 0: 42047.3. Samples: 276251500. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 14:47:38,840][00126] Avg episode reward: [(0, '0.347')] [2024-03-29 14:47:41,369][00497] Updated weights for policy 0, policy_version 24059 (0.0032) [2024-03-29 14:47:43,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 394248192. Throughput: 0: 41575.0. Samples: 276473200. Policy #0 lag: (min: 0.0, avg: 19.9, max: 40.0) [2024-03-29 14:47:43,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:47:45,927][00497] Updated weights for policy 0, policy_version 24069 (0.0029) [2024-03-29 14:47:48,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 394477568. Throughput: 0: 41796.4. Samples: 276610080. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:47:48,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 14:47:49,369][00497] Updated weights for policy 0, policy_version 24079 (0.0019) [2024-03-29 14:47:53,839][00126] Fps is (10 sec: 40960.5, 60 sec: 41506.2, 300 sec: 41931.9). Total num frames: 394657792. Throughput: 0: 41754.6. Samples: 276863780. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:47:53,841][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 14:47:54,048][00497] Updated weights for policy 0, policy_version 24089 (0.0023) [2024-03-29 14:47:57,166][00497] Updated weights for policy 0, policy_version 24099 (0.0019) [2024-03-29 14:47:58,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 394887168. Throughput: 0: 41495.6. Samples: 277093400. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:47:58,840][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 14:48:01,650][00497] Updated weights for policy 0, policy_version 24109 (0.0028) [2024-03-29 14:48:03,840][00126] Fps is (10 sec: 42597.4, 60 sec: 41779.0, 300 sec: 41931.9). Total num frames: 395083776. Throughput: 0: 41893.2. Samples: 277239480. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:48:03,840][00126] Avg episode reward: [(0, '0.437')] [2024-03-29 14:48:04,570][00476] Signal inference workers to stop experience collection... (9900 times) [2024-03-29 14:48:04,607][00497] InferenceWorker_p0-w0: stopping experience collection (9900 times) [2024-03-29 14:48:04,800][00476] Signal inference workers to resume experience collection... (9900 times) [2024-03-29 14:48:04,801][00497] InferenceWorker_p0-w0: resuming experience collection (9900 times) [2024-03-29 14:48:05,116][00497] Updated weights for policy 0, policy_version 24119 (0.0024) [2024-03-29 14:48:08,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41779.1, 300 sec: 41987.5). Total num frames: 395280384. Throughput: 0: 41490.1. Samples: 277484060. 
Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 14:48:08,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 14:48:09,708][00497] Updated weights for policy 0, policy_version 24129 (0.0023) [2024-03-29 14:48:12,932][00497] Updated weights for policy 0, policy_version 24139 (0.0020) [2024-03-29 14:48:13,839][00126] Fps is (10 sec: 44237.9, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 395526144. Throughput: 0: 41469.9. Samples: 277725260. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 14:48:13,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 14:48:17,326][00497] Updated weights for policy 0, policy_version 24149 (0.0028) [2024-03-29 14:48:18,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 395722752. Throughput: 0: 41673.8. Samples: 277864200. Policy #0 lag: (min: 1.0, avg: 20.9, max: 43.0) [2024-03-29 14:48:18,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:48:20,700][00497] Updated weights for policy 0, policy_version 24159 (0.0019) [2024-03-29 14:48:23,839][00126] Fps is (10 sec: 39321.1, 60 sec: 42052.2, 300 sec: 42043.0). Total num frames: 395919360. Throughput: 0: 41452.3. Samples: 278116860. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0) [2024-03-29 14:48:23,842][00126] Avg episode reward: [(0, '0.314')] [2024-03-29 14:48:25,387][00497] Updated weights for policy 0, policy_version 24169 (0.0019) [2024-03-29 14:48:28,429][00497] Updated weights for policy 0, policy_version 24179 (0.0019) [2024-03-29 14:48:28,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 396165120. Throughput: 0: 42024.1. Samples: 278364280. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0) [2024-03-29 14:48:28,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 14:48:32,814][00497] Updated weights for policy 0, policy_version 24189 (0.0018) [2024-03-29 14:48:33,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 396345344. Throughput: 0: 41954.2. Samples: 278498020. Policy #0 lag: (min: 1.0, avg: 22.3, max: 43.0) [2024-03-29 14:48:33,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:48:36,440][00497] Updated weights for policy 0, policy_version 24199 (0.0023) [2024-03-29 14:48:38,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 396558336. Throughput: 0: 41558.1. Samples: 278733900. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 14:48:38,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:48:41,043][00497] Updated weights for policy 0, policy_version 24209 (0.0018) [2024-03-29 14:48:42,109][00476] Signal inference workers to stop experience collection... (9950 times) [2024-03-29 14:48:42,109][00476] Signal inference workers to resume experience collection... (9950 times) [2024-03-29 14:48:42,145][00497] InferenceWorker_p0-w0: stopping experience collection (9950 times) [2024-03-29 14:48:42,145][00497] InferenceWorker_p0-w0: resuming experience collection (9950 times) [2024-03-29 14:48:43,839][00126] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 396771328. Throughput: 0: 42424.8. Samples: 279002520. 
Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 14:48:43,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:48:44,179][00497] Updated weights for policy 0, policy_version 24219 (0.0024) [2024-03-29 14:48:48,415][00497] Updated weights for policy 0, policy_version 24229 (0.0022) [2024-03-29 14:48:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 396984320. Throughput: 0: 41671.3. Samples: 279114680. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 14:48:48,840][00126] Avg episode reward: [(0, '0.254')] [2024-03-29 14:48:52,153][00497] Updated weights for policy 0, policy_version 24239 (0.0027) [2024-03-29 14:48:53,839][00126] Fps is (10 sec: 40959.6, 60 sec: 42052.1, 300 sec: 41987.5). Total num frames: 397180928. Throughput: 0: 41750.1. Samples: 279362820. Policy #0 lag: (min: 0.0, avg: 21.6, max: 42.0) [2024-03-29 14:48:53,840][00126] Avg episode reward: [(0, '0.358')] [2024-03-29 14:48:56,656][00497] Updated weights for policy 0, policy_version 24249 (0.0024) [2024-03-29 14:48:58,839][00126] Fps is (10 sec: 40960.0, 60 sec: 41779.2, 300 sec: 41932.0). Total num frames: 397393920. Throughput: 0: 42388.9. Samples: 279632760. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 14:48:58,841][00126] Avg episode reward: [(0, '0.397')] [2024-03-29 14:49:00,184][00497] Updated weights for policy 0, policy_version 24260 (0.0023) [2024-03-29 14:49:03,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 397606912. Throughput: 0: 41592.4. Samples: 279735860. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 14:49:03,840][00126] Avg episode reward: [(0, '0.323')] [2024-03-29 14:49:04,097][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000024269_397623296.pth... [2024-03-29 14:49:04,428][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023655_387563520.pth [2024-03-29 14:49:04,763][00497] Updated weights for policy 0, policy_version 24270 (0.0020) [2024-03-29 14:49:08,024][00497] Updated weights for policy 0, policy_version 24280 (0.0028) [2024-03-29 14:49:08,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 397819904. Throughput: 0: 41659.6. Samples: 279991540. Policy #0 lag: (min: 0.0, avg: 20.9, max: 41.0) [2024-03-29 14:49:08,840][00126] Avg episode reward: [(0, '0.300')] [2024-03-29 14:49:12,834][00497] Updated weights for policy 0, policy_version 24290 (0.0022) [2024-03-29 14:49:13,604][00476] Signal inference workers to stop experience collection... (10000 times) [2024-03-29 14:49:13,685][00476] Signal inference workers to resume experience collection... (10000 times) [2024-03-29 14:49:13,687][00497] InferenceWorker_p0-w0: stopping experience collection (10000 times) [2024-03-29 14:49:13,718][00497] InferenceWorker_p0-w0: resuming experience collection (10000 times) [2024-03-29 14:49:13,839][00126] Fps is (10 sec: 39322.1, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 398000128. Throughput: 0: 41880.9. Samples: 280248920. Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 14:49:13,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 14:49:15,970][00497] Updated weights for policy 0, policy_version 24300 (0.0030) [2024-03-29 14:49:18,839][00126] Fps is (10 sec: 40960.9, 60 sec: 41779.4, 300 sec: 41987.5). Total num frames: 398229504. Throughput: 0: 41291.8. Samples: 280356140. 
Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 14:49:18,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:49:20,256][00497] Updated weights for policy 0, policy_version 24310 (0.0024) [2024-03-29 14:49:23,753][00497] Updated weights for policy 0, policy_version 24320 (0.0021) [2024-03-29 14:49:23,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42325.4, 300 sec: 42098.5). Total num frames: 398458880. Throughput: 0: 42013.8. Samples: 280624520. Policy #0 lag: (min: 0.0, avg: 18.4, max: 42.0) [2024-03-29 14:49:23,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:49:28,416][00497] Updated weights for policy 0, policy_version 24330 (0.0019) [2024-03-29 14:49:28,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 398639104. Throughput: 0: 42006.3. Samples: 280892800. Policy #0 lag: (min: 1.0, avg: 19.4, max: 43.0) [2024-03-29 14:49:28,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:49:31,380][00497] Updated weights for policy 0, policy_version 24340 (0.0020) [2024-03-29 14:49:33,839][00126] Fps is (10 sec: 39321.8, 60 sec: 41779.3, 300 sec: 41987.5). Total num frames: 398852096. Throughput: 0: 41840.4. Samples: 280997500. Policy #0 lag: (min: 1.0, avg: 19.4, max: 43.0) [2024-03-29 14:49:33,841][00126] Avg episode reward: [(0, '0.355')] [2024-03-29 14:49:35,658][00497] Updated weights for policy 0, policy_version 24350 (0.0026) [2024-03-29 14:49:38,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 399081472. Throughput: 0: 42123.2. Samples: 281258360. Policy #0 lag: (min: 1.0, avg: 19.4, max: 43.0) [2024-03-29 14:49:38,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 14:49:39,299][00497] Updated weights for policy 0, policy_version 24360 (0.0026) [2024-03-29 14:49:43,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41233.1, 300 sec: 41820.9). Total num frames: 399245312. Throughput: 0: 41753.8. Samples: 281511680. Policy #0 lag: (min: 1.0, avg: 19.4, max: 43.0) [2024-03-29 14:49:43,840][00126] Avg episode reward: [(0, '0.320')] [2024-03-29 14:49:44,196][00497] Updated weights for policy 0, policy_version 24370 (0.0028) [2024-03-29 14:49:46,441][00476] Signal inference workers to stop experience collection... (10050 times) [2024-03-29 14:49:46,442][00476] Signal inference workers to resume experience collection... (10050 times) [2024-03-29 14:49:46,477][00497] InferenceWorker_p0-w0: stopping experience collection (10050 times) [2024-03-29 14:49:46,477][00497] InferenceWorker_p0-w0: resuming experience collection (10050 times) [2024-03-29 14:49:47,363][00497] Updated weights for policy 0, policy_version 24380 (0.0023) [2024-03-29 14:49:48,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 399491072. Throughput: 0: 42004.5. Samples: 281626060. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 14:49:48,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 14:49:51,327][00497] Updated weights for policy 0, policy_version 24390 (0.0023) [2024-03-29 14:49:53,839][00126] Fps is (10 sec: 44236.1, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 399687680. Throughput: 0: 42116.8. Samples: 281886800. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 14:49:53,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:49:55,102][00497] Updated weights for policy 0, policy_version 24400 (0.0029) [2024-03-29 14:49:58,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 41876.5). 
Total num frames: 399884288. Throughput: 0: 41996.4. Samples: 282138760. Policy #0 lag: (min: 1.0, avg: 19.2, max: 41.0) [2024-03-29 14:49:58,840][00126] Avg episode reward: [(0, '0.463')] [2024-03-29 14:49:59,593][00497] Updated weights for policy 0, policy_version 24410 (0.0019) [2024-03-29 14:50:02,986][00497] Updated weights for policy 0, policy_version 24420 (0.0032) [2024-03-29 14:50:03,839][00126] Fps is (10 sec: 44237.3, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 400130048. Throughput: 0: 42375.4. Samples: 282263040. Policy #0 lag: (min: 1.0, avg: 19.9, max: 42.0) [2024-03-29 14:50:03,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:50:06,932][00497] Updated weights for policy 0, policy_version 24430 (0.0023) [2024-03-29 14:50:08,839][00126] Fps is (10 sec: 44236.2, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 400326656. Throughput: 0: 41898.6. Samples: 282509960. Policy #0 lag: (min: 1.0, avg: 19.9, max: 42.0) [2024-03-29 14:50:08,840][00126] Avg episode reward: [(0, '0.311')] [2024-03-29 14:50:10,636][00497] Updated weights for policy 0, policy_version 24440 (0.0023) [2024-03-29 14:50:13,839][00126] Fps is (10 sec: 37683.4, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 400506880. Throughput: 0: 41540.4. Samples: 282762120. Policy #0 lag: (min: 1.0, avg: 19.9, max: 42.0) [2024-03-29 14:50:13,841][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:50:15,069][00497] Updated weights for policy 0, policy_version 24450 (0.0029) [2024-03-29 14:50:17,231][00476] Signal inference workers to stop experience collection... (10100 times) [2024-03-29 14:50:17,264][00497] InferenceWorker_p0-w0: stopping experience collection (10100 times) [2024-03-29 14:50:17,418][00476] Signal inference workers to resume experience collection... (10100 times) [2024-03-29 14:50:17,418][00497] InferenceWorker_p0-w0: resuming experience collection (10100 times) [2024-03-29 14:50:18,619][00497] Updated weights for policy 0, policy_version 24460 (0.0031) [2024-03-29 14:50:18,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 400752640. Throughput: 0: 42187.2. Samples: 282895920. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:18,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 14:50:22,845][00497] Updated weights for policy 0, policy_version 24470 (0.0029) [2024-03-29 14:50:23,839][00126] Fps is (10 sec: 44236.8, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 400949248. Throughput: 0: 41663.6. Samples: 283133220. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:23,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 14:50:26,447][00497] Updated weights for policy 0, policy_version 24480 (0.0033) [2024-03-29 14:50:28,839][00126] Fps is (10 sec: 40959.7, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 401162240. Throughput: 0: 41744.4. Samples: 283390180. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:28,840][00126] Avg episode reward: [(0, '0.332')] [2024-03-29 14:50:30,638][00497] Updated weights for policy 0, policy_version 24490 (0.0022) [2024-03-29 14:50:33,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 401391616. Throughput: 0: 42270.1. Samples: 283528220. 
Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:33,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:50:34,078][00497] Updated weights for policy 0, policy_version 24500 (0.0028) [2024-03-29 14:50:38,230][00497] Updated weights for policy 0, policy_version 24510 (0.0019) [2024-03-29 14:50:38,839][00126] Fps is (10 sec: 40960.2, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 401571840. Throughput: 0: 41961.1. Samples: 283775040. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:38,840][00126] Avg episode reward: [(0, '0.408')] [2024-03-29 14:50:41,947][00497] Updated weights for policy 0, policy_version 24520 (0.0025) [2024-03-29 14:50:43,839][00126] Fps is (10 sec: 40960.0, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 401801216. Throughput: 0: 41835.0. Samples: 284021340. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:43,840][00126] Avg episode reward: [(0, '0.425')] [2024-03-29 14:50:46,219][00497] Updated weights for policy 0, policy_version 24530 (0.0022) [2024-03-29 14:50:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 401997824. Throughput: 0: 42117.8. Samples: 284158340. Policy #0 lag: (min: 1.0, avg: 21.7, max: 42.0) [2024-03-29 14:50:48,840][00126] Avg episode reward: [(0, '0.424')] [2024-03-29 14:50:49,720][00497] Updated weights for policy 0, policy_version 24540 (0.0035) [2024-03-29 14:50:51,909][00476] Signal inference workers to stop experience collection... (10150 times) [2024-03-29 14:50:51,978][00497] InferenceWorker_p0-w0: stopping experience collection (10150 times) [2024-03-29 14:50:51,997][00476] Signal inference workers to resume experience collection... (10150 times) [2024-03-29 14:50:51,998][00497] InferenceWorker_p0-w0: resuming experience collection (10150 times) [2024-03-29 14:50:53,718][00497] Updated weights for policy 0, policy_version 24550 (0.0031) [2024-03-29 14:50:53,842][00126] Fps is (10 sec: 42587.3, 60 sec: 42323.5, 300 sec: 41987.1). Total num frames: 402227200. Throughput: 0: 42214.9. Samples: 284409740. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 14:50:53,843][00126] Avg episode reward: [(0, '0.433')] [2024-03-29 14:50:57,588][00497] Updated weights for policy 0, policy_version 24560 (0.0024) [2024-03-29 14:50:58,839][00126] Fps is (10 sec: 44236.3, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 402440192. Throughput: 0: 41955.5. Samples: 284650120. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 14:50:58,840][00126] Avg episode reward: [(0, '0.390')] [2024-03-29 14:51:01,716][00497] Updated weights for policy 0, policy_version 24570 (0.0027) [2024-03-29 14:51:03,839][00126] Fps is (10 sec: 40970.7, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 402636800. Throughput: 0: 42148.3. Samples: 284792600. Policy #0 lag: (min: 0.0, avg: 21.2, max: 43.0) [2024-03-29 14:51:03,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 14:51:04,026][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000024576_402653184.pth... [2024-03-29 14:51:04,333][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000023962_392593408.pth [2024-03-29 14:51:05,376][00497] Updated weights for policy 0, policy_version 24580 (0.0028) [2024-03-29 14:51:08,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 402849792. Throughput: 0: 42296.3. Samples: 285036560. 
Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:51:08,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 14:51:09,521][00497] Updated weights for policy 0, policy_version 24590 (0.0023) [2024-03-29 14:51:13,176][00497] Updated weights for policy 0, policy_version 24600 (0.0023) [2024-03-29 14:51:13,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42871.4, 300 sec: 41987.5). Total num frames: 403079168. Throughput: 0: 42061.3. Samples: 285282940. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:51:13,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:51:17,237][00497] Updated weights for policy 0, policy_version 24610 (0.0019) [2024-03-29 14:51:18,839][00126] Fps is (10 sec: 40960.4, 60 sec: 41779.1, 300 sec: 41820.9). Total num frames: 403259392. Throughput: 0: 42199.2. Samples: 285427180. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:51:18,840][00126] Avg episode reward: [(0, '0.382')] [2024-03-29 14:51:20,953][00476] Signal inference workers to stop experience collection... (10200 times) [2024-03-29 14:51:20,995][00497] InferenceWorker_p0-w0: stopping experience collection (10200 times) [2024-03-29 14:51:21,033][00476] Signal inference workers to resume experience collection... (10200 times) [2024-03-29 14:51:21,035][00497] InferenceWorker_p0-w0: resuming experience collection (10200 times) [2024-03-29 14:51:21,039][00497] Updated weights for policy 0, policy_version 24620 (0.0024) [2024-03-29 14:51:23,839][00126] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 403472384. Throughput: 0: 41815.5. Samples: 285656740. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:51:23,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:51:25,283][00497] Updated weights for policy 0, policy_version 24630 (0.0019) [2024-03-29 14:51:28,740][00497] Updated weights for policy 0, policy_version 24640 (0.0023) [2024-03-29 14:51:28,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.4, 300 sec: 41931.9). Total num frames: 403701760. Throughput: 0: 42011.7. Samples: 285911860. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:51:28,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:51:32,998][00497] Updated weights for policy 0, policy_version 24650 (0.0029) [2024-03-29 14:51:33,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 403881984. Throughput: 0: 41938.1. Samples: 286045560. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:51:33,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 14:51:36,462][00497] Updated weights for policy 0, policy_version 24660 (0.0033) [2024-03-29 14:51:38,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 404094976. Throughput: 0: 41662.9. Samples: 286284460. Policy #0 lag: (min: 1.0, avg: 19.5, max: 41.0) [2024-03-29 14:51:38,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:51:41,004][00497] Updated weights for policy 0, policy_version 24670 (0.0025) [2024-03-29 14:51:43,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 404324352. Throughput: 0: 42216.1. Samples: 286549840. 
Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:51:43,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:51:44,242][00497] Updated weights for policy 0, policy_version 24680 (0.0024) [2024-03-29 14:51:48,762][00497] Updated weights for policy 0, policy_version 24690 (0.0019) [2024-03-29 14:51:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 404520960. Throughput: 0: 41864.6. Samples: 286676500. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:51:48,840][00126] Avg episode reward: [(0, '0.394')] [2024-03-29 14:51:52,138][00497] Updated weights for policy 0, policy_version 24700 (0.0025) [2024-03-29 14:51:53,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41781.1, 300 sec: 41931.9). Total num frames: 404733952. Throughput: 0: 41859.7. Samples: 286920240. Policy #0 lag: (min: 1.0, avg: 20.1, max: 41.0) [2024-03-29 14:51:53,840][00126] Avg episode reward: [(0, '0.417')] [2024-03-29 14:51:54,280][00476] Signal inference workers to stop experience collection... (10250 times) [2024-03-29 14:51:54,355][00476] Signal inference workers to resume experience collection... (10250 times) [2024-03-29 14:51:54,357][00497] InferenceWorker_p0-w0: stopping experience collection (10250 times) [2024-03-29 14:51:54,389][00497] InferenceWorker_p0-w0: resuming experience collection (10250 times) [2024-03-29 14:51:56,500][00497] Updated weights for policy 0, policy_version 24710 (0.0028) [2024-03-29 14:51:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 404946944. Throughput: 0: 42279.6. Samples: 287185520. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:51:58,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 14:51:59,881][00497] Updated weights for policy 0, policy_version 24720 (0.0020) [2024-03-29 14:52:03,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 405159936. Throughput: 0: 41608.0. Samples: 287299540. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:52:03,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 14:52:04,253][00497] Updated weights for policy 0, policy_version 24730 (0.0019) [2024-03-29 14:52:07,653][00497] Updated weights for policy 0, policy_version 24740 (0.0028) [2024-03-29 14:52:08,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 405389312. Throughput: 0: 42307.9. Samples: 287560600. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:52:08,840][00126] Avg episode reward: [(0, '0.443')] [2024-03-29 14:52:12,043][00497] Updated weights for policy 0, policy_version 24750 (0.0019) [2024-03-29 14:52:13,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 405569536. Throughput: 0: 42223.0. Samples: 287811900. Policy #0 lag: (min: 1.0, avg: 20.7, max: 42.0) [2024-03-29 14:52:13,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:52:15,579][00497] Updated weights for policy 0, policy_version 24760 (0.0019) [2024-03-29 14:52:18,839][00126] Fps is (10 sec: 39322.2, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 405782528. Throughput: 0: 41813.9. Samples: 287927180. 
Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 14:52:18,841][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:52:19,993][00497] Updated weights for policy 0, policy_version 24770 (0.0020) [2024-03-29 14:52:23,426][00497] Updated weights for policy 0, policy_version 24780 (0.0021) [2024-03-29 14:52:23,719][00476] Signal inference workers to stop experience collection... (10300 times) [2024-03-29 14:52:23,719][00476] Signal inference workers to resume experience collection... (10300 times) [2024-03-29 14:52:23,759][00497] InferenceWorker_p0-w0: stopping experience collection (10300 times) [2024-03-29 14:52:23,759][00497] InferenceWorker_p0-w0: resuming experience collection (10300 times) [2024-03-29 14:52:23,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 406011904. Throughput: 0: 42271.6. Samples: 288186680. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 14:52:23,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 14:52:27,494][00497] Updated weights for policy 0, policy_version 24790 (0.0022) [2024-03-29 14:52:28,839][00126] Fps is (10 sec: 42597.8, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 406208512. Throughput: 0: 41991.0. Samples: 288439440. Policy #0 lag: (min: 1.0, avg: 22.2, max: 43.0) [2024-03-29 14:52:28,840][00126] Avg episode reward: [(0, '0.431')] [2024-03-29 14:52:31,047][00497] Updated weights for policy 0, policy_version 24800 (0.0030) [2024-03-29 14:52:33,839][00126] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 406405120. Throughput: 0: 41839.9. Samples: 288559300. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 14:52:33,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:52:35,584][00497] Updated weights for policy 0, policy_version 24810 (0.0018) [2024-03-29 14:52:38,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42325.4, 300 sec: 41987.5). Total num frames: 406634496. Throughput: 0: 42305.8. Samples: 288824000. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 14:52:38,840][00126] Avg episode reward: [(0, '0.411')] [2024-03-29 14:52:38,892][00497] Updated weights for policy 0, policy_version 24820 (0.0021) [2024-03-29 14:52:43,030][00497] Updated weights for policy 0, policy_version 24830 (0.0024) [2024-03-29 14:52:43,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41779.1, 300 sec: 41876.4). Total num frames: 406831104. Throughput: 0: 41988.8. Samples: 289075020. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 14:52:43,840][00126] Avg episode reward: [(0, '0.413')] [2024-03-29 14:52:46,326][00497] Updated weights for policy 0, policy_version 24840 (0.0024) [2024-03-29 14:52:48,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 407044096. Throughput: 0: 42368.5. Samples: 289206120. Policy #0 lag: (min: 0.0, avg: 21.8, max: 41.0) [2024-03-29 14:52:48,840][00126] Avg episode reward: [(0, '0.328')] [2024-03-29 14:52:50,788][00497] Updated weights for policy 0, policy_version 24850 (0.0026) [2024-03-29 14:52:53,839][00126] Fps is (10 sec: 44237.0, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 407273472. Throughput: 0: 42379.5. Samples: 289467680. 
Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:52:53,842][00126] Avg episode reward: [(0, '0.376')] [2024-03-29 14:52:54,272][00497] Updated weights for policy 0, policy_version 24860 (0.0022) [2024-03-29 14:52:58,398][00497] Updated weights for policy 0, policy_version 24870 (0.0023) [2024-03-29 14:52:58,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 407486464. Throughput: 0: 42484.1. Samples: 289723680. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:52:58,840][00126] Avg episode reward: [(0, '0.325')] [2024-03-29 14:53:00,369][00476] Signal inference workers to stop experience collection... (10350 times) [2024-03-29 14:53:00,394][00497] InferenceWorker_p0-w0: stopping experience collection (10350 times) [2024-03-29 14:53:00,550][00476] Signal inference workers to resume experience collection... (10350 times) [2024-03-29 14:53:00,550][00497] InferenceWorker_p0-w0: resuming experience collection (10350 times) [2024-03-29 14:53:01,869][00497] Updated weights for policy 0, policy_version 24880 (0.0018) [2024-03-29 14:53:03,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 407699456. Throughput: 0: 42566.6. Samples: 289842680. Policy #0 lag: (min: 1.0, avg: 20.5, max: 42.0) [2024-03-29 14:53:03,840][00126] Avg episode reward: [(0, '0.385')] [2024-03-29 14:53:04,045][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000024885_407715840.pth... [2024-03-29 14:53:04,388][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000024269_397623296.pth [2024-03-29 14:53:06,133][00497] Updated weights for policy 0, policy_version 24890 (0.0023) [2024-03-29 14:53:08,839][00126] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 407912448. Throughput: 0: 42859.9. Samples: 290115380. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:53:08,840][00126] Avg episode reward: [(0, '0.363')] [2024-03-29 14:53:09,694][00497] Updated weights for policy 0, policy_version 24900 (0.0025) [2024-03-29 14:53:13,712][00497] Updated weights for policy 0, policy_version 24910 (0.0030) [2024-03-29 14:53:13,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42598.4, 300 sec: 42043.0). Total num frames: 408125440. Throughput: 0: 42746.2. Samples: 290363020. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:53:13,840][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 14:53:17,356][00497] Updated weights for policy 0, policy_version 24920 (0.0019) [2024-03-29 14:53:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 408338432. Throughput: 0: 42770.8. Samples: 290483980. Policy #0 lag: (min: 1.0, avg: 20.2, max: 42.0) [2024-03-29 14:53:18,840][00126] Avg episode reward: [(0, '0.346')] [2024-03-29 14:53:21,699][00497] Updated weights for policy 0, policy_version 24930 (0.0028) [2024-03-29 14:53:23,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.2, 300 sec: 41987.5). Total num frames: 408551424. Throughput: 0: 42815.4. Samples: 290750700. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:53:23,840][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 14:53:25,181][00497] Updated weights for policy 0, policy_version 24940 (0.0022) [2024-03-29 14:53:28,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.4, 300 sec: 42043.0). Total num frames: 408748032. Throughput: 0: 42532.2. Samples: 290988960. 
Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:53:28,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:53:29,200][00497] Updated weights for policy 0, policy_version 24950 (0.0026) [2024-03-29 14:53:32,931][00497] Updated weights for policy 0, policy_version 24960 (0.0028) [2024-03-29 14:53:33,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42871.5, 300 sec: 42098.6). Total num frames: 408977408. Throughput: 0: 42680.0. Samples: 291126720. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:53:33,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 14:53:35,174][00476] Signal inference workers to stop experience collection... (10400 times) [2024-03-29 14:53:35,175][00476] Signal inference workers to resume experience collection... (10400 times) [2024-03-29 14:53:35,215][00497] InferenceWorker_p0-w0: stopping experience collection (10400 times) [2024-03-29 14:53:35,216][00497] InferenceWorker_p0-w0: resuming experience collection (10400 times) [2024-03-29 14:53:37,372][00497] Updated weights for policy 0, policy_version 24970 (0.0019) [2024-03-29 14:53:38,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42598.3, 300 sec: 42098.5). Total num frames: 409190400. Throughput: 0: 42496.9. Samples: 291380040. Policy #0 lag: (min: 1.0, avg: 20.1, max: 42.0) [2024-03-29 14:53:38,840][00126] Avg episode reward: [(0, '0.448')] [2024-03-29 14:53:40,699][00497] Updated weights for policy 0, policy_version 24980 (0.0030) [2024-03-29 14:53:43,839][00126] Fps is (10 sec: 40960.2, 60 sec: 42598.5, 300 sec: 42043.0). Total num frames: 409387008. Throughput: 0: 42132.4. Samples: 291619640. Policy #0 lag: (min: 0.0, avg: 20.9, max: 40.0) [2024-03-29 14:53:43,840][00126] Avg episode reward: [(0, '0.473')] [2024-03-29 14:53:44,819][00497] Updated weights for policy 0, policy_version 24990 (0.0029) [2024-03-29 14:53:48,306][00497] Updated weights for policy 0, policy_version 25000 (0.0023) [2024-03-29 14:53:48,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42871.5, 300 sec: 42154.1). Total num frames: 409616384. Throughput: 0: 42570.3. Samples: 291758340. Policy #0 lag: (min: 0.0, avg: 20.9, max: 40.0) [2024-03-29 14:53:48,840][00126] Avg episode reward: [(0, '0.498')] [2024-03-29 14:53:52,634][00497] Updated weights for policy 0, policy_version 25010 (0.0024) [2024-03-29 14:53:53,839][00126] Fps is (10 sec: 42597.6, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 409812992. Throughput: 0: 42153.7. Samples: 292012300. Policy #0 lag: (min: 0.0, avg: 20.9, max: 40.0) [2024-03-29 14:53:53,840][00126] Avg episode reward: [(0, '0.327')] [2024-03-29 14:53:56,083][00497] Updated weights for policy 0, policy_version 25020 (0.0024) [2024-03-29 14:53:58,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 410025984. Throughput: 0: 42140.1. Samples: 292259320. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:53:58,840][00126] Avg episode reward: [(0, '0.356')] [2024-03-29 14:54:00,170][00497] Updated weights for policy 0, policy_version 25030 (0.0020) [2024-03-29 14:54:03,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42325.3, 300 sec: 42098.6). Total num frames: 410238976. Throughput: 0: 42384.9. Samples: 292391300. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:54:03,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 14:54:03,914][00497] Updated weights for policy 0, policy_version 25040 (0.0027) [2024-03-29 14:54:06,492][00476] Signal inference workers to stop experience collection... 
(10450 times) [2024-03-29 14:54:06,566][00476] Signal inference workers to resume experience collection... (10450 times) [2024-03-29 14:54:06,569][00497] InferenceWorker_p0-w0: stopping experience collection (10450 times) [2024-03-29 14:54:06,593][00497] InferenceWorker_p0-w0: resuming experience collection (10450 times) [2024-03-29 14:54:08,181][00497] Updated weights for policy 0, policy_version 25050 (0.0018) [2024-03-29 14:54:08,839][00126] Fps is (10 sec: 40959.3, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 410435584. Throughput: 0: 42157.3. Samples: 292647780. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:54:08,840][00126] Avg episode reward: [(0, '0.461')] [2024-03-29 14:54:11,631][00497] Updated weights for policy 0, policy_version 25060 (0.0027) [2024-03-29 14:54:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 410664960. Throughput: 0: 42179.1. Samples: 292887020. Policy #0 lag: (min: 1.0, avg: 22.6, max: 42.0) [2024-03-29 14:54:13,840][00126] Avg episode reward: [(0, '0.404')] [2024-03-29 14:54:15,676][00497] Updated weights for policy 0, policy_version 25070 (0.0022) [2024-03-29 14:54:18,839][00126] Fps is (10 sec: 42599.1, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 410861568. Throughput: 0: 42083.6. Samples: 293020480. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:54:18,840][00126] Avg episode reward: [(0, '0.342')] [2024-03-29 14:54:19,467][00497] Updated weights for policy 0, policy_version 25080 (0.0025) [2024-03-29 14:54:23,839][00126] Fps is (10 sec: 39321.0, 60 sec: 41779.2, 300 sec: 42098.5). Total num frames: 411058176. Throughput: 0: 42239.1. Samples: 293280800. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:54:23,840][00126] Avg episode reward: [(0, '0.439')] [2024-03-29 14:54:23,891][00497] Updated weights for policy 0, policy_version 25090 (0.0022) [2024-03-29 14:54:26,862][00497] Updated weights for policy 0, policy_version 25100 (0.0024) [2024-03-29 14:54:28,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 411303936. Throughput: 0: 42226.1. Samples: 293519820. Policy #0 lag: (min: 0.0, avg: 21.1, max: 41.0) [2024-03-29 14:54:28,840][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 14:54:31,069][00497] Updated weights for policy 0, policy_version 25110 (0.0019) [2024-03-29 14:54:33,839][00126] Fps is (10 sec: 45875.6, 60 sec: 42325.3, 300 sec: 42154.1). Total num frames: 411516928. Throughput: 0: 42312.8. Samples: 293662420. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:54:33,840][00126] Avg episode reward: [(0, '0.398')] [2024-03-29 14:54:34,859][00497] Updated weights for policy 0, policy_version 25120 (0.0019) [2024-03-29 14:54:38,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 411697152. Throughput: 0: 42437.4. Samples: 293921980. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:54:38,840][00126] Avg episode reward: [(0, '0.436')] [2024-03-29 14:54:39,188][00497] Updated weights for policy 0, policy_version 25130 (0.0018) [2024-03-29 14:54:39,818][00476] Signal inference workers to stop experience collection... (10500 times) [2024-03-29 14:54:39,855][00497] InferenceWorker_p0-w0: stopping experience collection (10500 times) [2024-03-29 14:54:39,983][00476] Signal inference workers to resume experience collection... 
(10500 times) [2024-03-29 14:54:39,984][00497] InferenceWorker_p0-w0: resuming experience collection (10500 times) [2024-03-29 14:54:42,421][00497] Updated weights for policy 0, policy_version 25140 (0.0025) [2024-03-29 14:54:43,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42871.4, 300 sec: 42265.2). Total num frames: 411959296. Throughput: 0: 42228.9. Samples: 294159620. Policy #0 lag: (min: 1.0, avg: 20.6, max: 41.0) [2024-03-29 14:54:43,840][00126] Avg episode reward: [(0, '0.420')] [2024-03-29 14:54:46,451][00497] Updated weights for policy 0, policy_version 25150 (0.0020) [2024-03-29 14:54:48,839][00126] Fps is (10 sec: 44237.1, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 412139520. Throughput: 0: 42439.1. Samples: 294301060. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 14:54:48,840][00126] Avg episode reward: [(0, '0.313')] [2024-03-29 14:54:50,147][00497] Updated weights for policy 0, policy_version 25160 (0.0019) [2024-03-29 14:54:53,839][00126] Fps is (10 sec: 39321.0, 60 sec: 42325.3, 300 sec: 42265.1). Total num frames: 412352512. Throughput: 0: 42441.8. Samples: 294557660. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 14:54:53,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:54:54,448][00497] Updated weights for policy 0, policy_version 25170 (0.0023) [2024-03-29 14:54:58,086][00497] Updated weights for policy 0, policy_version 25181 (0.0024) [2024-03-29 14:54:58,839][00126] Fps is (10 sec: 45875.4, 60 sec: 42871.5, 300 sec: 42265.2). Total num frames: 412598272. Throughput: 0: 42584.9. Samples: 294803340. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 14:54:58,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:55:02,138][00497] Updated weights for policy 0, policy_version 25191 (0.0019) [2024-03-29 14:55:03,839][00126] Fps is (10 sec: 42598.7, 60 sec: 42325.3, 300 sec: 42209.6). Total num frames: 412778496. Throughput: 0: 42636.8. Samples: 294939140. Policy #0 lag: (min: 0.0, avg: 20.4, max: 42.0) [2024-03-29 14:55:03,840][00126] Avg episode reward: [(0, '0.370')] [2024-03-29 14:55:04,266][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000025196_412811264.pth... [2024-03-29 14:55:04,604][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000024576_402653184.pth [2024-03-29 14:55:06,074][00497] Updated weights for policy 0, policy_version 25201 (0.0024) [2024-03-29 14:55:08,839][00126] Fps is (10 sec: 39321.5, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 412991488. Throughput: 0: 42393.9. Samples: 295188520. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 14:55:08,840][00126] Avg episode reward: [(0, '0.472')] [2024-03-29 14:55:10,304][00497] Updated weights for policy 0, policy_version 25211 (0.0020) [2024-03-29 14:55:13,004][00476] Signal inference workers to stop experience collection... (10550 times) [2024-03-29 14:55:13,079][00476] Signal inference workers to resume experience collection... (10550 times) [2024-03-29 14:55:13,081][00497] InferenceWorker_p0-w0: stopping experience collection (10550 times) [2024-03-29 14:55:13,104][00497] InferenceWorker_p0-w0: resuming experience collection (10550 times) [2024-03-29 14:55:13,640][00497] Updated weights for policy 0, policy_version 25221 (0.0024) [2024-03-29 14:55:13,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 413220864. Throughput: 0: 42704.5. Samples: 295441520. 
Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 14:55:13,840][00126] Avg episode reward: [(0, '0.339')] [2024-03-29 14:55:17,666][00497] Updated weights for policy 0, policy_version 25231 (0.0017) [2024-03-29 14:55:18,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 413417472. Throughput: 0: 42426.7. Samples: 295571620. Policy #0 lag: (min: 1.0, avg: 20.4, max: 41.0) [2024-03-29 14:55:18,840][00126] Avg episode reward: [(0, '0.307')] [2024-03-29 14:55:21,536][00497] Updated weights for policy 0, policy_version 25241 (0.0018) [2024-03-29 14:55:23,839][00126] Fps is (10 sec: 39321.1, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 413614080. Throughput: 0: 42091.5. Samples: 295816100. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 14:55:23,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:55:25,789][00497] Updated weights for policy 0, policy_version 25251 (0.0018) [2024-03-29 14:55:28,839][00126] Fps is (10 sec: 44236.4, 60 sec: 42598.4, 300 sec: 42265.2). Total num frames: 413859840. Throughput: 0: 42636.8. Samples: 296078280. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 14:55:28,840][00126] Avg episode reward: [(0, '0.474')] [2024-03-29 14:55:29,018][00497] Updated weights for policy 0, policy_version 25261 (0.0017) [2024-03-29 14:55:33,205][00497] Updated weights for policy 0, policy_version 25271 (0.0026) [2024-03-29 14:55:33,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 414056448. Throughput: 0: 42324.0. Samples: 296205640. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 14:55:33,840][00126] Avg episode reward: [(0, '0.471')] [2024-03-29 14:55:36,762][00497] Updated weights for policy 0, policy_version 25281 (0.0024) [2024-03-29 14:55:38,839][00126] Fps is (10 sec: 40960.4, 60 sec: 42871.5, 300 sec: 42265.2). Total num frames: 414269440. Throughput: 0: 42238.4. Samples: 296458380. Policy #0 lag: (min: 1.0, avg: 22.0, max: 42.0) [2024-03-29 14:55:38,841][00126] Avg episode reward: [(0, '0.369')] [2024-03-29 14:55:41,201][00497] Updated weights for policy 0, policy_version 25291 (0.0026) [2024-03-29 14:55:43,839][00126] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 414482432. Throughput: 0: 42401.6. Samples: 296711420. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 14:55:43,840][00126] Avg episode reward: [(0, '0.423')] [2024-03-29 14:55:44,544][00476] Signal inference workers to stop experience collection... (10600 times) [2024-03-29 14:55:44,584][00497] InferenceWorker_p0-w0: stopping experience collection (10600 times) [2024-03-29 14:55:44,624][00476] Signal inference workers to resume experience collection... (10600 times) [2024-03-29 14:55:44,626][00497] InferenceWorker_p0-w0: resuming experience collection (10600 times) [2024-03-29 14:55:44,630][00497] Updated weights for policy 0, policy_version 25301 (0.0026) [2024-03-29 14:55:48,716][00497] Updated weights for policy 0, policy_version 25311 (0.0029) [2024-03-29 14:55:48,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42265.6). Total num frames: 414695424. Throughput: 0: 42049.4. Samples: 296831360. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 14:55:48,840][00126] Avg episode reward: [(0, '0.405')] [2024-03-29 14:55:52,338][00497] Updated weights for policy 0, policy_version 25321 (0.0032) [2024-03-29 14:55:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42265.2). 
Total num frames: 414908416. Throughput: 0: 42385.6. Samples: 297095880. Policy #0 lag: (min: 0.0, avg: 19.9, max: 42.0) [2024-03-29 14:55:53,840][00126] Avg episode reward: [(0, '0.531')] [2024-03-29 14:55:56,526][00497] Updated weights for policy 0, policy_version 25331 (0.0023) [2024-03-29 14:55:58,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 415121408. Throughput: 0: 42459.1. Samples: 297352180. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 14:55:58,840][00126] Avg episode reward: [(0, '0.460')] [2024-03-29 14:55:59,880][00497] Updated weights for policy 0, policy_version 25341 (0.0019) [2024-03-29 14:56:03,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 415334400. Throughput: 0: 42328.9. Samples: 297476420. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 14:56:03,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 14:56:03,942][00497] Updated weights for policy 0, policy_version 25351 (0.0022) [2024-03-29 14:56:07,778][00497] Updated weights for policy 0, policy_version 25361 (0.0021) [2024-03-29 14:56:08,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42598.3, 300 sec: 42265.2). Total num frames: 415547392. Throughput: 0: 42753.4. Samples: 297740000. Policy #0 lag: (min: 1.0, avg: 19.7, max: 42.0) [2024-03-29 14:56:08,840][00126] Avg episode reward: [(0, '0.370')] [2024-03-29 14:56:11,933][00497] Updated weights for policy 0, policy_version 25371 (0.0023) [2024-03-29 14:56:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 415760384. Throughput: 0: 42386.8. Samples: 297985680. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:56:13,841][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 14:56:15,321][00497] Updated weights for policy 0, policy_version 25381 (0.0027) [2024-03-29 14:56:18,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42325.4, 300 sec: 42320.7). Total num frames: 415956992. Throughput: 0: 42332.5. Samples: 298110600. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:56:18,840][00126] Avg episode reward: [(0, '0.367')] [2024-03-29 14:56:19,642][00497] Updated weights for policy 0, policy_version 25391 (0.0026) [2024-03-29 14:56:19,926][00476] Signal inference workers to stop experience collection... (10650 times) [2024-03-29 14:56:19,967][00497] InferenceWorker_p0-w0: stopping experience collection (10650 times) [2024-03-29 14:56:20,004][00476] Signal inference workers to resume experience collection... (10650 times) [2024-03-29 14:56:20,007][00497] InferenceWorker_p0-w0: resuming experience collection (10650 times) [2024-03-29 14:56:23,249][00497] Updated weights for policy 0, policy_version 25401 (0.0024) [2024-03-29 14:56:23,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42871.6, 300 sec: 42320.7). Total num frames: 416186368. Throughput: 0: 42612.0. Samples: 298375920. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:56:23,840][00126] Avg episode reward: [(0, '0.478')] [2024-03-29 14:56:27,353][00497] Updated weights for policy 0, policy_version 25411 (0.0023) [2024-03-29 14:56:28,839][00126] Fps is (10 sec: 44236.0, 60 sec: 42325.3, 300 sec: 42431.8). Total num frames: 416399360. Throughput: 0: 42382.2. Samples: 298618620. 
Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 14:56:28,840][00126] Avg episode reward: [(0, '0.400')] [2024-03-29 14:56:30,752][00497] Updated weights for policy 0, policy_version 25421 (0.0026) [2024-03-29 14:56:33,839][00126] Fps is (10 sec: 40960.1, 60 sec: 42325.3, 300 sec: 42376.3). Total num frames: 416595968. Throughput: 0: 42428.5. Samples: 298740640. Policy #0 lag: (min: 2.0, avg: 21.7, max: 42.0) [2024-03-29 14:56:33,840][00126] Avg episode reward: [(0, '0.441')] [2024-03-29 14:56:34,934][00497] Updated weights for policy 0, policy_version 25431 (0.0027) [2024-03-29 14:56:38,714][00497] Updated weights for policy 0, policy_version 25441 (0.0025) [2024-03-29 14:56:38,839][00126] Fps is (10 sec: 42599.3, 60 sec: 42598.5, 300 sec: 42376.3). Total num frames: 416825344. Throughput: 0: 42502.9. Samples: 299008500. Policy #0 lag: (min: 2.0, avg: 21.7, max: 42.0) [2024-03-29 14:56:38,840][00126] Avg episode reward: [(0, '0.419')] [2024-03-29 14:56:42,955][00497] Updated weights for policy 0, policy_version 25451 (0.0024) [2024-03-29 14:56:43,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.4, 300 sec: 42376.2). Total num frames: 417021952. Throughput: 0: 42152.0. Samples: 299249020. Policy #0 lag: (min: 2.0, avg: 21.7, max: 42.0) [2024-03-29 14:56:43,840][00126] Avg episode reward: [(0, '0.377')] [2024-03-29 14:56:46,571][00497] Updated weights for policy 0, policy_version 25461 (0.0024) [2024-03-29 14:56:48,839][00126] Fps is (10 sec: 39321.3, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 417218560. Throughput: 0: 42197.8. Samples: 299375320. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:56:48,840][00126] Avg episode reward: [(0, '0.428')] [2024-03-29 14:56:50,529][00497] Updated weights for policy 0, policy_version 25471 (0.0018) [2024-03-29 14:56:53,412][00476] Signal inference workers to stop experience collection... (10700 times) [2024-03-29 14:56:53,414][00476] Signal inference workers to resume experience collection... (10700 times) [2024-03-29 14:56:53,457][00497] InferenceWorker_p0-w0: stopping experience collection (10700 times) [2024-03-29 14:56:53,457][00497] InferenceWorker_p0-w0: resuming experience collection (10700 times) [2024-03-29 14:56:53,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 42376.2). Total num frames: 417447936. Throughput: 0: 42176.5. Samples: 299637940. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:56:53,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:56:54,302][00497] Updated weights for policy 0, policy_version 25481 (0.0027) [2024-03-29 14:56:58,347][00497] Updated weights for policy 0, policy_version 25491 (0.0022) [2024-03-29 14:56:58,839][00126] Fps is (10 sec: 44236.5, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 417660928. Throughput: 0: 42381.7. Samples: 299892860. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:56:58,840][00126] Avg episode reward: [(0, '0.362')] [2024-03-29 14:57:01,847][00497] Updated weights for policy 0, policy_version 25501 (0.0020) [2024-03-29 14:57:03,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42325.3, 300 sec: 42320.7). Total num frames: 417873920. Throughput: 0: 42405.3. Samples: 300018840. Policy #0 lag: (min: 0.0, avg: 22.2, max: 42.0) [2024-03-29 14:57:03,840][00126] Avg episode reward: [(0, '0.372')] [2024-03-29 14:57:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000025506_417890304.pth... 
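
The Saving/Removing pairs that recur in this log (roughly every two minutes, e.g. the save of checkpoint_000025506_417890304.pth just above followed by the removal of checkpoint_000024885_407715840.pth just below) show the learner's checkpoint rotation: a new checkpoint_<policy_version>_<env_frames>.pth is written into checkpoint_p0, then the oldest file is deleted so only the most recent few remain. A minimal sketch of that rotation is given here for orientation only; it is not the trainer's actual code, and the generic byte writer and the keep-latest-3 policy are assumptions.

# Illustrative sketch of the save-then-prune pattern seen in the log above.
# Not the trainer's real implementation; writer and keep_latest are assumed.
import os
import glob

def rotate_checkpoints(checkpoint_dir: str, policy_version: int,
                       env_frames: int, state_bytes: bytes,
                       keep_latest: int = 3) -> None:
    """Write the newest checkpoint, then delete all but the latest few."""
    name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    path = os.path.join(checkpoint_dir, name)
    with open(path, "wb") as f:          # a real trainer would serialize with torch.save()
        f.write(state_bytes)

    # Filenames zero-pad the policy version, so lexicographic order matches
    # numeric order and a plain sort finds the oldest checkpoints.
    existing = sorted(glob.glob(os.path.join(checkpoint_dir, "checkpoint_*.pth")))
    for old in existing[:-keep_latest]:  # everything except the newest keep_latest files
        os.remove(old)

Zero-padding the policy version in the filename (checkpoint_000025506_...) is what makes the simple sorted() pruning above sufficient; without it, version 25506 would sort before version 9999.
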
[2024-03-29 14:57:04,262][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000024885_407715840.pth [2024-03-29 14:57:05,824][00497] Updated weights for policy 0, policy_version 25511 (0.0023) [2024-03-29 14:57:08,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 42431.8). Total num frames: 418086912. Throughput: 0: 42327.1. Samples: 300280640. Policy #0 lag: (min: 2.0, avg: 21.0, max: 43.0) [2024-03-29 14:57:08,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 14:57:09,665][00497] Updated weights for policy 0, policy_version 25521 (0.0020) [2024-03-29 14:57:13,756][00497] Updated weights for policy 0, policy_version 25531 (0.0026) [2024-03-29 14:57:13,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42325.3, 300 sec: 42431.8). Total num frames: 418299904. Throughput: 0: 42700.0. Samples: 300540120. Policy #0 lag: (min: 2.0, avg: 21.0, max: 43.0) [2024-03-29 14:57:13,840][00126] Avg episode reward: [(0, '0.389')] [2024-03-29 14:57:17,402][00497] Updated weights for policy 0, policy_version 25541 (0.0025) [2024-03-29 14:57:18,839][00126] Fps is (10 sec: 42597.9, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 418512896. Throughput: 0: 42363.9. Samples: 300647020. Policy #0 lag: (min: 2.0, avg: 21.0, max: 43.0) [2024-03-29 14:57:18,840][00126] Avg episode reward: [(0, '0.467')] [2024-03-29 14:57:21,530][00497] Updated weights for policy 0, policy_version 25551 (0.0019) [2024-03-29 14:57:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.2, 300 sec: 42376.2). Total num frames: 418709504. Throughput: 0: 42054.9. Samples: 300900980. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 14:57:23,841][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 14:57:24,951][00476] Signal inference workers to stop experience collection... (10750 times) [2024-03-29 14:57:24,989][00497] InferenceWorker_p0-w0: stopping experience collection (10750 times) [2024-03-29 14:57:25,178][00476] Signal inference workers to resume experience collection... (10750 times) [2024-03-29 14:57:25,178][00497] InferenceWorker_p0-w0: resuming experience collection (10750 times) [2024-03-29 14:57:25,454][00497] Updated weights for policy 0, policy_version 25561 (0.0021) [2024-03-29 14:57:28,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41506.2, 300 sec: 42320.7). Total num frames: 418889728. Throughput: 0: 42471.1. Samples: 301160220. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 14:57:28,840][00126] Avg episode reward: [(0, '0.446')] [2024-03-29 14:57:29,628][00497] Updated weights for policy 0, policy_version 25571 (0.0029) [2024-03-29 14:57:33,110][00497] Updated weights for policy 0, policy_version 25581 (0.0020) [2024-03-29 14:57:33,839][00126] Fps is (10 sec: 42598.9, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 419135488. Throughput: 0: 42157.8. Samples: 301272420. Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 14:57:33,840][00126] Avg episode reward: [(0, '0.432')] [2024-03-29 14:57:37,214][00497] Updated weights for policy 0, policy_version 25591 (0.0022) [2024-03-29 14:57:38,839][00126] Fps is (10 sec: 44236.7, 60 sec: 41779.1, 300 sec: 42376.2). Total num frames: 419332096. Throughput: 0: 42007.5. Samples: 301528280. 
Policy #0 lag: (min: 0.0, avg: 20.5, max: 41.0) [2024-03-29 14:57:38,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:57:41,156][00497] Updated weights for policy 0, policy_version 25601 (0.0025) [2024-03-29 14:57:43,839][00126] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 42320.7). Total num frames: 419528704. Throughput: 0: 41946.6. Samples: 301780460. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 14:57:43,840][00126] Avg episode reward: [(0, '0.353')] [2024-03-29 14:57:45,168][00497] Updated weights for policy 0, policy_version 25611 (0.0032) [2024-03-29 14:57:48,795][00497] Updated weights for policy 0, policy_version 25621 (0.0024) [2024-03-29 14:57:48,839][00126] Fps is (10 sec: 44236.9, 60 sec: 42598.3, 300 sec: 42376.2). Total num frames: 419774464. Throughput: 0: 41828.0. Samples: 301901100. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 14:57:48,840][00126] Avg episode reward: [(0, '0.415')] [2024-03-29 14:57:52,850][00497] Updated weights for policy 0, policy_version 25631 (0.0022) [2024-03-29 14:57:53,839][00126] Fps is (10 sec: 44236.8, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 419971072. Throughput: 0: 41741.2. Samples: 302159000. Policy #0 lag: (min: 0.0, avg: 19.6, max: 40.0) [2024-03-29 14:57:53,840][00126] Avg episode reward: [(0, '0.475')] [2024-03-29 14:57:56,912][00497] Updated weights for policy 0, policy_version 25641 (0.0020) [2024-03-29 14:57:58,839][00126] Fps is (10 sec: 39321.9, 60 sec: 41779.2, 300 sec: 42265.2). Total num frames: 420167680. Throughput: 0: 41346.3. Samples: 302400700. Policy #0 lag: (min: 2.0, avg: 21.5, max: 42.0) [2024-03-29 14:57:58,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 14:58:00,463][00476] Signal inference workers to stop experience collection... (10800 times) [2024-03-29 14:58:00,505][00497] InferenceWorker_p0-w0: stopping experience collection (10800 times) [2024-03-29 14:58:00,628][00476] Signal inference workers to resume experience collection... (10800 times) [2024-03-29 14:58:00,629][00497] InferenceWorker_p0-w0: resuming experience collection (10800 times) [2024-03-29 14:58:00,885][00497] Updated weights for policy 0, policy_version 25651 (0.0025) [2024-03-29 14:58:03,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41779.2, 300 sec: 42265.2). Total num frames: 420380672. Throughput: 0: 41566.3. Samples: 302517500. Policy #0 lag: (min: 2.0, avg: 21.5, max: 42.0) [2024-03-29 14:58:03,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 14:58:04,640][00497] Updated weights for policy 0, policy_version 25661 (0.0023) [2024-03-29 14:58:08,599][00497] Updated weights for policy 0, policy_version 25671 (0.0020) [2024-03-29 14:58:08,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 42265.2). Total num frames: 420593664. Throughput: 0: 41840.1. Samples: 302783780. Policy #0 lag: (min: 2.0, avg: 21.5, max: 42.0) [2024-03-29 14:58:08,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 14:58:12,650][00497] Updated weights for policy 0, policy_version 25681 (0.0024) [2024-03-29 14:58:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41779.3, 300 sec: 42265.2). Total num frames: 420806656. Throughput: 0: 41504.5. Samples: 303027920. Policy #0 lag: (min: 2.0, avg: 21.5, max: 42.0) [2024-03-29 14:58:13,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 14:58:16,563][00497] Updated weights for policy 0, policy_version 25691 (0.0032) [2024-03-29 14:58:18,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41233.1, 300 sec: 42154.1). 
Total num frames: 420986880. Throughput: 0: 41845.7. Samples: 303155480. Policy #0 lag: (min: 0.0, avg: 18.3, max: 42.0) [2024-03-29 14:58:18,840][00126] Avg episode reward: [(0, '0.451')] [2024-03-29 14:58:20,524][00497] Updated weights for policy 0, policy_version 25701 (0.0029) [2024-03-29 14:58:23,839][00126] Fps is (10 sec: 39321.3, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 421199872. Throughput: 0: 41296.5. Samples: 303386620. Policy #0 lag: (min: 0.0, avg: 18.3, max: 42.0) [2024-03-29 14:58:23,841][00126] Avg episode reward: [(0, '0.429')] [2024-03-29 14:58:24,610][00497] Updated weights for policy 0, policy_version 25711 (0.0017) [2024-03-29 14:58:28,656][00497] Updated weights for policy 0, policy_version 25721 (0.0028) [2024-03-29 14:58:28,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42052.4, 300 sec: 42154.1). Total num frames: 421412864. Throughput: 0: 41716.2. Samples: 303657680. Policy #0 lag: (min: 0.0, avg: 18.3, max: 42.0) [2024-03-29 14:58:28,840][00126] Avg episode reward: [(0, '0.391')] [2024-03-29 14:58:32,160][00476] Signal inference workers to stop experience collection... (10850 times) [2024-03-29 14:58:32,203][00497] InferenceWorker_p0-w0: stopping experience collection (10850 times) [2024-03-29 14:58:32,368][00476] Signal inference workers to resume experience collection... (10850 times) [2024-03-29 14:58:32,368][00497] InferenceWorker_p0-w0: resuming experience collection (10850 times) [2024-03-29 14:58:32,641][00497] Updated weights for policy 0, policy_version 25731 (0.0024) [2024-03-29 14:58:33,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.1, 300 sec: 42154.1). Total num frames: 421625856. Throughput: 0: 41875.2. Samples: 303785480. Policy #0 lag: (min: 2.0, avg: 21.5, max: 45.0) [2024-03-29 14:58:33,840][00126] Avg episode reward: [(0, '0.333')] [2024-03-29 14:58:36,295][00497] Updated weights for policy 0, policy_version 25741 (0.0021) [2024-03-29 14:58:38,839][00126] Fps is (10 sec: 42597.7, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 421838848. Throughput: 0: 41117.4. Samples: 304009280. Policy #0 lag: (min: 2.0, avg: 21.5, max: 45.0) [2024-03-29 14:58:38,840][00126] Avg episode reward: [(0, '0.395')] [2024-03-29 14:58:40,601][00497] Updated weights for policy 0, policy_version 25751 (0.0032) [2024-03-29 14:58:43,839][00126] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 42043.0). Total num frames: 422019072. Throughput: 0: 41596.0. Samples: 304272520. Policy #0 lag: (min: 2.0, avg: 21.5, max: 45.0) [2024-03-29 14:58:43,840][00126] Avg episode reward: [(0, '0.371')] [2024-03-29 14:58:44,399][00497] Updated weights for policy 0, policy_version 25761 (0.0019) [2024-03-29 14:58:48,173][00497] Updated weights for policy 0, policy_version 25771 (0.0032) [2024-03-29 14:58:48,839][00126] Fps is (10 sec: 40960.3, 60 sec: 41233.1, 300 sec: 42154.1). Total num frames: 422248448. Throughput: 0: 41859.6. Samples: 304401180. Policy #0 lag: (min: 2.0, avg: 21.5, max: 45.0) [2024-03-29 14:58:48,840][00126] Avg episode reward: [(0, '0.393')] [2024-03-29 14:58:51,867][00497] Updated weights for policy 0, policy_version 25781 (0.0020) [2024-03-29 14:58:53,839][00126] Fps is (10 sec: 45874.5, 60 sec: 41779.2, 300 sec: 42209.6). Total num frames: 422477824. Throughput: 0: 41129.2. Samples: 304634600. 
Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 14:58:53,842][00126] Avg episode reward: [(0, '0.331')] [2024-03-29 14:58:56,235][00497] Updated weights for policy 0, policy_version 25791 (0.0023) [2024-03-29 14:58:58,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41506.1, 300 sec: 42098.6). Total num frames: 422658048. Throughput: 0: 41631.6. Samples: 304901340. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 14:58:58,840][00126] Avg episode reward: [(0, '0.352')] [2024-03-29 14:58:59,940][00497] Updated weights for policy 0, policy_version 25801 (0.0021) [2024-03-29 14:59:03,767][00497] Updated weights for policy 0, policy_version 25811 (0.0017) [2024-03-29 14:59:03,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.1, 300 sec: 42209.6). Total num frames: 422887424. Throughput: 0: 41611.5. Samples: 305028000. Policy #0 lag: (min: 1.0, avg: 20.9, max: 42.0) [2024-03-29 14:59:03,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 14:59:04,057][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000025812_422903808.pth... [2024-03-29 14:59:04,376][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000025196_412811264.pth [2024-03-29 14:59:04,852][00476] Signal inference workers to stop experience collection... (10900 times) [2024-03-29 14:59:04,875][00497] InferenceWorker_p0-w0: stopping experience collection (10900 times) [2024-03-29 14:59:05,074][00476] Signal inference workers to resume experience collection... (10900 times) [2024-03-29 14:59:05,074][00497] InferenceWorker_p0-w0: resuming experience collection (10900 times) [2024-03-29 14:59:07,577][00497] Updated weights for policy 0, policy_version 25821 (0.0027) [2024-03-29 14:59:08,839][00126] Fps is (10 sec: 45874.9, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 423116800. Throughput: 0: 41841.8. Samples: 305269500. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 14:59:08,840][00126] Avg episode reward: [(0, '0.422')] [2024-03-29 14:59:11,804][00497] Updated weights for policy 0, policy_version 25831 (0.0023) [2024-03-29 14:59:13,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.1, 300 sec: 42209.6). Total num frames: 423313408. Throughput: 0: 41546.5. Samples: 305527280. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 14:59:13,841][00126] Avg episode reward: [(0, '0.445')] [2024-03-29 14:59:15,697][00497] Updated weights for policy 0, policy_version 25841 (0.0022) [2024-03-29 14:59:18,839][00126] Fps is (10 sec: 37683.0, 60 sec: 41779.2, 300 sec: 42154.1). Total num frames: 423493632. Throughput: 0: 41516.4. Samples: 305653720. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 14:59:18,840][00126] Avg episode reward: [(0, '0.344')] [2024-03-29 14:59:19,426][00497] Updated weights for policy 0, policy_version 25851 (0.0026) [2024-03-29 14:59:23,155][00497] Updated weights for policy 0, policy_version 25861 (0.0032) [2024-03-29 14:59:23,839][00126] Fps is (10 sec: 42599.0, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 423739392. Throughput: 0: 42028.5. Samples: 305900560. Policy #0 lag: (min: 0.0, avg: 21.6, max: 43.0) [2024-03-29 14:59:23,840][00126] Avg episode reward: [(0, '0.518')] [2024-03-29 14:59:27,183][00497] Updated weights for policy 0, policy_version 25871 (0.0018) [2024-03-29 14:59:28,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 423936000. Throughput: 0: 41890.6. Samples: 306157600. 
Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 14:59:28,840][00126] Avg episode reward: [(0, '0.365')] [2024-03-29 14:59:31,300][00497] Updated weights for policy 0, policy_version 25881 (0.0024) [2024-03-29 14:59:33,839][00126] Fps is (10 sec: 37683.3, 60 sec: 41506.2, 300 sec: 42098.6). Total num frames: 424116224. Throughput: 0: 41930.2. Samples: 306288040. Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 14:59:33,840][00126] Avg episode reward: [(0, '0.434')] [2024-03-29 14:59:34,990][00497] Updated weights for policy 0, policy_version 25891 (0.0029) [2024-03-29 14:59:38,827][00497] Updated weights for policy 0, policy_version 25901 (0.0026) [2024-03-29 14:59:38,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 424361984. Throughput: 0: 42263.3. Samples: 306536440. Policy #0 lag: (min: 0.0, avg: 23.0, max: 43.0) [2024-03-29 14:59:38,840][00126] Avg episode reward: [(0, '0.335')] [2024-03-29 14:59:42,139][00476] Signal inference workers to stop experience collection... (10950 times) [2024-03-29 14:59:42,179][00497] InferenceWorker_p0-w0: stopping experience collection (10950 times) [2024-03-29 14:59:42,225][00476] Signal inference workers to resume experience collection... (10950 times) [2024-03-29 14:59:42,226][00497] InferenceWorker_p0-w0: resuming experience collection (10950 times) [2024-03-29 14:59:42,876][00497] Updated weights for policy 0, policy_version 25911 (0.0035) [2024-03-29 14:59:43,839][00126] Fps is (10 sec: 44236.2, 60 sec: 42325.2, 300 sec: 42098.5). Total num frames: 424558592. Throughput: 0: 41845.6. Samples: 306784400. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 14:59:43,840][00126] Avg episode reward: [(0, '0.378')] [2024-03-29 14:59:46,827][00497] Updated weights for policy 0, policy_version 25921 (0.0028) [2024-03-29 14:59:48,839][00126] Fps is (10 sec: 39321.6, 60 sec: 41779.2, 300 sec: 42043.0). Total num frames: 424755200. Throughput: 0: 42111.2. Samples: 306923000. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 14:59:48,840][00126] Avg episode reward: [(0, '0.333')] [2024-03-29 14:59:50,658][00497] Updated weights for policy 0, policy_version 25931 (0.0027) [2024-03-29 14:59:53,839][00126] Fps is (10 sec: 42598.4, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 424984576. Throughput: 0: 42152.4. Samples: 307166360. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 14:59:53,840][00126] Avg episode reward: [(0, '0.387')] [2024-03-29 14:59:54,581][00497] Updated weights for policy 0, policy_version 25941 (0.0025) [2024-03-29 14:59:58,620][00497] Updated weights for policy 0, policy_version 25951 (0.0020) [2024-03-29 14:59:58,839][00126] Fps is (10 sec: 42597.7, 60 sec: 42052.1, 300 sec: 42043.0). Total num frames: 425181184. Throughput: 0: 41863.5. Samples: 307411140. Policy #0 lag: (min: 0.0, avg: 20.2, max: 41.0) [2024-03-29 14:59:58,840][00126] Avg episode reward: [(0, '0.462')] [2024-03-29 15:00:02,390][00497] Updated weights for policy 0, policy_version 25961 (0.0032) [2024-03-29 15:00:03,839][00126] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 425410560. Throughput: 0: 42112.5. Samples: 307548780. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 15:00:03,840][00126] Avg episode reward: [(0, '0.456')] [2024-03-29 15:00:06,068][00497] Updated weights for policy 0, policy_version 25971 (0.0020) [2024-03-29 15:00:08,839][00126] Fps is (10 sec: 42598.3, 60 sec: 41506.0, 300 sec: 41987.4). 
Total num frames: 425607168. Throughput: 0: 42162.9. Samples: 307797900. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 15:00:08,840][00126] Avg episode reward: [(0, '0.373')] [2024-03-29 15:00:10,068][00497] Updated weights for policy 0, policy_version 25981 (0.0020) [2024-03-29 15:00:11,079][00476] Signal inference workers to stop experience collection... (11000 times) [2024-03-29 15:00:11,152][00497] InferenceWorker_p0-w0: stopping experience collection (11000 times) [2024-03-29 15:00:11,157][00476] Signal inference workers to resume experience collection... (11000 times) [2024-03-29 15:00:11,180][00497] InferenceWorker_p0-w0: resuming experience collection (11000 times) [2024-03-29 15:00:13,839][00126] Fps is (10 sec: 39321.2, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 425803776. Throughput: 0: 41916.4. Samples: 308043840. Policy #0 lag: (min: 1.0, avg: 21.9, max: 42.0) [2024-03-29 15:00:13,840][00126] Avg episode reward: [(0, '0.418')] [2024-03-29 15:00:14,181][00497] Updated weights for policy 0, policy_version 25991 (0.0024) [2024-03-29 15:00:18,390][00497] Updated weights for policy 0, policy_version 26001 (0.0019) [2024-03-29 15:00:18,839][00126] Fps is (10 sec: 40960.7, 60 sec: 42052.3, 300 sec: 42043.0). Total num frames: 426016768. Throughput: 0: 41746.6. Samples: 308166640. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 15:00:18,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 15:00:22,041][00497] Updated weights for policy 0, policy_version 26011 (0.0029) [2024-03-29 15:00:23,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 426213376. Throughput: 0: 41559.9. Samples: 308406640. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 15:00:23,840][00126] Avg episode reward: [(0, '0.511')] [2024-03-29 15:00:26,125][00497] Updated weights for policy 0, policy_version 26021 (0.0027) [2024-03-29 15:00:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 41506.1, 300 sec: 41931.9). Total num frames: 426426368. Throughput: 0: 41581.4. Samples: 308655560. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 15:00:28,841][00126] Avg episode reward: [(0, '0.487')] [2024-03-29 15:00:29,950][00497] Updated weights for policy 0, policy_version 26031 (0.0028) [2024-03-29 15:00:33,839][00126] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 426639360. Throughput: 0: 41475.0. Samples: 308789380. Policy #0 lag: (min: 0.0, avg: 19.0, max: 40.0) [2024-03-29 15:00:33,840][00126] Avg episode reward: [(0, '0.449')] [2024-03-29 15:00:34,060][00497] Updated weights for policy 0, policy_version 26041 (0.0019) [2024-03-29 15:00:37,702][00497] Updated weights for policy 0, policy_version 26051 (0.0026) [2024-03-29 15:00:38,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41233.0, 300 sec: 41876.4). Total num frames: 426835968. Throughput: 0: 41842.2. Samples: 309049260. Policy #0 lag: (min: 0.0, avg: 21.1, max: 43.0) [2024-03-29 15:00:38,840][00126] Avg episode reward: [(0, '0.384')] [2024-03-29 15:00:41,511][00497] Updated weights for policy 0, policy_version 26061 (0.0023) [2024-03-29 15:00:41,512][00476] Signal inference workers to stop experience collection... (11050 times) [2024-03-29 15:00:41,513][00476] Signal inference workers to resume experience collection... 
(11050 times) [2024-03-29 15:00:41,558][00497] InferenceWorker_p0-w0: stopping experience collection (11050 times) [2024-03-29 15:00:41,558][00497] InferenceWorker_p0-w0: resuming experience collection (11050 times) [2024-03-29 15:00:43,839][00126] Fps is (10 sec: 44237.2, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 427081728. Throughput: 0: 41786.4. Samples: 309291520. Policy #0 lag: (min: 0.0, avg: 21.1, max: 43.0) [2024-03-29 15:00:43,840][00126] Avg episode reward: [(0, '0.374')] [2024-03-29 15:00:45,432][00497] Updated weights for policy 0, policy_version 26071 (0.0022) [2024-03-29 15:00:48,839][00126] Fps is (10 sec: 44237.4, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 427278336. Throughput: 0: 41671.6. Samples: 309424000. Policy #0 lag: (min: 0.0, avg: 21.1, max: 43.0) [2024-03-29 15:00:48,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 15:00:49,338][00497] Updated weights for policy 0, policy_version 26081 (0.0019) [2024-03-29 15:00:53,018][00497] Updated weights for policy 0, policy_version 26091 (0.0025) [2024-03-29 15:00:53,839][00126] Fps is (10 sec: 40959.6, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 427491328. Throughput: 0: 42119.2. Samples: 309693260. Policy #0 lag: (min: 2.0, avg: 19.8, max: 42.0) [2024-03-29 15:00:53,840][00126] Avg episode reward: [(0, '0.410')] [2024-03-29 15:00:57,166][00497] Updated weights for policy 0, policy_version 26101 (0.0022) [2024-03-29 15:00:58,839][00126] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 427704320. Throughput: 0: 41663.6. Samples: 309918700. Policy #0 lag: (min: 2.0, avg: 19.8, max: 42.0) [2024-03-29 15:00:58,840][00126] Avg episode reward: [(0, '0.359')] [2024-03-29 15:01:01,279][00497] Updated weights for policy 0, policy_version 26111 (0.0020) [2024-03-29 15:01:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.0, 300 sec: 41876.4). Total num frames: 427900928. Throughput: 0: 41928.8. Samples: 310053440. Policy #0 lag: (min: 2.0, avg: 19.8, max: 42.0) [2024-03-29 15:01:03,842][00126] Avg episode reward: [(0, '0.364')] [2024-03-29 15:01:03,861][00476] Saving /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000026117_427900928.pth... [2024-03-29 15:01:04,170][00476] Removing /workspace/metta/train_dir/b.a20.20x20_40x40.norm/checkpoint_p0/checkpoint_000025506_417890304.pth [2024-03-29 15:01:05,499][00497] Updated weights for policy 0, policy_version 26121 (0.0023) [2024-03-29 15:01:08,839][00126] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41876.4). Total num frames: 428113920. Throughput: 0: 42181.9. Samples: 310304820. Policy #0 lag: (min: 2.0, avg: 19.8, max: 42.0) [2024-03-29 15:01:08,840][00126] Avg episode reward: [(0, '0.388')] [2024-03-29 15:01:09,024][00497] Updated weights for policy 0, policy_version 26131 (0.0026) [2024-03-29 15:01:12,994][00497] Updated weights for policy 0, policy_version 26141 (0.0018) [2024-03-29 15:01:13,839][00126] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 428326912. Throughput: 0: 41847.4. Samples: 310538700. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 15:01:13,840][00126] Avg episode reward: [(0, '0.360')] [2024-03-29 15:01:16,683][00476] Signal inference workers to stop experience collection... (11100 times) [2024-03-29 15:01:16,684][00476] Signal inference workers to resume experience collection... 
(11100 times) [2024-03-29 15:01:16,721][00497] InferenceWorker_p0-w0: stopping experience collection (11100 times) [2024-03-29 15:01:16,721][00497] InferenceWorker_p0-w0: resuming experience collection (11100 times) [2024-03-29 15:01:16,995][00497] Updated weights for policy 0, policy_version 26151 (0.0023) [2024-03-29 15:01:18,839][00126] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 428539904. Throughput: 0: 41879.1. Samples: 310673940. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 15:01:18,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 15:01:21,166][00497] Updated weights for policy 0, policy_version 26161 (0.0020) [2024-03-29 15:01:23,839][00126] Fps is (10 sec: 40961.1, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 428736512. Throughput: 0: 41730.0. Samples: 310927100. Policy #0 lag: (min: 0.0, avg: 20.4, max: 40.0) [2024-03-29 15:01:23,840][00126] Avg episode reward: [(0, '0.375')] [2024-03-29 15:01:24,632][00497] Updated weights for policy 0, policy_version 26171 (0.0024) [2024-03-29 15:01:28,563][00497] Updated weights for policy 0, policy_version 26181 (0.0024) [2024-03-29 15:01:28,839][00126] Fps is (10 sec: 40959.9, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 428949504. Throughput: 0: 41711.0. Samples: 311168520. Policy #0 lag: (min: 0.0, avg: 20.9, max: 43.0) [2024-03-29 15:01:28,840][00126] Avg episode reward: [(0, '0.299')] [2024-03-29 15:01:32,742][00497] Updated weights for policy 0, policy_version 26191 (0.0023) [2024-03-29 15:01:33,839][00126] Fps is (10 sec: 40959.4, 60 sec: 41779.2, 300 sec: 41765.3). Total num frames: 429146112. Throughput: 0: 41429.3. Samples: 311288320. Policy #0 lag: (min: 0.0, avg: 20.9, max: 43.0) [2024-03-29 15:01:33,840][00126] Avg episode reward: [(0, '0.409')] [2024-03-29 15:01:36,841][00497] Updated weights for policy 0, policy_version 26201 (0.0026) [2024-03-29 15:01:38,839][00126] Fps is (10 sec: 40960.5, 60 sec: 42052.4, 300 sec: 41820.9). Total num frames: 429359104. Throughput: 0: 41458.3. Samples: 311558880. Policy #0 lag: (min: 0.0, avg: 20.9, max: 43.0) [2024-03-29 15:01:38,841][00126] Avg episode reward: [(0, '0.414')] [2024-03-29 15:01:40,232][00497] Updated weights for policy 0, policy_version 26211 (0.0029) [2024-03-29 15:01:43,839][00126] Fps is (10 sec: 44236.6, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 429588480. Throughput: 0: 41902.6. Samples: 311804320. Policy #0 lag: (min: 0.0, avg: 20.9, max: 43.0) [2024-03-29 15:01:43,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 15:01:44,161][00497] Updated weights for policy 0, policy_version 26221 (0.0030) [2024-03-29 15:01:48,240][00497] Updated weights for policy 0, policy_version 26231 (0.0021) [2024-03-29 15:01:48,257][00476] Signal inference workers to stop experience collection... (11150 times) [2024-03-29 15:01:48,288][00497] InferenceWorker_p0-w0: stopping experience collection (11150 times) [2024-03-29 15:01:48,480][00476] Signal inference workers to resume experience collection... (11150 times) [2024-03-29 15:01:48,481][00497] InferenceWorker_p0-w0: resuming experience collection (11150 times) [2024-03-29 15:01:48,839][00126] Fps is (10 sec: 44236.7, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 429801472. Throughput: 0: 41622.4. Samples: 311926440. 
Policy #0 lag: (min: 1.0, avg: 22.4, max: 43.0) [2024-03-29 15:01:48,840][00126] Avg episode reward: [(0, '0.457')] [2024-03-29 15:01:52,281][00497] Updated weights for policy 0, policy_version 26241 (0.0027) [2024-03-29 15:01:53,839][00126] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 429981696. Throughput: 0: 41818.3. Samples: 312186640. Policy #0 lag: (min: 1.0, avg: 22.4, max: 43.0) [2024-03-29 15:01:53,840][00126] Avg episode reward: [(0, '0.383')] [2024-03-29 15:01:55,934][00497] Updated weights for policy 0, policy_version 26251 (0.0020) [2024-03-29 15:01:58,839][00126] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 430194688. Throughput: 0: 42141.0. Samples: 312435040. Policy #0 lag: (min: 1.0, avg: 22.4, max: 43.0) [2024-03-29 15:01:58,840][00126] Avg episode reward: [(0, '0.317')] [2024-03-29 15:01:59,916][00497] Updated weights for policy 0, policy_version 26261 (0.0017) [2024-03-29 15:02:03,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 430391296. Throughput: 0: 41643.2. Samples: 312547880. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:02:03,840][00126] Avg episode reward: [(0, '0.381')] [2024-03-29 15:02:04,161][00497] Updated weights for policy 0, policy_version 26271 (0.0019) [2024-03-29 15:02:08,297][00497] Updated weights for policy 0, policy_version 26281 (0.0027) [2024-03-29 15:02:08,839][00126] Fps is (10 sec: 40959.8, 60 sec: 41506.1, 300 sec: 41709.8). Total num frames: 430604288. Throughput: 0: 41672.7. Samples: 312802380. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:02:08,840][00126] Avg episode reward: [(0, '0.343')] [2024-03-29 15:02:11,903][00497] Updated weights for policy 0, policy_version 26291 (0.0021) [2024-03-29 15:02:13,839][00126] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 430817280. Throughput: 0: 41857.4. Samples: 313052100. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:02:13,840][00126] Avg episode reward: [(0, '0.308')] [2024-03-29 15:02:15,534][00497] Updated weights for policy 0, policy_version 26301 (0.0031) [2024-03-29 15:02:18,839][00126] Fps is (10 sec: 42598.8, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 431030272. Throughput: 0: 41964.5. Samples: 313176720. Policy #0 lag: (min: 0.0, avg: 20.8, max: 41.0) [2024-03-29 15:02:18,840][00126] Avg episode reward: [(0, '0.399')] [2024-03-29 15:02:19,641][00497] Updated weights for policy 0, policy_version 26311 (0.0018) [2024-03-29 15:02:23,819][00497] Updated weights for policy 0, policy_version 26321 (0.0026) [2024-03-29 15:02:23,839][00126] Fps is (10 sec: 42598.9, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 431243264. Throughput: 0: 41773.4. Samples: 313438680. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:02:23,840][00126] Avg episode reward: [(0, '0.435')] [2024-03-29 15:02:27,316][00476] Signal inference workers to stop experience collection... (11200 times) [2024-03-29 15:02:27,397][00476] Signal inference workers to resume experience collection... (11200 times) [2024-03-29 15:02:27,400][00497] InferenceWorker_p0-w0: stopping experience collection (11200 times) [2024-03-29 15:02:27,403][00497] Updated weights for policy 0, policy_version 26331 (0.0021) [2024-03-29 15:02:27,423][00497] InferenceWorker_p0-w0: resuming experience collection (11200 times) [2024-03-29 15:02:28,839][00126] Fps is (10 sec: 42598.5, 60 sec: 41779.3, 300 sec: 41765.3). 
Total num frames: 431456256. Throughput: 0: 41905.9. Samples: 313690080. Policy #0 lag: (min: 0.0, avg: 21.4, max: 42.0) [2024-03-29 15:02:28,840][00126] Avg episode reward: [(0, '0.349')]
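
The excerpt above cycles through a handful of recurring record types: windowed throughput ("Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" together with "Total num frames" and "Samples"), the policy-lag spread between the learner's latest weights and the weights each rollout was collected with, periodic checkpoint saves that are each paired with removal of an older checkpoint, and stop/resume signals sent to the inference worker. The sketch below shows one way statistics of this shape could be derived from the raw events; it is a minimal illustration only, every name in it (StatsTracker, policy_lag, rotate_checkpoints, keep_last) is hypothetical, and it is not Sample Factory's actual implementation.

    # Illustrative sketch: how windowed FPS, policy-lag stats, and
    # keep-the-newest checkpoint rotation could be computed from the kind of
    # events logged above. Hypothetical names; not Sample Factory's code.
    import os
    import time
    from collections import deque


    class StatsTracker:
        """Stores (wall_time, total_env_frames) samples and reports windowed FPS."""

        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.samples = deque()  # (timestamp, total_env_frames)

        def record(self, total_frames, now=None):
            now = time.time() if now is None else now
            self.samples.append((now, total_frames))
            # Drop samples older than the largest window (plus a small margin).
            horizon = now - max(self.windows) - 5
            while self.samples and self.samples[0][0] < horizon:
                self.samples.popleft()

        def fps(self, now=None):
            """FPS per window: frames gained divided by elapsed time in that window."""
            now = time.time() if now is None else now
            out = {}
            for w in self.windows:
                recent = [(t, f) for t, f in self.samples if t >= now - w]
                if len(recent) < 2:
                    out[w] = 0.0
                    continue
                (t0, f0), (t1, f1) = recent[0], recent[-1]
                out[w] = (f1 - f0) / max(t1 - t0, 1e-6)
            return out


    def policy_lag(current_version, rollout_versions):
        """Lag = how many learner updates behind each rollout's weights were."""
        if not rollout_versions:
            return 0, 0.0, 0
        lags = [current_version - v for v in rollout_versions]
        return min(lags), sum(lags) / len(lags), max(lags)


    def rotate_checkpoints(ckpt_dir, keep_last=2):
        """Delete all but the newest `keep_last` checkpoints (zero-padded names sort chronologically)."""
        ckpts = sorted(f for f in os.listdir(ckpt_dir) if f.endswith(".pth"))
        for old in ckpts[:-keep_last]:
            os.remove(os.path.join(ckpt_dir, old))

As a sanity check against the log itself: the frame counter grows from 418086912 at 14:57:08 to 418512896 at 14:57:18, i.e. 425984 frames in roughly ten seconds, or about 42598 FPS, which matches the "10 sec: 42597.9" figure reported at 14:57:18. Likewise, each "Saving ... checkpoint_*.pth" record is immediately followed by a "Removing ..." record for an older checkpoint, consistent with a keep-the-newest rotation like the one sketched above.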