[2022-12-04 20:47:56,451][04266] Saving configuration to /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/config.json...
[2022-12-04 20:47:56,464][04266] Rollout worker 0 uses device cpu
[2022-12-04 20:47:56,464][04266] Rollout worker 1 uses device cpu
[2022-12-04 20:47:56,464][04266] Rollout worker 2 uses device cpu
[2022-12-04 20:47:56,465][04266] Rollout worker 3 uses device cpu
[2022-12-04 20:47:56,465][04266] Rollout worker 4 uses device cpu
[2022-12-04 20:47:56,465][04266] Rollout worker 5 uses device cpu
[2022-12-04 20:47:56,465][04266] Rollout worker 6 uses device cpu
[2022-12-04 20:47:56,465][04266] Rollout worker 7 uses device cpu
[2022-12-04 20:47:56,465][04266] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
[2022-12-04 20:47:56,487][04266] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2022-12-04 20:47:56,487][04266] InferenceWorker_p0-w0: min num requests: 2
[2022-12-04 20:47:56,519][04266] Starting all processes...
[2022-12-04 20:47:56,520][04266] Starting process learner_proc0
[2022-12-04 20:47:56,570][04266] Starting all processes...
[2022-12-04 20:47:56,577][04266] Starting process inference_proc0-0
[2022-12-04 20:47:56,577][04266] Starting process rollout_proc0
[2022-12-04 20:47:56,578][04266] Starting process rollout_proc1
[2022-12-04 20:47:56,578][04266] Starting process rollout_proc2
[2022-12-04 20:47:56,578][04266] Starting process rollout_proc3
[2022-12-04 20:47:56,579][04266] Starting process rollout_proc4
[2022-12-04 20:47:56,579][04266] Starting process rollout_proc5
[2022-12-04 20:47:56,584][04266] Starting process rollout_proc6
[2022-12-04 20:47:56,591][04266] Starting process rollout_proc7
[2022-12-04 20:47:58,489][04366] Worker 5 uses CPU cores [5]
[2022-12-04 20:47:58,561][04361] Worker 0 uses CPU cores [0]
[2022-12-04 20:47:58,611][04360] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2022-12-04 20:47:58,612][04360] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2022-12-04 20:47:58,705][04367] Worker 4 uses CPU cores [4]
[2022-12-04 20:47:58,733][04363] Worker 6 uses CPU cores [6]
[2022-12-04 20:47:58,765][04368] Worker 2 uses CPU cores [2]
[2022-12-04 20:47:58,779][04365] Worker 3 uses CPU cores [3]
[2022-12-04 20:47:58,824][04340] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2022-12-04 20:47:58,825][04340] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2022-12-04 20:47:58,834][04364] Worker 7 uses CPU cores [7]
[2022-12-04 20:47:58,885][04362] Worker 1 uses CPU cores [1]
[2022-12-04 20:47:59,427][04360] Num visible devices: 1
[2022-12-04 20:47:59,428][04340] Num visible devices: 1
[2022-12-04 20:47:59,446][04340] Starting seed is not provided
[2022-12-04 20:47:59,446][04340] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2022-12-04 20:47:59,446][04340] Initializing actor-critic model on device cuda:0
[2022-12-04 20:47:59,446][04340] RunningMeanStd input shape: (27,)
[2022-12-04 20:47:59,447][04340] RunningMeanStd input shape: (1,)
[2022-12-04 20:47:59,522][04340] Created Actor Critic model with architecture:
[2022-12-04 20:47:59,522][04340] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): MlpEncoder(
        (mlp_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=Tanh)
          (2): RecursiveScriptModule(original_name=Linear)
          (3): RecursiveScriptModule(original_name=Tanh)
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
    (distribution_linear): Linear(in_features=64, out_features=8, bias=True)
  )
)
[2022-12-04 20:48:03,416][04340] Using optimizer
[2022-12-04 20:48:03,417][04340] No checkpoints found
[2022-12-04 20:48:03,417][04340] Did not load from checkpoint, starting from scratch!
[2022-12-04 20:48:03,417][04340] Initialized policy 0 weights for model version 0
[2022-12-04 20:48:03,422][04340] LearnerWorker_p0 finished initialization!
[2022-12-04 20:48:03,424][04340] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2022-12-04 20:48:03,551][04360] RunningMeanStd input shape: (27,)
[2022-12-04 20:48:03,552][04360] RunningMeanStd input shape: (1,)
[2022-12-04 20:48:03,650][04266] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2022-12-04 20:48:07,105][04266] Inference worker 0-0 is ready!
[2022-12-04 20:48:07,105][04266] All inference workers are ready! Signal rollout workers to start!
[2022-12-04 20:48:07,303][04364] Decorrelating experience for 0 frames...
[2022-12-04 20:48:07,303][04362] Decorrelating experience for 0 frames...
[2022-12-04 20:48:07,305][04363] Decorrelating experience for 0 frames...
[2022-12-04 20:48:07,306][04362] Decorrelating experience for 64 frames...
[2022-12-04 20:48:07,305][04367] Decorrelating experience for 0 frames...
[2022-12-04 20:48:07,305][04364] Decorrelating experience for 64 frames...
[2022-12-04 20:48:07,305][04361] Decorrelating experience for 0 frames...
[2022-12-04 20:48:07,305][04368] Decorrelating experience for 0 frames...
[2022-12-04 20:48:07,306][04366] Decorrelating experience for 0 frames...
[2022-12-04 20:48:07,307][04367] Decorrelating experience for 64 frames...
[2022-12-04 20:48:07,307][04363] Decorrelating experience for 64 frames...
[2022-12-04 20:48:07,307][04365] Decorrelating experience for 0 frames...
[2022-12-04 20:48:07,308][04368] Decorrelating experience for 64 frames...
[2022-12-04 20:48:07,308][04366] Decorrelating experience for 64 frames...
[2022-12-04 20:48:07,308][04361] Decorrelating experience for 64 frames...
[2022-12-04 20:48:07,309][04365] Decorrelating experience for 64 frames...
[2022-12-04 20:48:07,359][04364] Decorrelating experience for 128 frames...
[2022-12-04 20:48:07,360][04363] Decorrelating experience for 128 frames...
[2022-12-04 20:48:07,362][04366] Decorrelating experience for 128 frames...
[2022-12-04 20:48:07,361][04362] Decorrelating experience for 128 frames...
[2022-12-04 20:48:07,362][04361] Decorrelating experience for 128 frames...
[2022-12-04 20:48:07,362][04365] Decorrelating experience for 128 frames...
[2022-12-04 20:48:07,362][04367] Decorrelating experience for 128 frames...
[2022-12-04 20:48:07,362][04368] Decorrelating experience for 128 frames...
[2022-12-04 20:48:07,467][04363] Decorrelating experience for 192 frames...
[2022-12-04 20:48:07,467][04364] Decorrelating experience for 192 frames...
[2022-12-04 20:48:07,469][04367] Decorrelating experience for 192 frames...
[2022-12-04 20:48:07,469][04365] Decorrelating experience for 192 frames...
[2022-12-04 20:48:07,470][04366] Decorrelating experience for 192 frames...
[2022-12-04 20:48:07,471][04361] Decorrelating experience for 192 frames...
[2022-12-04 20:48:07,472][04362] Decorrelating experience for 192 frames...
[2022-12-04 20:48:07,474][04368] Decorrelating experience for 192 frames...
[2022-12-04 20:48:07,650][04364] Decorrelating experience for 256 frames...
[2022-12-04 20:48:07,658][04363] Decorrelating experience for 256 frames...
[2022-12-04 20:48:07,658][04365] Decorrelating experience for 256 frames...
[2022-12-04 20:48:07,659][04367] Decorrelating experience for 256 frames...
[2022-12-04 20:48:07,659][04362] Decorrelating experience for 256 frames...
[2022-12-04 20:48:07,661][04361] Decorrelating experience for 256 frames...
[2022-12-04 20:48:07,662][04366] Decorrelating experience for 256 frames...
[2022-12-04 20:48:07,664][04368] Decorrelating experience for 256 frames...
[2022-12-04 20:48:07,856][04364] Decorrelating experience for 320 frames...
[2022-12-04 20:48:07,863][04363] Decorrelating experience for 320 frames...
[2022-12-04 20:48:07,864][04365] Decorrelating experience for 320 frames...
[2022-12-04 20:48:07,866][04362] Decorrelating experience for 320 frames...
[2022-12-04 20:48:07,866][04361] Decorrelating experience for 320 frames...
[2022-12-04 20:48:07,871][04366] Decorrelating experience for 320 frames...
[2022-12-04 20:48:07,872][04367] Decorrelating experience for 320 frames...
[2022-12-04 20:48:07,877][04368] Decorrelating experience for 320 frames...
[2022-12-04 20:48:08,114][04364] Decorrelating experience for 384 frames...
[2022-12-04 20:48:08,119][04363] Decorrelating experience for 384 frames...
[2022-12-04 20:48:08,121][04365] Decorrelating experience for 384 frames...
[2022-12-04 20:48:08,123][04361] Decorrelating experience for 384 frames...
[2022-12-04 20:48:08,128][04362] Decorrelating experience for 384 frames...
[2022-12-04 20:48:08,129][04366] Decorrelating experience for 384 frames...
[2022-12-04 20:48:08,131][04367] Decorrelating experience for 384 frames...
[2022-12-04 20:48:08,144][04368] Decorrelating experience for 384 frames...
[2022-12-04 20:48:08,431][04364] Decorrelating experience for 448 frames...
[2022-12-04 20:48:08,433][04363] Decorrelating experience for 448 frames...
[2022-12-04 20:48:08,437][04365] Decorrelating experience for 448 frames...
[2022-12-04 20:48:08,437][04361] Decorrelating experience for 448 frames...
[2022-12-04 20:48:08,440][04362] Decorrelating experience for 448 frames...
[2022-12-04 20:48:08,444][04367] Decorrelating experience for 448 frames...
[2022-12-04 20:48:08,452][04366] Decorrelating experience for 448 frames...
[2022-12-04 20:48:08,466][04368] Decorrelating experience for 448 frames...
[2022-12-04 20:48:08,650][04266] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2022-12-04 20:48:08,652][04340] Saving /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000000_0.pth...
[2022-12-04 20:48:13,650][04266] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8192. Throughput: 0: 846.4. Samples: 8464. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:48:13,650][04266] Avg episode reward: [(0, '-160.026')]
[2022-12-04 20:48:16,478][04266] Heartbeat connected on Batcher_0
[2022-12-04 20:48:16,482][04266] Heartbeat connected on LearnerWorker_p0
[2022-12-04 20:48:16,492][04266] Heartbeat connected on InferenceWorker_p0-w0
[2022-12-04 20:48:16,493][04266] Heartbeat connected on RolloutWorker_w0
[2022-12-04 20:48:16,503][04266] Heartbeat connected on RolloutWorker_w2
[2022-12-04 20:48:16,503][04266] Heartbeat connected on RolloutWorker_w1
[2022-12-04 20:48:16,510][04266] Heartbeat connected on RolloutWorker_w4
[2022-12-04 20:48:16,511][04266] Heartbeat connected on RolloutWorker_w3
[2022-12-04 20:48:16,516][04266] Heartbeat connected on RolloutWorker_w5
[2022-12-04 20:48:16,521][04266] Heartbeat connected on RolloutWorker_w6
[2022-12-04 20:48:16,529][04266] Heartbeat connected on RolloutWorker_w7
[2022-12-04 20:48:18,650][04266] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 36864. Throughput: 0: 1698.1. Samples: 25472. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:48:18,651][04266] Avg episode reward: [(0, '-169.308')]
[2022-12-04 20:48:18,924][04360] Updated weights for policy 0, policy_version 80 (0.0006)
[2022-12-04 20:48:23,650][04266] Fps is (10 sec: 5734.3, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 65536. Throughput: 0: 2930.0. Samples: 58600. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:48:23,651][04266] Avg episode reward: [(0, '-249.723')]
[2022-12-04 20:48:23,656][04340] Saving /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000128_65536.pth...
[2022-12-04 20:48:26,260][04360] Updated weights for policy 0, policy_version 160 (0.0007)
[2022-12-04 20:48:28,650][04266] Fps is (10 sec: 5734.4, 60 sec: 3768.3, 300 sec: 3768.3). Total num frames: 94208. Throughput: 0: 3705.3. Samples: 92632. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:48:28,651][04266] Avg episode reward: [(0, '-89.994')]
[2022-12-04 20:48:33,559][04360] Updated weights for policy 0, policy_version 240 (0.0006)
[2022-12-04 20:48:33,650][04266] Fps is (10 sec: 5734.4, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 122880. Throughput: 0: 3641.5. Samples: 109244. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:48:33,651][04266] Avg episode reward: [(0, '-153.751')]
[2022-12-04 20:48:33,651][04340] Saving new best policy, reward=-153.751!
[2022-12-04 20:48:38,650][04266] Fps is (10 sec: 5324.8, 60 sec: 4213.0, 300 sec: 4213.0). Total num frames: 147456. Throughput: 0: 4093.4. Samples: 143268. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:48:38,650][04266] Avg episode reward: [(0, '-137.350')]
[2022-12-04 20:48:38,669][04340] Saving /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000296_151552.pth...
[2022-12-04 20:48:38,675][04340] Removing /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000000_0.pth
[2022-12-04 20:48:38,675][04340] Saving new best policy, reward=-137.350!
[2022-12-04 20:48:40,889][04360] Updated weights for policy 0, policy_version 320 (0.0006)
[2022-12-04 20:48:43,650][04266] Fps is (10 sec: 5324.8, 60 sec: 4403.2, 300 sec: 4403.2). Total num frames: 176128. Throughput: 0: 4415.1. Samples: 176604. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:48:43,651][04266] Avg episode reward: [(0, '-69.206')]
[2022-12-04 20:48:43,651][04340] Saving new best policy, reward=-69.206!
[2022-12-04 20:48:48,177][04360] Updated weights for policy 0, policy_version 400 (0.0006)
[2022-12-04 20:48:48,650][04266] Fps is (10 sec: 5734.4, 60 sec: 4551.1, 300 sec: 4551.1). Total num frames: 204800. Throughput: 0: 4290.7. Samples: 193080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:48:48,651][04266] Avg episode reward: [(0, '-52.726')]
[2022-12-04 20:48:48,651][04340] Saving new best policy, reward=-52.726!
[2022-12-04 20:48:53,650][04266] Fps is (10 sec: 5734.4, 60 sec: 4669.5, 300 sec: 4669.5). Total num frames: 233472. Throughput: 0: 5054.2. Samples: 227440. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2022-12-04 20:48:53,650][04266] Avg episode reward: [(0, '-33.694')]
[2022-12-04 20:48:53,657][04340] Saving /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000456_233472.pth...
[2022-12-04 20:48:53,664][04340] Removing /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000128_65536.pth
[2022-12-04 20:48:53,664][04340] Saving new best policy, reward=-33.694!
[2022-12-04 20:48:55,518][04360] Updated weights for policy 0, policy_version 480 (0.0006)
[2022-12-04 20:48:58,650][04266] Fps is (10 sec: 5734.4, 60 sec: 4766.3, 300 sec: 4766.3). Total num frames: 262144. Throughput: 0: 5586.5. Samples: 259856. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:48:58,651][04266] Avg episode reward: [(0, '-45.611')]
[2022-12-04 20:49:03,653][04266] Fps is (10 sec: 4913.5, 60 sec: 4710.1, 300 sec: 4710.1). Total num frames: 282624. Throughput: 0: 5596.9. Samples: 277352. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:49:03,654][04266] Avg episode reward: [(0, '-29.953')]
[2022-12-04 20:49:03,655][04340] Saving new best policy, reward=-29.953!
[2022-12-04 20:49:04,937][04360] Updated weights for policy 0, policy_version 560 (0.0008)
[2022-12-04 20:49:08,650][04266] Fps is (10 sec: 4096.0, 60 sec: 5051.7, 300 sec: 4663.1). Total num frames: 303104. Throughput: 0: 5336.0. Samples: 298720. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:49:08,650][04266] Avg episode reward: [(0, '-29.014')]
[2022-12-04 20:49:08,678][04340] Saving /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000600_307200.pth...
[2022-12-04 20:49:08,686][04340] Removing /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000296_151552.pth
[2022-12-04 20:49:08,686][04340] Saving new best policy, reward=-29.014!
[2022-12-04 20:49:12,321][04360] Updated weights for policy 0, policy_version 640 (0.0007)
[2022-12-04 20:49:13,650][04266] Fps is (10 sec: 4916.9, 60 sec: 5393.1, 300 sec: 4739.7). Total num frames: 331776. Throughput: 0: 5326.1. Samples: 332308. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2022-12-04 20:49:13,650][04266] Avg episode reward: [(0, '-0.035')]
[2022-12-04 20:49:13,651][04340] Saving new best policy, reward=-0.035!
[2022-12-04 20:49:18,650][04266] Fps is (10 sec: 5734.4, 60 sec: 5393.1, 300 sec: 4806.0). Total num frames: 360448. Throughput: 0: 5338.0. Samples: 349452. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:49:18,650][04266] Avg episode reward: [(0, '26.827')]
[2022-12-04 20:49:18,651][04340] Saving new best policy, reward=26.827!
[2022-12-04 20:49:19,490][04360] Updated weights for policy 0, policy_version 720 (0.0006)
[2022-12-04 20:49:23,650][04266] Fps is (10 sec: 5734.3, 60 sec: 5393.1, 300 sec: 4864.0). Total num frames: 389120. Throughput: 0: 5356.0. Samples: 384288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:49:23,651][04266] Avg episode reward: [(0, '75.358')]
[2022-12-04 20:49:23,656][04340] Saving /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000760_389120.pth...
[2022-12-04 20:49:23,665][04340] Removing /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000456_233472.pth
[2022-12-04 20:49:23,665][04340] Saving new best policy, reward=75.358!
[2022-12-04 20:49:26,586][04360] Updated weights for policy 0, policy_version 800 (0.0006)
[2022-12-04 20:49:28,650][04266] Fps is (10 sec: 5734.4, 60 sec: 5393.1, 300 sec: 4915.2). Total num frames: 417792. Throughput: 0: 5375.7. Samples: 418512. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:49:28,650][04266] Avg episode reward: [(0, '153.991')]
[2022-12-04 20:49:28,651][04340] Saving new best policy, reward=153.991!
[2022-12-04 20:49:33,650][04266] Fps is (10 sec: 5734.5, 60 sec: 5393.1, 300 sec: 4960.7). Total num frames: 446464. Throughput: 0: 5396.6. Samples: 435928. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:49:33,650][04266] Avg episode reward: [(0, '231.230')]
[2022-12-04 20:49:33,671][04340] Saving new best policy, reward=231.230!
[2022-12-04 20:49:33,672][04360] Updated weights for policy 0, policy_version 880 (0.0006)
[2022-12-04 20:49:38,650][04266] Fps is (10 sec: 5734.3, 60 sec: 5461.3, 300 sec: 5001.4). Total num frames: 475136. Throughput: 0: 5398.1. Samples: 470356. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2022-12-04 20:49:38,651][04266] Avg episode reward: [(0, '321.313')]
[2022-12-04 20:49:38,656][04340] Saving /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000928_475136.pth...
[2022-12-04 20:49:38,664][04340] Removing /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000600_307200.pth
[2022-12-04 20:49:38,665][04340] Saving new best policy, reward=321.313!
[2022-12-04 20:49:40,419][04266] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 4266], exiting...
[2022-12-04 20:49:40,420][04266] Runner profile tree view: main_loop: 103.9009
[2022-12-04 20:49:40,421][04266] Collected {0: 487424}, FPS: 4691.2
[2022-12-04 20:49:40,421][04340] Stopping Batcher_0...
[2022-12-04 20:49:40,421][04340] Loop batcher_evt_loop terminating...
[2022-12-04 20:49:40,421][04365] Stopping RolloutWorker_w3...
[2022-12-04 20:49:40,422][04365] Loop rollout_proc3_evt_loop terminating...
[2022-12-04 20:49:40,422][04340] Saving /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000952_487424.pth...
[2022-12-04 20:49:40,424][04366] Stopping RolloutWorker_w5...
[2022-12-04 20:49:40,424][04366] Loop rollout_proc5_evt_loop terminating...
[2022-12-04 20:49:40,425][04361] Stopping RolloutWorker_w0...
[2022-12-04 20:49:40,425][04362] Stopping RolloutWorker_w1...
[2022-12-04 20:49:40,426][04363] Stopping RolloutWorker_w6...
[2022-12-04 20:49:40,426][04361] Loop rollout_proc0_evt_loop terminating...
[2022-12-04 20:49:40,426][04362] Loop rollout_proc1_evt_loop terminating...
[2022-12-04 20:49:40,426][04368] Stopping RolloutWorker_w2...
[2022-12-04 20:49:40,426][04363] Loop rollout_proc6_evt_loop terminating...
[2022-12-04 20:49:40,426][04368] Loop rollout_proc2_evt_loop terminating...
[2022-12-04 20:49:40,429][04340] Removing /home/andrew_huggingface_co/sample-factory/train_dir/ant_test/checkpoint_p0/checkpoint_000000760_389120.pth
[2022-12-04 20:49:40,429][04340] Stopping LearnerWorker_p0...
[2022-12-04 20:49:40,430][04340] Loop learner_proc0_evt_loop terminating...
[2022-12-04 20:49:40,436][04360] Weights refcount: 2 0
[2022-12-04 20:49:40,437][04360] Stopping InferenceWorker_p0-w0...
[2022-12-04 20:49:40,438][04360] Loop inference_proc0-0_evt_loop terminating...
[2022-12-04 20:49:40,474][04364] Stopping RolloutWorker_w7...
[2022-12-04 20:49:40,475][04364] Loop rollout_proc7_evt_loop terminating...
[2022-12-04 20:49:40,498][04367] Stopping RolloutWorker_w4...
[2022-12-04 20:49:40,521][04367] Loop rollout_proc4_evt_loop terminating...