| 2025-03-29 14:27:03 | [rl2_trainer] Logging to /home/h2khalil/MetaRL-Assistive-Robotics/data/local/experiment/rl2_trainer |
| 2025-03-29 14:27:14 | [rl2_trainer] Obtaining samples... |
| 2025-03-29 14:31:58 | [rl2_trainer] epoch #0 | Optimizing policy... |
| 2025-03-29 14:32:02 | [rl2_trainer] epoch #0 | Fitting baseline... |
| 2025-03-29 14:32:02 | [rl2_trainer] epoch #0 | Computing loss before |
| 2025-03-29 14:32:03 | [rl2_trainer] epoch #0 | Computing KL before |
| 2025-03-29 14:32:03 | [rl2_trainer] epoch #0 | Optimizing |
| 2025-03-29 14:32:11 | [rl2_trainer] epoch #0 | Computing KL after |
| 2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Computing loss after |
| 2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Saving snapshot... |
| 2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Saved |
| 2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Time 298.18 s |
| 2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | EpochTime 298.18 s |
| ---------------------------------------- ------------ |
| Average/AverageDiscountedReturn -42.9028 |
| Average/AverageReturn -69.0759 |
| Average/Iteration 0 |
| Average/MaxReturn 5.14373 |
| Average/MinReturn -121.89 |
| Average/NumEpisodes 40 |
| Average/StdReturn 26.7746 |
| Average/TerminationRate 0 |
| LinearFeatureBaseline/ExplainedVariance 0.814994 |
| TotalEnvSteps 4000 |
| __unnamed_task__/AverageDiscountedReturn -42.9028 |
| __unnamed_task__/AverageReturn -69.0759 |
| __unnamed_task__/Iteration 0 |
| __unnamed_task__/MaxReturn 5.14373 |
| __unnamed_task__/MinReturn -121.89 |
| __unnamed_task__/NumEpisodes 40 |
| __unnamed_task__/StdReturn 26.7746 |
| __unnamed_task__/TerminationRate 0 |
| policy/Entropy 9.91254 |
| policy/KL 0.0179773 |
| policy/KLBefore 0 |
| policy/LossAfter -0.172905 |
| policy/LossBefore 0.0100782 |
| policy/dLoss 0.182983 |
| ---------------------------------------- ------------ |
| 2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Optimizing policy... |
| 2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Fitting baseline... |
| 2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Computing loss before |
| 2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Computing KL before |
| 2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Optimizing |
| 2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Computing KL after |
| 2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Computing loss after |
| 2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Saving snapshot... |
| 2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Saved |
| 2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Time 597.11 s |
| 2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | EpochTime 298.93 s |
| ---------------------------------------- ------------ |
| Average/AverageDiscountedReturn -46.6949 |
| Average/AverageReturn -74.2172 |
| Average/Iteration 1 |
| Average/MaxReturn -35.2002 |
| Average/MinReturn -127.671 |
| Average/NumEpisodes 40 |
| Average/StdReturn 23.4651 |
| Average/TerminationRate 0 |
| LinearFeatureBaseline/ExplainedVariance 0.887116 |
| TotalEnvSteps 8000 |
| __unnamed_task__/AverageDiscountedReturn -46.6949 |
| __unnamed_task__/AverageReturn -74.2172 |
| __unnamed_task__/Iteration 1 |
| __unnamed_task__/MaxReturn -35.2002 |
| __unnamed_task__/MinReturn -127.671 |
| __unnamed_task__/NumEpisodes 40 |
| __unnamed_task__/StdReturn 23.4651 |
| __unnamed_task__/TerminationRate 0 |
| policy/Entropy 9.90552 |
| policy/KL 0.0104231 |
| policy/KLBefore 0 |
| policy/LossAfter -0.108461 |
| policy/LossBefore 0.0091655 |
| policy/dLoss 0.117626 |
| ---------------------------------------- ------------ |
| 2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Optimizing policy... |
| 2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Fitting baseline... |
| 2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Computing loss before |
| 2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Computing KL before |
| 2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Optimizing |
| 2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Computing KL after |
| 2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Computing loss after |
| 2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Saving snapshot... |
| 2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Saved |
| 2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Time 938.81 s |
| 2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | EpochTime 341.68 s |
| ---------------------------------------- -------------- |
| Average/AverageDiscountedReturn -45.4614 |
| Average/AverageReturn -72.7992 |
| Average/Iteration 2 |
| Average/MaxReturn -26.0289 |
| Average/MinReturn -137.031 |
| Average/NumEpisodes 40 |
| Average/StdReturn 26.9881 |
| Average/TerminationRate 0 |
| LinearFeatureBaseline/ExplainedVariance 0.840131 |
| TotalEnvSteps 12000 |
| __unnamed_task__/AverageDiscountedReturn -45.4614 |
| __unnamed_task__/AverageReturn -72.7992 |
| __unnamed_task__/Iteration 2 |
| __unnamed_task__/MaxReturn -26.0289 |
| __unnamed_task__/MinReturn -137.031 |
| __unnamed_task__/NumEpisodes 40 |
| __unnamed_task__/StdReturn 26.9881 |
| __unnamed_task__/TerminationRate 0 |
| policy/Entropy 9.88918 |
| policy/KL 0.00923636 |
| policy/KLBefore 0 |
| policy/LossAfter -0.140978 |
| policy/LossBefore -0.0310702 |
| policy/dLoss 0.109907 |
| ---------------------------------------- -------------- |
| 2025-03-29 14:44:19 | [rl2_trainer] epoch #3 | Optimizing policy... |
| 2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Fitting baseline... |
| 2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Computing loss before |
| 2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Computing KL before |
| 2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Optimizing |
| 2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Computing KL after |
| 2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Computing loss after |
| 2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Saving snapshot... |
| 2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Saved |
| 2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Time 1027.97 s |
| 2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | EpochTime 89.16 s |
| ---------------------------------------- ------------- |
| Average/AverageDiscountedReturn -42.7249 |
| Average/AverageReturn -68.2275 |
| Average/Iteration 3 |
| Average/MaxReturn -35.9495 |
| Average/MinReturn -119.74 |
| Average/NumEpisodes 40 |
| Average/StdReturn 22.0106 |
| Average/TerminationRate 0 |
| LinearFeatureBaseline/ExplainedVariance 0.895101 |
| TotalEnvSteps 16000 |
| __unnamed_task__/AverageDiscountedReturn -42.7249 |
| __unnamed_task__/AverageReturn -68.2275 |
| __unnamed_task__/Iteration 3 |
| __unnamed_task__/MaxReturn -35.9495 |
| __unnamed_task__/MinReturn -119.74 |
| __unnamed_task__/NumEpisodes 40 |
| __unnamed_task__/StdReturn 22.0106 |
| __unnamed_task__/TerminationRate 0 |
| policy/Entropy 9.85707 |
| policy/KL 0.0100265 |
| policy/KLBefore 0 |
| policy/LossAfter -0.130342 |
| policy/LossBefore -0.0353351 |
| policy/dLoss 0.0950072 |
| ---------------------------------------- ------------- |
| 2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Optimizing policy... |
| 2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Fitting baseline... |
| 2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Computing loss before |
| 2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Computing KL before |
| 2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Optimizing |
| 2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Computing KL after |
| 2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Computing loss after |
| 2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Saving snapshot... |
| 2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Saved |
| 2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Time 1122.14 s |
| 2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | EpochTime 94.17 s |
| ---------------------------------------- -------------- |
| Average/AverageDiscountedReturn -41.9613 |
| Average/AverageReturn -66.2673 |
| Average/Iteration 4 |
| Average/MaxReturn -33.9462 |
| Average/MinReturn -121.742 |
| Average/NumEpisodes 40 |
| Average/StdReturn 24.5891 |
| Average/TerminationRate 0 |
| LinearFeatureBaseline/ExplainedVariance 0.909156 |
| TotalEnvSteps 20000 |
| __unnamed_task__/AverageDiscountedReturn -41.9613 |
| __unnamed_task__/AverageReturn -66.2673 |
| __unnamed_task__/Iteration 4 |
| __unnamed_task__/MaxReturn -33.9462 |
| __unnamed_task__/MinReturn -121.742 |
| __unnamed_task__/NumEpisodes 40 |
| __unnamed_task__/StdReturn 24.5891 |
| __unnamed_task__/TerminationRate 0 |
| policy/Entropy 9.81839 |
| policy/KL 0.0102138 |
| policy/KLBefore 0 |
| policy/LossAfter -0.0962488 |
| policy/LossBefore 0.00132629 |
| policy/dLoss 0.0975751 |
| ---------------------------------------- -------------- |
| 2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Optimizing policy... |
| 2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Fitting baseline... |
| 2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Computing loss before |
| 2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Computing KL before |
| 2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Optimizing |
| 2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Computing KL after |
| 2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Computing loss after |
| 2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Saving snapshot... |
| 2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Saved |
| 2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Time 1251.93 s |
| 2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | EpochTime 129.78 s |
| ---------------------------------------- ------------- |
| Average/AverageDiscountedReturn -38.2055 |
| Average/AverageReturn -61.7326 |
| Average/Iteration 5 |
| Average/MaxReturn 134.172 |
| Average/MinReturn -125.595 |
| Average/NumEpisodes 40 |
| Average/StdReturn 42.322 |
| Average/TerminationRate 0 |
| LinearFeatureBaseline/ExplainedVariance 0.652002 |
| TotalEnvSteps 24000 |
| __unnamed_task__/AverageDiscountedReturn -38.2055 |
| __unnamed_task__/AverageReturn -61.7326 |
| __unnamed_task__/Iteration 5 |
| __unnamed_task__/MaxReturn 134.172 |
| __unnamed_task__/MinReturn -125.595 |
| __unnamed_task__/NumEpisodes 40 |
| __unnamed_task__/StdReturn 42.322 |
| __unnamed_task__/TerminationRate 0 |
| policy/Entropy 9.80804 |
| policy/KL 0.0122716 |
| policy/KLBefore 0 |
| policy/LossAfter -0.204539 |
| policy/LossBefore 0.0500677 |
| policy/dLoss 0.254606 |
| ---------------------------------------- ------------- |
| 2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Optimizing policy... |
| 2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Fitting baseline... |
| 2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Computing loss before |
| 2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Computing KL before |
| 2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Optimizing |
| 2025-03-29 14:51:35 | [rl2_trainer] epoch #6 | Computing KL after |
| 2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Computing loss after |
| 2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Saving snapshot... |
| 2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Saved |
| 2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Time 1461.75 s |
| 2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | EpochTime 209.82 s |
| ---------------------------------------- ------------- |
| Average/AverageDiscountedReturn -42.1921 |
| Average/AverageReturn -67.1612 |
| Average/Iteration 6 |
| Average/MaxReturn -33.1935 |
| Average/MinReturn -110.057 |
| Average/NumEpisodes 40 |
| Average/StdReturn 24.1351 |
| Average/TerminationRate 0 |
| LinearFeatureBaseline/ExplainedVariance 0.848234 |
| TotalEnvSteps 28000 |
| __unnamed_task__/AverageDiscountedReturn -42.1921 |
| __unnamed_task__/AverageReturn -67.1612 |
| __unnamed_task__/Iteration 6 |
| __unnamed_task__/MaxReturn -33.1935 |
| __unnamed_task__/MinReturn -110.057 |
| __unnamed_task__/NumEpisodes 40 |
| __unnamed_task__/StdReturn 24.1351 |
| __unnamed_task__/TerminationRate 0 |
| policy/Entropy 9.80043 |
| policy/KL 0.014637 |
| policy/KLBefore 0 |
| policy/LossAfter -0.114569 |
| policy/LossBefore -0.0141929 |
| policy/dLoss 0.100376 |
| ---------------------------------------- ------------- |
| 2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Optimizing policy... |
| 2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Fitting baseline... |
| 2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Computing loss before |
| 2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Computing KL before |
| 2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Optimizing |
| 2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Computing KL after |
| 2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Computing loss after |
| 2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Saving snapshot... |
| 2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Saved |
| 2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Time 1701.84 s |
| 2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | EpochTime 240.09 s |
| ---------------------------------------- ------------- |
| Average/AverageDiscountedReturn -42.4082 |
| Average/AverageReturn -67.878 |
| Average/Iteration 7 |
| Average/MaxReturn -34.1169 |
| Average/MinReturn -111.115 |
| Average/NumEpisodes 40 |
| Average/StdReturn 19.5859 |
| Average/TerminationRate 0 |
| LinearFeatureBaseline/ExplainedVariance 0.865991 |
| TotalEnvSteps 32000 |
| __unnamed_task__/AverageDiscountedReturn -42.4082 |
| __unnamed_task__/AverageReturn -67.878 |
| __unnamed_task__/Iteration 7 |
| __unnamed_task__/MaxReturn -34.1169 |
| __unnamed_task__/MinReturn -111.115 |
| __unnamed_task__/NumEpisodes 40 |
| __unnamed_task__/StdReturn 19.5859 |
| __unnamed_task__/TerminationRate 0 |
| policy/Entropy 9.79624 |
| policy/KL 0.0104825 |
| policy/KLBefore 0 |
| policy/LossAfter -0.13989 |
| policy/LossBefore -0.0309541 |
| policy/dLoss 0.108936 |
| ---------------------------------------- ------------- |
| 2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Optimizing policy... |
| 2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Fitting baseline... |
| 2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Computing loss before |
| 2025-03-29 14:59:32 | [rl2_trainer] epoch #8 | Computing KL before |
| 2025-03-29 14:59:32 | [rl2_trainer] epoch #8 | Optimizing |
| 2025-03-29 14:59:39 | [rl2_trainer] epoch #8 | Computing KL after |
| 2025-03-29 14:59:39 | [rl2_trainer] epoch #8 | Computing loss after |
| 2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Saving snapshot... |
| 2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Saved |
| 2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Time 1945.55 s |
| 2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | EpochTime 243.70 s |
| ---------------------------------------- ------------- |
| Average/AverageDiscountedReturn -39.7762 |
| Average/AverageReturn -63.9139 |
| Average/Iteration 8 |
| Average/MaxReturn -35.6858 |
| Average/MinReturn -110.7 |
| Average/NumEpisodes 40 |
| Average/StdReturn 20.7657 |
| Average/TerminationRate 0 |
| LinearFeatureBaseline/ExplainedVariance 0.906608 |
| TotalEnvSteps 36000 |
| __unnamed_task__/AverageDiscountedReturn -39.7762 |
| __unnamed_task__/AverageReturn -63.9139 |
| __unnamed_task__/Iteration 8 |
| __unnamed_task__/MaxReturn -35.6858 |
| __unnamed_task__/MinReturn -110.7 |
| __unnamed_task__/NumEpisodes 40 |
| __unnamed_task__/StdReturn 20.7657 |
| __unnamed_task__/TerminationRate 0 |
| policy/Entropy 9.78585 |
| policy/KL 0.0106836 |
| policy/KLBefore 0 |
| policy/LossAfter -0.0940088 |
| policy/LossBefore -0.0208258 |
| policy/dLoss 0.073183 |
| ---------------------------------------- ------------- |
| 2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Optimizing policy... |
| 2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Fitting baseline... |
| 2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Computing loss before |
| 2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Computing KL before |
| 2025-03-29 15:03:43 | [rl2_trainer] epoch #9 | Optimizing |
| 2025-03-29 15:03:51 | [rl2_trainer] epoch #9 | Computing KL after |
| 2025-03-29 15:03:51 | [rl2_trainer] epoch #9 | Computing loss after |
| 2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Saving snapshot... |
| 2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Saved |
| 2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Time 2197.58 s |
| 2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | EpochTime 252.03 s |
| ---------------------------------------- -------------- |
| Average/AverageDiscountedReturn -38.8162 |
| Average/AverageReturn -61.6066 |
| Average/Iteration 9 |
| Average/MaxReturn -11.7124 |
| Average/MinReturn -113.375 |
| Average/NumEpisodes 40 |
| Average/StdReturn 21.625 |
| Average/TerminationRate 0 |
| LinearFeatureBaseline/ExplainedVariance 0.827891 |
| TotalEnvSteps 40000 |
| __unnamed_task__/AverageDiscountedReturn -38.8162 |
| __unnamed_task__/AverageReturn -61.6066 |
| __unnamed_task__/Iteration 9 |
| __unnamed_task__/MaxReturn -11.7124 |
| __unnamed_task__/MinReturn -113.375 |
| __unnamed_task__/NumEpisodes 40 |
| __unnamed_task__/StdReturn 21.625 |
| __unnamed_task__/TerminationRate 0 |
| policy/Entropy 9.77166 |
| policy/KL 0.00887517 |
| policy/KLBefore 0 |
| policy/LossAfter -0.146794 |
| policy/LossBefore -0.021343 |
| policy/dLoss 0.125451 |
| ---------------------------------------- -------------- |
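The tabular dumps above follow a regular shape: each epoch's metrics sit between two dashed separator rows, one `| <Key> <value> |` pair per line. A minimal sketch of a parser for this format follows; the regexes are inferred from the log text above, and the function and variable names (`parse_epochs`, `METRIC_RE`, `SEP_RE`) are our own, not part of garage or dowel.

```python
import re

# Metric rows look like "| Average/AverageReturn -69.0759 |":
# a key (letters, digits, '_', '/') followed by a single numeric value.
METRIC_RE = re.compile(r"^\|\s*([\w/]+)\s+(-?\d+(?:\.\d+)?)\s*\|$")

# Separator rows are two runs of dashes, e.g.
# "| ---------------------------------------- ------------ |".
SEP_RE = re.compile(r"^\|\s*-{5,}\s+-{5,}\s*\|$")


def parse_epochs(text):
    """Split a log dump into one {key: float} dict per epoch table.

    A separator row opens a table when none is in progress and closes
    the table otherwise; metric rows in between are collected.
    """
    epochs = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if SEP_RE.match(line):
            if current is None:
                current = {}            # opening rule: start a new table
            else:
                epochs.append(current)  # closing rule: table complete
                current = None
            continue
        if current is not None:
            m = METRIC_RE.match(line)
            if m:
                current[m.group(1)] = float(m.group(2))
    return epochs
```

With the full log as input this yields ten dicts, so e.g. `[e["Average/AverageReturn"] for e in parse_epochs(log)]` recovers the return curve (roughly -69 to -62 over the ten epochs shown), which is handy for plotting without rerunning training.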