A CleanRL-style PPO agent trained from scratch on LunarLander-v2.
All numbers in this card are generated automatically from results.json, so the card always matches that file.
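
As a rough illustration of how a card can be rendered from results.json, here is a minimal sketch. It is not the actual generation script: the field names `mean_reward`, `std_reward`, and `n_eval_episodes`, the template wording, and the helper names `render_card` and `write_card` are all assumptions made for the example.

```python
import json
from pathlib import Path

# Hypothetical template. The placeholder names are assumed for illustration
# and may not match the real results.json schema.
CARD_TEMPLATE = (
    "A CleanRL-style PPO agent trained from scratch on LunarLander-v2.\n"
    "Mean reward: {mean_reward:.2f} +/- {std_reward:.2f} "
    "over {n_eval_episodes} evaluation episodes.\n"
)


def render_card(results: dict) -> str:
    """Fill the template from a results dict.

    Raises KeyError if a required field is missing, so a stale or
    incomplete results file fails loudly instead of producing a wrong card.
    """
    return CARD_TEMPLATE.format(**results)


def write_card(results_path: str, card_path: str) -> str:
    """Read results.json, render the card, write it to disk, and return it."""
    results = json.loads(Path(results_path).read_text(encoding="utf-8"))
    card = render_card(results)
    Path(card_path).write_text(card, encoding="utf-8")
    return card
```

Calling `write_card("results.json", "README.md")` after each evaluation run would regenerate the card text. Because every number comes from a single file through a single template, the card cannot drift from the recorded results.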