lunar lander model #4, using PPO trained with learning rate 0.0005 for 500K timesteps
0e6fc9b bguan committed on May 9, 2022