ppo-LunarLander-v2 / replay.mp4

Commit History

600k steps trained. mean_reward= 209.20 +/- 39.6
5f13406

kws committed on