ppo-lunarlander-v2 / results.json
bartpotrykus
30M training steps, with a linear learning-rate schedule applied after the first 15M steps
6881971
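The commit message describes the schedule only in words, so here is a minimal sketch of one possible reading: the learning rate is held constant for the first half of training (15M of 30M steps) and then decays linearly to zero. The function name `lr_schedule`, the starting value `3e-4`, the decay-to-zero endpoint, and the `progress_remaining` convention (1.0 at the start, 0.0 at the end, as used for callable learning rates in Stable-Baselines3) are all assumptions, not details taken from this repository.

```python
def lr_schedule(
    progress_remaining: float,
    initial_lr: float = 3e-4,    # hypothetical starting learning rate
    hold_fraction: float = 0.5,  # 15M of 30M steps at a constant rate
) -> float:
    """Constant learning rate for the first `hold_fraction` of training,
    then a linear decay to zero over the remaining steps.

    `progress_remaining` runs from 1.0 (start) down to 0.0 (end).
    """
    progress_done = 1.0 - progress_remaining
    if progress_done <= hold_fraction:
        return initial_lr
    # Rescale so the decay starts at initial_lr and reaches 0 at the end.
    return initial_lr * progress_remaining / (1.0 - hold_fraction)


if __name__ == "__main__":
    total_steps = 30_000_000
    for step in (0, 15_000_000, 22_500_000, 30_000_000):
        remaining = 1.0 - step / total_steps
        print(f"step {step:>10,}: lr = {lr_schedule(remaining):.2e}")
```

A callable of this shape could be passed wherever a training library accepts a learning-rate function of remaining progress. If the author instead resumed training at 15M steps with a fresh linear schedule, the curve would have the same shape but might start from a different rate.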
{"mean_reward": 302.96265780000004, "std_reward": 12.170855501809033, "is_deterministic": true, "n_eval_episodes": 10, "eval_datetime": "2022-12-21T18:20:27.608387"}