lunar_lander_v2-ppo / results.json
first trained agent with proximal policy optimization
ca4b3e9
{"mean_reward": 286.12453207862944, "std_reward": 17.950547966482564, "is_deterministic": true, "n_eval_episodes": 10, "eval_datetime": "2022-12-12T02:05:07.148364"}
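The fields above can be read and summarized with a short script. This is a minimal sketch, not part of the repository. It assumes the JSON is available as a string. The names `lower_bound`, `SOLVED_THRESHOLD`, and `looks_solved` are hypothetical, and the 200-point threshold is the commonly cited "solved" score for LunarLander rather than something stated in this file. Subtracting one standard deviation from the mean is just one conservative way to summarize the evaluation.

```python
import json
from datetime import datetime

# Contents of results.json, copied verbatim from the file above.
raw = (
    '{"mean_reward": 286.12453207862944, "std_reward": 17.950547966482564, '
    '"is_deterministic": true, "n_eval_episodes": 10, '
    '"eval_datetime": "2022-12-12T02:05:07.148364"}'
)

results = json.loads(raw)

# Conservative summary: mean reward minus one standard deviation.
# (Assumption: this is an illustrative metric, not one defined by the file.)
lower_bound = results["mean_reward"] - results["std_reward"]

# Assumption: 200 points is the commonly cited "solved" score for LunarLander.
SOLVED_THRESHOLD = 200.0
looks_solved = lower_bound >= SOLVED_THRESHOLD

# The timestamp is ISO 8601, so the standard library can parse it directly.
evaluated_at = datetime.fromisoformat(results["eval_datetime"])

print(f"episodes evaluated: {results['n_eval_episodes']}")
print(f"mean - std reward:  {lower_bound:.2f}")
print(f"above threshold:    {looks_solved}")
print(f"evaluated on:       {evaluated_at.date()}")
```

With only 10 evaluation episodes the standard deviation is a rough estimate, so the summary is best treated as indicative rather than precise.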