Tobias Reiling
PPO RL model for the LunarLander environment from Unit 1 of the Deep RL Course
7eefd2d
{"mean_reward": 264.72683040100566, "std_reward": 30.17505712959752, "is_deterministic": true, "n_eval_episodes": 10, "eval_datetime": "2023-02-21T20:39:57.047805"} |