PPO Agent Playing LunarLander-v2

This is a trained model of a PPO agent playing LunarLander-v2.

Hyperparameters

Namespace(
    exp_name='ppo', seed=1, torch_deterministic=True, cuda=True,
    track=False, wandb_project_name='cleanRL', wandb_entity=None, capture_video=False,
    env_id='LunarLander-v2', total_timesteps=500000, learning_rate=0.00025,
    num_envs=4, num_steps=128, anneal_lr=True,
    gae=True, gamma=0.99, gae_lambda=0.95,
    num_minibatches=4, update_epochs=4, norm_adv=True,
    clip_coef=0.2, clip_vloss=True, ent_coef=0.01, vf_coef=0.5,
    max_grad_norm=0.5, target_kl=None,
    repo_id='MadheshBS/ppo-LunarLander-v2',
    batch_size=512, minibatch_size=128
)
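
The last two values in the dump (batch_size and minibatch_size) are consistent with being derived from the other settings rather than chosen independently. A minimal sketch of that arithmetic, assuming the common convention that a batch is one rollout of num_steps transitions from each of num_envs environments, split evenly into num_minibatches. The name num_updates is introduced here for illustration only and does not appear in the dump.

```python
# Values copied from the hyperparameter dump above.
num_envs = 4
num_steps = 128
num_minibatches = 4
total_timesteps = 500000

# Assumed convention: one rollout gathers num_steps transitions per environment.
batch_size = num_envs * num_steps

# Each update epoch splits the batch evenly into minibatches.
minibatch_size = batch_size // num_minibatches

# Illustrative name (not in the dump): number of full rollout/update iterations
# that fit into the total timestep budget.
num_updates = total_timesteps // batch_size

print(batch_size, minibatch_size, num_updates)  # 512 128 976
```

Under these assumptions the computed batch_size and minibatch_size match the 512 and 128 listed in the dump.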

Evaluation results

mean_reward on LunarLander-v2 (self-reported): 53.48 +/- 104.60
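
A figure of the form "mean +/- spread" is usually obtained by running several evaluation episodes and summarizing their returns. The sketch below shows one way to produce such a string. The function name summarize_returns, the choice of population standard deviation, and the sample returns are all assumptions for illustration; none of them are taken from this model's actual evaluation.

```python
from statistics import mean, pstdev


def summarize_returns(episode_returns):
    """Format episode returns as 'mean +/- std'.

    Uses the population standard deviation, which is an assumption;
    the source does not say which estimator was used.
    """
    return f"{mean(episode_returns):.2f} +/- {pstdev(episode_returns):.2f}"


# Placeholder returns for illustration only, NOT results from this model.
example_returns = [0.0, 100.0]
print(summarize_returns(example_returns))  # 50.00 +/- 50.00
```

A spread larger than the mean, as in the reported 53.48 +/- 104.60, indicates that returns varied widely from episode to episode.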