PPO Agent Playing LunarLander-v3

This is a trained model of a PPO agent playing LunarLander-v3.

Hyperparameters

exp_name: ppo
seed: 1
torch_deterministic: True
cuda: True
track: False
wandb_project_name: cleanRL
wandb_entity: None
capture_video: False
env_id: LunarLander-v3
total_timesteps: 300000
learning_rate: 0.00025
num_envs: 4
num_steps: 128
anneal_lr: True
gae: True
gamma: 0.99
gae_lambda: 0.95
num_minibatches: 4
update_epochs: 4
norm_adv: True
clip_coef: 0.2
clip_vloss: True
ent_coef: 0.01
vf_coef: 0.5
max_grad_norm: 0.5
target_kl: None
repo_id: cjksofm/ppo-LunarLander-v3
batch_size: 512
minibatch_size: 128
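
The last two entries, batch_size and minibatch_size, look like derived values rather than independent settings. The sketch below shows one way they could follow from num_envs, num_steps, and num_minibatches, and how anneal_lr might scale the learning rate over training. The formulas are an assumption based on common PPO implementations (they do reproduce the 512 and 128 listed above); the name annealed_lr is hypothetical and not taken from this model's training script.

```python
# Values copied from the hyperparameter list above.
num_envs = 4
num_steps = 128
num_minibatches = 4
total_timesteps = 300_000
learning_rate = 0.00025

# Assumption: one rollout gathers num_steps transitions from each
# of the num_envs parallel environments.
batch_size = num_envs * num_steps

# Assumption: each epoch splits the rollout into equal minibatches.
minibatch_size = batch_size // num_minibatches

# Assumption: the number of policy updates uses integer division,
# so leftover timesteps that do not fill a rollout are dropped.
num_updates = total_timesteps // batch_size


def annealed_lr(update: int) -> float:
    """Hypothetical linear decay for anneal_lr=True; update counts from 1."""
    frac = 1.0 - (update - 1.0) / num_updates
    return frac * learning_rate


print(batch_size, minibatch_size, num_updates)
print(annealed_lr(1), annealed_lr(num_updates))
```

Under these assumptions the run performs 585 updates, starting at the full learning rate of 0.00025 and decaying toward zero by the final update.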

Evaluation results

mean_reward on LunarLander-v2 (self-reported): -27.99 +/- 27.64