PPO Agent Playing CartPole-v1
This is a trained model of a PPO agent playing CartPole-v1.
Hyperparameters
exp_name: ppo_cartpole
seed: 1
torch_deterministic: True
cuda: True
track: False
wandb_project_name: cleanRL
wandb_entity: None
capture_video: False
env_id: CartPole-v1
total_timesteps: 50000
learning_rate: 0.00025
num_envs: 4
num_steps: 128
anneal_lr: True
gae: True
gamma: 0.99
gae_lambda: 0.95
num_minibatches: 4
update_epochs: 4
norm_adv: True
clip_coef: 0.2
clip_vloss: True
ent_coef: 0.01
vf_coef: 0.5
max_grad_norm: 0.5
target_kl: None
repo_id: cjksofm/ppo-CartPole-v1
batch_size: 512
minibatch_size: 128
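The last two entries, batch_size and minibatch_size, are consistent with being derived from num_envs, num_steps, and num_minibatches rather than set independently. The sketch below shows that arithmetic along with a linear learning-rate decay of the kind anneal_lr: True suggests. It assumes the conventions commonly used in CleanRL-style PPO scripts. The function names are hypothetical and are not taken from this model's training code.

```python
def derived_ppo_sizes(num_envs, num_steps, num_minibatches, total_timesteps):
    """Return (batch_size, minibatch_size, num_updates) for one PPO run."""
    # One rollout collects num_steps transitions from each parallel env.
    batch_size = num_envs * num_steps
    # Each epoch splits the rollout into equally sized minibatches.
    minibatch_size = batch_size // num_minibatches
    # Number of full rollouts that fit in the timestep budget.
    num_updates = total_timesteps // batch_size
    return batch_size, minibatch_size, num_updates


def annealed_lr(update, num_updates, learning_rate):
    """Linear decay (assumed schedule); `update` is counted from 1."""
    frac = 1.0 - (update - 1.0) / num_updates
    return frac * learning_rate


# Values from the hyperparameter list above.
batch_size, minibatch_size, num_updates = derived_ppo_sizes(
    num_envs=4, num_steps=128, num_minibatches=4, total_timesteps=50000
)
first_lr = annealed_lr(1, num_updates, 0.00025)
last_lr = annealed_lr(num_updates, num_updates, 0.00025)
```

Under these assumptions, 50000 timesteps with a batch of 512 allow 97 complete policy updates, and the learning rate starts at 0.00025 and shrinks toward zero by the final update.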
Evaluation results
- mean_reward on CartPole-v1 (self-reported): 234.60 +/- 95.29
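The reported figure has the form mean +/- spread over evaluation episodes. As a minimal sketch of how such a summary can be produced, assuming the spread is the population standard deviation of per-episode returns (the card does not state this), the helper below formats a list of returns in the same style. The function name and the example returns are hypothetical and are not this model's evaluation data.

```python
from statistics import mean, pstdev


def summarize_returns(episode_returns):
    """Format per-episode returns as 'mean +/- std' with two decimals."""
    # pstdev is the population standard deviation (assumed convention).
    return f"{mean(episode_returns):.2f} +/- {pstdev(episode_returns):.2f}"


# Hypothetical returns from two evaluation episodes, for illustration only.
summary = summarize_returns([100.0, 300.0])
```

Since a CartPole-v1 episode is capped at 500 steps with a reward of 1 per step, a spread of about 95 around a mean of about 235 indicates that episode lengths vary considerably from run to run.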