jnforja/ppo-LunarLander-v2-cleanrl

PPO Agent Playing LunarLander-v2

This model was trained locally with a CleanRL-style single-file PPO implementation for Unit 8 of the Hugging Face Deep RL Course.

Results

Mean reward: -180.47 +/- 74.10
Evaluation episodes: 10
Timesteps: 50000
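
The result line above reads as a mean and a spread over the evaluation episodes. A minimal sketch of how such a summary could be computed from per-episode returns, assuming the spread is the population standard deviation (the default of numpy.std). The helper names summarize_returns and format_result are hypothetical, and the returns below are made-up illustration values, not data from this model:

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, std) of per-episode returns.

    Assumption: std is the population standard deviation,
    which matches numpy.std with its default ddof=0.
    """
    if not episode_returns:
        raise ValueError("need at least one episode return")
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return mean, std


def format_result(mean, std):
    """Format the summary in the 'mean +/- std' style used in this card."""
    return f"{mean:.2f} +/- {std:.2f}"


# Made-up returns for illustration only; not this model's episodes.
example_returns = [-100.0, -200.0, -300.0]
mean, std = summarize_returns(example_returns)
print(format_result(mean, std))
```

With only 10 evaluation episodes the estimate is noisy, so a spread of this size relative to the mean is worth keeping in view when comparing runs.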

Files

model.pt: PyTorch policy checkpoint.
results.json: evaluation results.
replay.mp4: rendered policy preview.
logs/: TensorBoard logs from the training run.
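
The layout of results.json is not documented in this card, so the following is only a sketch for inspecting it after download. It assumes nothing about the key names: it loads the file and lists whatever top-level entries it finds. The helpers load_results and describe are hypothetical names introduced for illustration:

```python
import json
from pathlib import Path


def load_results(path="results.json"):
    """Load the evaluation results file and return its parsed contents."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def describe(results):
    """Build one 'key: value' line per top-level entry, whatever the keys are."""
    if isinstance(results, dict):
        return [f"{key}: {value}" for key, value in results.items()]
    # Fall back to a single line if the file is not a JSON object.
    return [repr(results)]


if __name__ == "__main__":
    # Only runs when a results.json is present in the working directory.
    if Path("results.json").exists():
        for line in describe(load_results()):
            print(line)
```

The model.pt checkpoint would need the matching network definition from the training script to be loaded, which is why it is not sketched here.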
