PPO Agent Playing LunarLander-v2

This model was trained locally with a CleanRL-style single-file PPO implementation for Hugging Face Deep RL Course Unit 8.

Results

  • Mean reward: -180.47 +/- 74.10 (mean +/- standard deviation)
  • Evaluation episodes: 10
  • Training timesteps: 50,000
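
For readers unfamiliar with the "mean +/- spread" notation, here is a minimal sketch of how such a summary is typically computed from per-episode returns. It assumes the spread is the population standard deviation (what NumPy's np.std returns by default); the card does not state which estimator was used. The helper name summarize_returns and the example returns are made up for illustration and are not this model's evaluation episodes.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, std) of per-episode returns.

    Assumption: the spread is the population standard deviation,
    matching the default behaviour of numpy.std.
    """
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return mean, std


# Made-up returns for illustration only; not this model's episodes.
example_returns = [-100.0, -200.0, -300.0]
mean, std = summarize_returns(example_returns)
print(f"Mean reward: {mean:.2f} +/- {std:.2f}")
```

With only 10 evaluation episodes, an estimate like this is noisy, so small differences between runs should be read with caution.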

Files

  • model.pt: PyTorch policy checkpoint.
  • results.json: evaluation results.
  • replay.mp4: rendered policy preview.
  • logs/: TensorBoard logs from the training run.
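
As a rough sketch of how results.json might be inspected after downloading the repository, the snippet below loads the file and formats a one-line summary. The key names mean_reward, std_reward, and n_evaluation_episodes are assumptions about the file's schema, not something this card confirms, and format_results is a hypothetical helper; adjust the keys to whatever the file actually contains.

```python
import json
from pathlib import Path


def format_results(path):
    """Load an evaluation summary from a JSON file and format it as one line."""
    data = json.loads(Path(path).read_text())
    # Assumed key names; the real results.json may use different ones.
    mean = data["mean_reward"]
    std = data["std_reward"]
    episodes = data["n_evaluation_episodes"]
    return f"{mean:.2f} +/- {std:.2f} over {episodes} episodes"


# Only runs if a results.json file sits in the current directory.
if Path("results.json").exists():
    print(format_results("results.json"))
```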