PPO Agent Playing LunarLander-v2
This model was trained locally with a CleanRL-style single-file PPO implementation for Hugging Face Deep RL Course Unit 8.
Results
- Mean reward: -180.47 +/- 74.10
- Evaluation episodes: 10
- Timesteps: 50000
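For readers unfamiliar with the `mean +/- spread` notation, the following is a minimal sketch of how such a summary can be computed from per-episode returns. It assumes the spread is the population standard deviation over the evaluation episodes, which this card does not confirm. The function names and the example returns are hypothetical and are not taken from the training script or from the actual evaluation data.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, std) of per-episode returns.

    Assumption: the spread is the population standard deviation
    (the same convention as numpy.std with default arguments).
    """
    if not episode_returns:
        raise ValueError("need at least one episode return")
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return mean, std


def format_summary(mean, std):
    """Format a summary in the 'mean +/- std' style used in this card."""
    return f"{mean:.2f} +/- {std:.2f}"


# Hypothetical returns for illustration only; NOT the real evaluation data.
example_returns = [-100.0, -200.0, -300.0]
mean, std = summarize_returns(example_returns)
print(format_summary(mean, std))
```

With only 10 evaluation episodes, a summary like this is a rough estimate, so small differences between runs should be read with caution.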
Files
- model.pt: PyTorch policy checkpoint.
- results.json: evaluation results.
- replay.mp4: rendered policy preview.
- logs/: TensorBoard logs from the training run.
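To check the reported numbers against the repository contents, the evaluation file can be inspected directly. The sketch below assumes only that results.json is valid JSON and that it sits in the current working directory. Its field names are not documented in this card, so the code lists whatever top-level keys it finds rather than expecting specific ones. The helper names `load_results` and `describe` are hypothetical.

```python
import json
from pathlib import Path


def load_results(path):
    """Load a JSON results file and return the parsed object."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)


def describe(results):
    """Return printable lines for a parsed JSON value.

    A top-level object yields one 'key: value' line per key;
    any other value yields a single line with its repr.
    """
    if isinstance(results, dict):
        return [f"{key}: {value!r}" for key, value in results.items()]
    return [repr(results)]


if __name__ == "__main__":
    # Assumption: the script is run from the directory holding results.json.
    path = Path("results.json")
    if path.exists():
        for line in describe(load_results(path)):
            print(line)
    else:
        print(f"{path} not found; run this from the repository root")
```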
Evaluation results
- mean_reward on LunarLander-v2 (self-reported): -180.47 +/- 74.10