Update README.md

README.md

      name: Pixelcopter-PLE-v0
      type: Pixelcopter-PLE-v0
    metrics:
    - type: mean_reward
      value: 58.13 +/- 55.17
      name: mean_reward
      verified: false
---

# 🚁 Reinforce Agent – Pixelcopter-PLE-v0

A policy gradient agent trained from scratch using the **REINFORCE** algorithm to play [Pixelcopter](https://pygame-learning-environment.readthedocs.io/en/latest/user/games/pixelcopter.html), a challenging side-scrolling game with continuous observations built on the PyGame Learning Environment (PLE).

---

## 📊 Performance

| Metric | Value |
|--------|-------|
| Mean Reward | 58.13 |
| Std of Reward | ±55.17 |
| Best Average Score | 80.65 (Episode 46000) |
| Evaluation Episodes | 10 |
| Training Episodes | 50,000 |

---

## 🧠 Algorithm – REINFORCE (Monte Carlo Policy Gradient)

REINFORCE is a classic **policy gradient** method that directly optimizes the policy by:

1. Rolling out full episodes using the current policy
2. Computing discounted returns **Gₜ = rₜ₊₁ + γrₜ₊₂ + γ²rₜ₊₃ + ...** for each timestep
3. Updating the policy by maximizing **E[ log π_θ(a|s) · Gₜ ]**

The policy network is a simple feedforward neural network, sketched after this list:

- **Input:** State observation vector
- **Hidden layer:** Fully connected + ReLU activation
- **Output:** Action probabilities via Softmax
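
A minimal PyTorch sketch of this architecture and update rule, using the hyperparameters from the table below (7-dimensional state, hidden size 64, γ = 0.99); the `Policy` and `reinforce_update` names are illustrative, not the repo's exact training code:

```python
# Illustrative sketch of REINFORCE, not the repo's exact training code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

class Policy(nn.Module):
    """Feedforward policy: state vector -> action probabilities."""
    def __init__(self, state_size=7, hidden_size=64, action_size=2):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, action_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return F.softmax(self.fc2(x), dim=-1)

    def act(self, state):
        """Sample an action and return it with its log-probability."""
        state = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        dist = Categorical(self.forward(state))
        action = dist.sample()
        return action.item(), dist.log_prob(action).squeeze(0)

def reinforce_update(optimizer, log_probs, rewards, gamma=0.99):
    """One REINFORCE step over a finished episode."""
    # G_t = r_{t+1} + gamma * G_{t+1}, accumulated backwards over the episode
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # Maximize E[log pi(a|s) * G_t] by minimizing its negative
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the actual training run the returns are additionally standardized per episode (see Training Details below).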

---

## ⚙️ Hyperparameters

| Parameter | Value |
|-----------|-------|
| Hidden layer size | 64 |
| Training episodes | 50,000 |
| Max steps per episode | 10,000 |
| Discount factor (γ) | 0.99 |
| Learning rate | 1e-4 |
| Optimizer | Adam |

---

## 🎮 About the Environment

**Pixelcopter-PLE-v0** is a side-scrolling game where the agent controls a helicopter and must navigate through gaps in walls without crashing. A short sketch of the raw PLE interface follows the list below.

- **Observation space:** 7 continuous values (player velocity, player y-position, wall positions, etc.)
- **Action space:** 2 discrete actions – throttle up or do nothing
- **Reward:** +1 for each timestep survived
- **Episode ends:** On collision with a wall or the ground/ceiling
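
For context, this is roughly what the raw PLE interface beneath the wrapper exposes (standard PLE calls: `init`, `getActionSet`, `getGameState`, `act`, `game_over`, `reset_game`); rewards from the raw API follow PLE's own reward scheme, which may differ from the shaped +1-per-timestep reward described above:

```python
import random

from ple import PLE
from ple.games.pixelcopter import Pixelcopter

env = PLE(Pixelcopter(), display_screen=False)
env.init()

print(env.getActionSet())          # [up-key, None]: throttle up / do nothing
print(sorted(env.getGameState()))  # keys of the 7 state values listed above

# One random-action episode against the raw PLE API
env.reset_game()
total = 0.0
while not env.game_over():
    total += env.act(random.choice(env.getActionSet()))
print("episode reward (raw PLE reward scheme):", total)
```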

---

## 🚀 How to Use

```python
from ple.games.pixelcopter import Pixelcopter
from ple import PLE
import torch

# Load the trained policy (a pickled nn.Module; its class definition
# must be importable when unpickling)
model = torch.load("model.pt", map_location=torch.device("cpu"))
model.eval()

# `env` must be a Gymnasium-style wrapper around PLE's Pixelcopter.
# This repo trained with a custom wrapper (see "Training Details" below);
# substitute your own wrapper class here, e.g.:
# env = YourPixelcopterGymWrapper(PLE(Pixelcopter()))

# Run inference
state, _ = env.reset()
action, _ = model.act(state)
```
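
To reproduce an evaluation like the one in the Performance table, a sketch along these lines works, assuming `env` is the Gymnasium-style wrapper from the snippet above (so `env.step` returns the five-tuple) and `model.act` returns `(action, log_prob)`:

```python
import numpy as np

def evaluate(env, model, n_episodes=10, max_steps=10_000):
    """Mean/std of total episode reward, as reported in the Performance table."""
    scores = []
    for _ in range(n_episodes):
        state, _ = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action, _ = model.act(state)
            # Assumes a Gymnasium-style wrapper returning the 5-tuple
            state, reward, terminated, truncated, _ = env.step(action)
            total += reward
            if terminated or truncated:
                break
        scores.append(total)
    return float(np.mean(scores)), float(np.std(scores))

mean_reward, std_reward = evaluate(env, model)
print(f"mean_reward = {mean_reward:.2f} +/- {std_reward:.2f}")
```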

---

## 📈 Training Details

- **Framework:** PyTorch
- **Returns:** Standardized per episode for training stability (see the sketch below)
- **Environment API:** PyGame Learning Environment (PLE) via custom Gymnasium wrapper
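
Here, "standardized per episode" means each episode's discounted returns are shifted and scaled to zero mean and unit variance before the policy-gradient step, which keeps update magnitudes comparable across episodes of very different lengths. A minimal sketch (the `eps` guard is a common convention, assumed here):

```python
import torch

def standardize_returns(returns: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Shift/scale one episode's discounted returns to zero mean, unit variance."""
    return (returns - returns.mean()) / (returns.std() + eps)

# Raw G_t values grow with episode length; standardization keeps the
# policy-gradient step on a comparable scale across episodes.
print(standardize_returns(torch.tensor([10.0, 8.5, 6.0, 3.0, 1.0])))
```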

---

## 🤗 Author

Trained by **nirmanpatel** as part of the [Hugging Face Deep Reinforcement Learning Course](https://huggingface.co/deep-rl-course/intro/README).