nirmanpatel committed
Commit e46b456 · verified · 1 Parent(s): e655e86

Update README.md

Files changed (1)
  1. README.md +89 -8
README.md CHANGED
@@ -15,13 +15,94 @@ model-index:
  name: Pixelcopter-PLE-v0
  type: Pixelcopter-PLE-v0
  metrics:
- - type: mean_reward
- value: 67.30 +/- 55.17
- name: mean_reward
- verified: false
+ - type: mean_reward
+ value: 58.13 +/- 55.17
+ name: mean_reward
+ verified: false
  ---

- # **Reinforce** Agent playing **Pixelcopter-PLE-v0**
- This is a trained model of a **Reinforce** agent playing **Pixelcopter-PLE-v0** .
- To learn to use this model and train yours check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction
-
+ # 🚁 Reinforce Agent: Pixelcopter-PLE-v0
+
+ A policy gradient agent trained from scratch using the **REINFORCE** algorithm to play [Pixelcopter](https://pygame-learning-environment.readthedocs.io/en/latest/user/games/pixelcopter.html), a challenging game with a continuous state space built on the PyGame Learning Environment (PLE).
+
+ ---
+
+ ## 📊 Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | Mean Reward | 58.13 |
+ | Std of Reward | ±55.17 |
+ | Best Average Score | 80.65 (episode 46,000) |
+ | Evaluation Episodes | 10 |
+ | Training Episodes | 50,000 |
+
+ ---
+
+ ## 🧠 Algorithm: REINFORCE (Monte Carlo Policy Gradient)
+
+ REINFORCE is a classic **policy gradient** method that directly optimizes the policy by:
+ 1. Rolling out full episodes using the current policy
+ 2. Computing discounted returns **Gₜ = rₜ₊₁ + γrₜ₊₂ + γ²rₜ₊₃ + ...** for each timestep
+ 3. Updating the policy by maximizing **E[ log π_θ(a|s) · Gₜ ]** (sketched below)
+
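+ A minimal PyTorch sketch of steps 2 and 3 (illustrative only; the `rewards` and `log_probs` lists are assumed to have been collected during a rollout with the current policy):
+
+ ```python
+ import torch
+
+ gamma = 0.99
+
+ # Assumed rollout data: reward received after each step and log pi_theta(a_t | s_t)
+ # for the action actually taken (here a toy 3-step episode).
+ rewards = [1.0, 1.0, 1.0]
+ log_probs = [torch.tensor(-0.7, requires_grad=True) for _ in rewards]
+
+ # Step 2: discounted returns G_t, accumulated backwards through the episode
+ returns = []
+ G = 0.0
+ for r in reversed(rewards):
+     G = r + gamma * G
+     returns.insert(0, G)
+ returns = torch.tensor(returns)
+
+ # Step 3: minimizing -sum_t log pi_theta(a_t|s_t) * G_t maximizes E[log pi * G]
+ loss = torch.stack([-lp * g for lp, g in zip(log_probs, returns)]).sum()
+ loss.backward()  # in a real setup, gradients flow into the policy parameters
+ ```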
+ The policy network is a simple feedforward neural network (see the sketch below):
+ - **Input:** State observation vector
+ - **Hidden layer:** Fully connected + ReLU activation
+ - **Output:** Action probabilities via Softmax
+
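+ A sketch of that architecture, assuming the 7-dimensional Pixelcopter state, the 2 discrete actions described below, and the hidden size of 64 from the hyperparameter table (the exact class stored in the checkpoint may differ):
+
+ ```python
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from torch.distributions import Categorical
+
+ class Policy(nn.Module):
+     """Feedforward policy: state -> hidden (ReLU) -> action probabilities (softmax)."""
+     def __init__(self, state_size=7, action_size=2, hidden_size=64):
+         super().__init__()
+         self.fc1 = nn.Linear(state_size, hidden_size)
+         self.fc2 = nn.Linear(hidden_size, action_size)
+
+     def forward(self, x):
+         x = F.relu(self.fc1(x))
+         return F.softmax(self.fc2(x), dim=-1)
+
+     def act(self, state):
+         """Sample an action; return it with its log-probability for the REINFORCE loss."""
+         state = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
+         dist = Categorical(self.forward(state))
+         action = dist.sample()
+         return action.item(), dist.log_prob(action)
+ ```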
+ ---
+
+ ## ⚙️ Hyperparameters
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Hidden layer size | 64 |
+ | Training episodes | 50,000 |
+ | Max steps per episode | 10,000 |
+ | Discount factor (γ) | 0.99 |
+ | Learning rate | 1e-4 |
+ | Optimizer | Adam |
+
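+ For concreteness, a sketch of how these values would typically be wired up at the start of training (the network is inlined here so the snippet stands alone; it mirrors the `Policy` sketch above):
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ # Hyperparameters from the table above
+ hidden_size = 64
+ n_training_episodes = 50_000
+ max_steps_per_episode = 10_000
+ gamma = 0.99
+ learning_rate = 1e-4
+
+ # 7 state features in, 2 actions out (see the environment section below)
+ policy = nn.Sequential(
+     nn.Linear(7, hidden_size),
+     nn.ReLU(),
+     nn.Linear(hidden_size, 2),
+     nn.Softmax(dim=-1),
+ )
+ optimizer = torch.optim.Adam(policy.parameters(), lr=learning_rate)
+ ```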
+ ---
+
+ ## 🎮 About the Environment
+
+ **Pixelcopter-PLE-v0** is a side-scrolling game where the agent controls a helicopter and must navigate through gaps in walls without crashing.
+
+ - **Observation space:** 7 continuous values (player velocity, player y-position, wall positions, etc.)
+ - **Action space:** 2 discrete actions (throttle up or do nothing)
+ - **Reward:** +1 for each timestep survived
+ - **Episode ends:** On collision with a wall or the ground/ceiling
+
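+ A small sketch using the raw PLE API to inspect that state and action set (the custom Gymnasium wrapper used for training is not included in this card):
+
+ ```python
+ from ple import PLE
+ from ple.games.pixelcopter import Pixelcopter
+
+ # Build a headless Pixelcopter instance
+ env = PLE(Pixelcopter(), display_screen=False)
+ env.init()
+
+ # The state is a dict of 7 named continuous features, e.g. player_y,
+ # player_vel, distances to ceiling/floor, and the next gate's position.
+ print(env.getGameState().keys())
+
+ # Two discrete actions: the "ascend" key plus a no-op (None)
+ print(env.getActionSet())
+ ```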
+ ---
+
+ ## 🚀 How to Use
+
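+ A minimal, runnable inference sketch against the raw PLE API; note that the ordering of `getGameState()` values is assumed to match the state vector the policy was trained on:
+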
+ ```python
+ import numpy as np
+ import torch
+ from ple import PLE
+ from ple.games.pixelcopter import Pixelcopter
+
+ # Build a headless environment
+ env = PLE(Pixelcopter(), display_screen=False)
+ env.init()
+ actions = env.getActionSet()  # the "ascend" key plus a no-op (None)
+
+ # Load the model (saved as a full module, so the Policy class must be importable)
+ model = torch.load("model.pt", map_location=torch.device("cpu"))
+ model.eval()
+
+ # Run inference for one episode
+ env.reset_game()
+ while not env.game_over():
+     # Assumes getGameState() value order matches the training state vector
+     state = np.array(list(env.getGameState().values()), dtype=np.float32)
+     action, _ = model.act(state)
+     env.act(actions[action])
+ ```
+
+ ---
+
+ ## 📚 Training Details
+
+ - **Framework:** PyTorch
+ - **Returns:** Standardized per episode for training stability
+ - **Environment API:** PyGame Learning Environment (PLE) via custom Gymnasium wrapper
+
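+ The per-episode return standardization mentioned above is the usual trick of shifting and scaling each episode's discounted returns to zero mean and unit variance before computing the loss; a minimal sketch:
+
+ ```python
+ import torch
+
+ def standardize(returns: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
+     """Normalize one episode's discounted returns to zero mean / unit variance."""
+     return (returns - returns.mean()) / (returns.std() + eps)
+
+ # Example: discounted returns for a short 4-step episode (gamma = 0.99)
+ g = torch.tensor([3.94, 2.97, 1.99, 1.00])
+ print(standardize(g))
+ ```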
+ ---
+
+ ## 👤 Author
+
+ Trained by **nirmanpatel** as part of the [Hugging Face Deep Reinforcement Learning Course](https://huggingface.co/deep-rl-course/intro/README).