Lew committed on
Commit 07f4999
1 Parent(s): 8f77724

Upload folder using huggingface_hub

Files changed (5)
  1. README.md +1 -22
  2. hyperparameters.json +1 -1
  3. model.pt +1 -1
  4. replay.mp4 +0 -0
  5. results.json +1 -1
README.md CHANGED
@@ -16,7 +16,7 @@ model-index:
  type: Pixelcopter-PLE-v0
  metrics:
  - type: mean_reward
- value: 50.10 +/- 29.76
+ value: 70.30 +/- 33.94
  name: mean_reward
  verified: false
  ---
@@ -24,25 +24,4 @@ model-index:
  # **Reinforce** Agent playing **Pixelcopter-PLE-v0**
  This is a trained model of a **Reinforce** agent playing **Pixelcopter-PLE-v0**.
  To learn to use this model and train yours, check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction
-
- Some math about 'Pixelcopter' training.
- The game is to fly through a passage and avoid blocks. Suppose we have trained our agent so that the probability of crashing at any given block is _p_ (low enough, I hope).
- The probability that the copter crashes exactly at the _n_-th block is the product of the probabilities that it does not crash at the previous _(n-1)_ blocks and the probability that it crashes at the current block:
- $$P = p \cdot (1-p)^{n-1}$$
- The expected value of the index of the block it crashes at is:
- $$\langle n \rangle = \sum_{n=1}^\infty{n \cdot p \cdot (1-p)^{n-1}} = \frac{1}{p}$$
- The standard deviation is:
- $$std(n) = \sqrt{\langle n^2 \rangle - \langle n \rangle^2}$$
- $$\langle n^2 \rangle = \sum_{n=1}^\infty{n^2 \cdot p \cdot (1-p)^{n-1}} = \frac{2-p}{p^2}$$
- $$std(n) = \sqrt{\frac{2-p}{p^2}-\left( \frac{1}{p} \right)^2} = \frac{\sqrt{1-p}}{p}$$
- So the difference is:
- $$\langle n \rangle - std(n) = \frac{1 - \sqrt{1-p}}{p}$$
- Since
- $$0 \le p \le 1,$$
- the following holds:
- $$\sqrt{1-p} \ge 1-p$$
- $$\langle n \rangle - std(n) = \frac{1 - \sqrt{1-p}}{p} \le \frac{1 - (1-p)}{p} = 1$$
- The score _s_ in 'Pixelcopter' is the number of blocks passed, decreased by 5 (for the crash). So the average score is lower by 5 and the std is the same. No matter how small _p_ is, our 'least score' (mean minus std) satisfies:
- $$(\langle n \rangle - 5) - std(n) = \langle n \rangle - std(n) - 5 \le -4$$
- But since we use only 10 episodes to compute the statistics and the episode length is capped, we can still reach the goal: the better the agent, the better the chances. Still, understanding this is disappointing.
-
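A quick numerical check of the derivation removed above (a sketch, not part of this commit; `p = 0.05` and the episode count are arbitrary illustrative choices):

```python
import random

# Monte Carlo check of the geometric crash model from the removed README math.
# p is the per-block crash probability; 0.05 is an arbitrary example value.
p = 0.05
episodes = 200_000

def crash_block(p: float) -> int:
    """Sample the block index at which the copter crashes (geometric law)."""
    n = 1
    while random.random() >= p:  # survive this block with probability 1 - p
        n += 1
    return n

samples = [crash_block(p) for _ in range(episodes)]
mean = sum(samples) / episodes
std = (sum((s - mean) ** 2 for s in samples) / episodes) ** 0.5

print(f"simulated: <n> = {mean:.2f}, std(n) = {std:.2f}")
print(f"predicted: <n> = {1 / p:.2f}, std(n) = {(1 - p) ** 0.5 / p:.2f}")
# <n> - std(n) never exceeds 1, so the 'mean minus std' score stays <= -4.
print(f"<n> - std(n) = {mean - std:.2f} (theory: <= 1)")
```

For p = 0.05 this prints roughly ⟨n⟩ ≈ 20 and std(n) ≈ 19.5, so the mean-minus-std score sits near its theoretical ceiling, matching the bound above.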
hyperparameters.json CHANGED
@@ -1 +1 @@
- {"h_size": 64, "activation": "ReLU", "num_layers": 3, "scale": 1.0, "n_training_episodes": 50000, "n_evaluation_episodes": 10, "max_t": 500, "gamma": 0.9, "batch_size": 120, "lr": 0.0001, "final_lr": 1e-05, "env_id": "Pixelcopter-PLE-v0", "state_space": 7, "action_space": 2, "k": 3, "beta": 0.8}
+ {"h_size": 64, "activation": "ReLU", "num_layers": 3, "scale": 1.0, "n_training_episodes": 50000, "n_evaluation_episodes": 10, "max_t": 600, "gamma": 0.9, "batch_size": 1000, "lr": 1e-05, "final_lr": 1e-06, "env_id": "Pixelcopter-PLE-v0", "state_space": 7, "action_space": 2, "k": 3, "beta": 0.8}
model.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:96948ebb96821db5d57c77440f04e4ca598954dd84e8c8ea62cf42282ffe3071
+ oid sha256:6f30b3e0c6aaa7a24a7db381d23c3d4d4885739434c88f11ec60cba143388dcf
  size 30882
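The model.pt entry above is only a Git LFS pointer, so the checkpoint format is not visible in this diff; a minimal loading sketch, assuming a standard torch.save artifact:

```python
import torch

# Load the committed checkpoint. Whether it is a pickled module or a
# state_dict is an assumption; the LFS pointer above does not say.
checkpoint = torch.load("model.pt", map_location="cpu")
print(type(checkpoint))
```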
replay.mp4 CHANGED
Binary files a/replay.mp4 and b/replay.mp4 differ
 
results.json CHANGED
@@ -1 +1 @@
- {"env_id": "Pixelcopter-PLE-v0", "mean_reward": 50.1, "n_evaluation_episodes": 10, "eval_datetime": "2023-12-06T18:47:50.368651"}
+ {"env_id": "Pixelcopter-PLE-v0", "mean_reward": 70.3, "n_evaluation_episodes": 10, "eval_datetime": "2023-12-07T18:01:03.507633"}