Upload folder using huggingface_hub
- README.md +1 -22
- hyperparameters.json +1 -1
- model.pt +1 -1
- replay.mp4 +0 -0
- results.json +1 -1
README.md
CHANGED
@@ -16,7 +16,7 @@ model-index:
         type: Pixelcopter-PLE-v0
       metrics:
       - type: mean_reward
-        value:
+        value: 70.30 +/- 33.94
         name: mean_reward
         verified: false
 ---
@@ -24,25 +24,4 @@ model-index:
 # **Reinforce** Agent playing **Pixelcopter-PLE-v0**
 This is a trained model of a **Reinforce** agent playing **Pixelcopter-PLE-v0**.
 To learn to use this model and train yours, check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction
-
-Some math about 'Pixelcopter' training.
-The game is to fly through a passage and avoid blocks. Suppose we have trained our agent so that the probability of crashing into a block is _p_ (low enough, I hope).
-The probability that the copter crashes exactly at the _n_-th block is the product of the probabilities that it does not crash at the previous _(n-1)_ blocks and the probability that it crashes at the current block:
-$$P = p \cdot (1-p)^{n-1}$$
-The expected number of the block it crashes at is:
-$$\langle n \rangle = \sum_{n=1}^\infty{n \cdot p \cdot (1-p)^{n-1}} = \frac{1}{p}$$
-The std is:
-$$\mathrm{std}(n) = \sqrt{\langle n^2 \rangle - \langle n \rangle^2}$$
-$$\langle n^2 \rangle = \sum_{n=1}^\infty{n^2 \cdot p \cdot (1-p)^{n-1}} = \frac{2-p}{p^2}$$
-$$\mathrm{std}(n) = \sqrt{\frac{2-p}{p^2} - \left( \frac{1}{p} \right)^2} = \frac{\sqrt{1-p}}{p}$$
-So the difference is:
-$$\langle n \rangle - \mathrm{std}(n) = \frac{1 - \sqrt{1-p}}{p}$$
-As long as
-$$0 \le p \le 1,$$
-the following holds:
-$$\sqrt{1-p} \ge 1-p$$
-$$\langle n \rangle - \mathrm{std}(n) = \frac{1 - \sqrt{1-p}}{p} \le \frac{1 - (1-p)}{p} = 1$$
-The score _s_ in 'Pixelcopter' is the number of blocks passed, decreased by 5 (for the crash). So the average score is lower by 5 and the std is the same. No matter how small _p_ is, our 'least score' (mean minus one std) satisfies:
-$$(\langle n \rangle - 5) - \mathrm{std}(n) = \langle n \rangle - \mathrm{std}(n) - 5 \le -4$$
-But since we use only 10 episodes to compute the statistics and the episode duration is limited, we can still reach the goal: the better the agent, the higher the chances. Still, understanding this is disappointing.
 
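The geometric-distribution math removed above is easy to sanity-check numerically. Below is a minimal sketch (assuming NumPy; it is illustrative and not one of the repository files) that samples crash blocks for a few values of _p_ and compares the empirical mean and std against $1/p$ and $\sqrt{1-p}/p$; in every case mean minus std stays at or below 1, as derived.

```python
import numpy as np

rng = np.random.default_rng(0)

for p in (0.5, 0.1, 0.01):
    # Crash block index n is geometric: P(n) = p * (1 - p)**(n - 1), n = 1, 2, ...
    n = rng.geometric(p, size=1_000_000)
    mean, std = n.mean(), n.std()
    print(
        f"p={p:<5} mean={mean:8.2f} (theory {1 / p:8.2f}) "
        f"std={std:8.2f} (theory {np.sqrt(1 - p) / p:8.2f}) "
        f"mean-std={mean - std:5.2f} (bound: 1)"
    )
```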
hyperparameters.json
CHANGED
@@ -1 +1 @@
-{"h_size": 64, "activation": "ReLU", "num_layers": 3, "scale": 1.0, "n_training_episodes": 50000, "n_evaluation_episodes": 10, "max_t":
+{"h_size": 64, "activation": "ReLU", "num_layers": 3, "scale": 1.0, "n_training_episodes": 50000, "n_evaluation_episodes": 10, "max_t": 600, "gamma": 0.9, "batch_size": 1000, "lr": 1e-05, "final_lr": 1e-06, "env_id": "Pixelcopter-PLE-v0", "state_space": 7, "action_space": 2, "k": 3, "beta": 0.8}
model.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:6f30b3e0c6aaa7a24a7db381d23c3d4d4885739434c88f11ec60cba143388dcf
 size 30882
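A hedged usage sketch for the uploaded files: the repo id below is a placeholder, and it assumes (as in the Unit 4 notebook) that model.pt is a pickled policy object whose defining class is importable. `hf_hub_download` resolves the git-LFS pointer shown above to the actual 30882-byte weights file.

```python
import json

import torch
from huggingface_hub import hf_hub_download

repo_id = "user/Reinforce-Pixelcopter-PLE-v0"  # placeholder, not the actual repo id

# Fetch and inspect the training configuration.
hparams_path = hf_hub_download(repo_id=repo_id, filename="hyperparameters.json")
with open(hparams_path) as f:
    hparams = json.load(f)
print(hparams["max_t"], hparams["gamma"], hparams["lr"])  # 600 0.9 1e-05

# model.pt stores a pickled policy object; the class must be importable, and
# recent PyTorch versions need weights_only=False to unpickle full objects.
model_path = hf_hub_download(repo_id=repo_id, filename="model.pt")
policy = torch.load(model_path, map_location="cpu", weights_only=False)
```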
replay.mp4
CHANGED
Binary files a/replay.mp4 and b/replay.mp4 differ
results.json
CHANGED
@@ -1 +1 @@
-{"env_id": "Pixelcopter-PLE-v0", "mean_reward":
+{"env_id": "Pixelcopter-PLE-v0", "mean_reward": 70.3, "n_evaluation_episodes": 10, "eval_datetime": "2023-12-07T18:01:03.507633"}
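As a rough check, plugging the new evaluation numbers back into the estimate removed from the README (assuming a constant per-block crash probability): a mean reward of 70.3 corresponds to $\langle n \rangle \approx 70.3 + 5 = 75.3$ blocks, i.e. $p \approx 1/75.3 \approx 0.013$, which would predict

$$\mathrm{std}(n) = \frac{\sqrt{1-p}}{p} \approx 74.8,$$

far above the reported 33.94. The gap is consistent with the two caveats noted in the removed text: episodes are capped (max_t = 600) and the statistics come from only 10 evaluation episodes.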