Saraaaaaaaaa
committed on
Commit
•
6430312
1
Parent(s):
69aff6f
Update README.md
README.md
CHANGED
@@ -24,4 +24,10 @@ model-index:
|
|
24 |
# **Reinforce** Agent playing **CartPole-v1**
|
25 |
This is a trained model of a **Reinforce** agent playing **CartPole-v1** .
|
26 |
To learn to use this model and train yours check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction
27 |
+
|
28 |
+
**Policy-based learning** approximates the policy π directly, without having to learn a value function. The objective is then to maximize the performance of the parameterized policy using gradient ascent.
|
29 |
+
TL;DR: The cart learns to balance the pole by optimizing π for the best outcome: *the pole not falling over*.
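
To make the gradient-ascent idea concrete, here is a minimal sketch of one Reinforce update in plain Python. It is not the code used to train this model: the linear-softmax policy, the function names (`softmax`, `discounted_returns`, `reinforce_update`), and the default `lr` and `gamma` values are all assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_update(theta, episode, lr=0.01, gamma=0.99):
    """One gradient-ascent step for a linear-softmax policy (illustrative).

    theta:   one weight vector per action (hypothetical parameterization)
    episode: list of (state, action, reward) tuples from a single rollout
    """
    returns = discounted_returns([r for _, _, r in episode], gamma)
    grad = [[0.0] * len(w) for w in theta]
    for (state, action, _), g in zip(episode, returns):
        logits = [sum(w * s for w, s in zip(wa, state)) for wa in theta]
        probs = softmax(logits)
        for a in range(len(theta)):
            # d log pi(action|state) / d theta[a]
            #   = (1[a == action] - pi(a|state)) * state
            coeff = (1.0 if a == action else 0.0) - probs[a]
            for i, s in enumerate(state):
                grad[a][i] += g * coeff * s
    # Ascent, not descent: move the parameters along the gradient.
    return [[w + lr * dw for w, dw in zip(wa, ga)] for wa, ga in zip(theta, grad)]
```

Each step's log-probability gradient is weighted by the return that followed it, so actions that preceded long balancing runs become more likely after the update.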
|
30 |
+
|
31 |
+
This model was trained for only 500 timesteps, rather than the typical 1000, which is why the cart struggles so much to balance the pole in the video: it has not trained enough.
|
32 |
+
|
33 |
|