Saraaaaaaaaa committed on
Commit
6430312
1 Parent(s): 69aff6f

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -24,4 +24,10 @@ model-index:
24
  # **Reinforce** Agent playing **CartPole-v1**
25
  This is a trained model of a **Reinforce** agent playing **CartPole-v1** .
26
  To learn to use this model and train yours check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction
27
+
28
+ **Policy-based learning** approximates the policy π directly, without having to learn a value function. The objective is then to maximize the performance of the parameterized policy using gradient ascent.
29
+ TL;DR: The cart learns to balance the pole by optimizing π for the best outcome: *the pole not falling over*.
30
+
31
+ This model was trained for only 500 timesteps, compared with the usual 1000, which is why the cart struggles so much to balance the pole in the video: it has not trained long enough.
32
+
33
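
The policy-gradient idea in the added README text (adjust the parameters of π by gradient ascent on the return) might be sketched roughly as follows. This is a toy illustration, not the code used to train this model: it assumes a softmax policy whose action logits `theta` do not depend on the state, whereas a real CartPole agent conditions on the observation. The helper names `softmax`, `discounted_returns`, and `reinforce_step` are hypothetical.

```python
import math


def softmax(logits):
    """Turn action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_step(theta, actions, rewards, lr=0.1, gamma=0.99):
    """One gradient-ascent update for a state-independent softmax policy.

    Assumption: `theta` holds one logit per action and ignores the state.
    """
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for a, g in zip(actions, discounted_returns(rewards, gamma)):
        for k in range(len(theta)):
            # For a softmax policy: d/d theta_k log pi(a) = 1[k == a] - pi(k)
            grad[k] += ((1.0 if k == a else 0.0) - probs[k]) * g
    # Ascent, not descent: move the parameters along the gradient.
    return [t + lr * dg for t, dg in zip(theta, grad)]


if __name__ == "__main__":
    theta = [0.0, 0.0]
    # One hypothetical episode: action 1 was taken once and earned reward 1.
    theta = reinforce_step(theta, actions=[1], rewards=[1.0])
    print(softmax(theta))  # the probability of action 1 is now above 0.5
```

Each update raises the probability of actions that were followed by high returns, which is why too few updates can leave the policy close to its initial behaviour.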