---
tags:
- CartPole-v1
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-Unit4-1
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: CartPole-v1
      type: CartPole-v1
    metrics:
    - type: mean_reward
      value: 95.00 +/- 14.54
      name: mean_reward
      verified: false
---
# **Reinforce** Agent playing **CartPole-v1**
This is a trained model of a **Reinforce** agent playing **CartPole-v1**.
To learn how to use this model and train your own, check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction
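The `mean_reward` value in this card's metadata (95.00 +/- 14.54) reads as the mean and standard deviation of total episode reward over a set of evaluation episodes. The sketch below shows that summary, assuming the population standard deviation (what `np.std` computes by default). `summarize_rewards` is a hypothetical helper, and the rewards are made-up illustrative values, not this model's actual evaluation data.

```python
import numpy as np

def summarize_rewards(episode_rewards):
    """Return (mean, std) of per-episode total rewards, the 'mean +/- std' form."""
    rewards = np.asarray(episode_rewards, dtype=float)
    # np.std defaults to ddof=0, i.e. the population standard deviation
    return float(np.mean(rewards)), float(np.std(rewards))

# Hypothetical evaluation rewards, for illustration only
example = [80.0, 95.0, 110.0]
mean, std = summarize_rewards(example)
print(f"{mean:.2f} +/- {std:.2f}")  # 95.00 +/- 12.25
```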
# ***Project Information***
**Policy-based learning** directly approximates the policy π without having to learn a value function. The objective is then to maximize the performance of the parameterized policy using gradient ascent.
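Written out, the objective and the gradient-ascent step take the standard REINFORCE (Monte Carlo policy gradient) form below, where the gradient is estimated from one sampled episode of length $T$. The symbols $\gamma$ (discount factor), $\alpha$ (learning rate) and $G_t$ (return from step $t$) are the usual conventions, assumed here rather than taken from this card.

```latex
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau) \right]

\nabla_\theta J(\theta) \approx \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t,
\qquad G_t = \sum_{k=t}^{T-1} \gamma^{\,k-t}\, r_{k+1}

\theta \leftarrow \theta + \alpha \, \nabla_\theta J(\theta)
```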
TL;DR: the cart learns to balance the pole by optimizing π directly for the desired outcome: *the pole not falling over*.
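In CartPole the environment gives a reward of +1 for every step the pole stays up, so "the pole not falling over" translates into a larger return $G_t$. A minimal sketch of computing those returns for one episode, iterating backwards; `discounted_returns` is a hypothetical helper name.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_{t+1} + gamma * G_{t+1} for every step of one episode.

    rewards[t] is the reward received after the action taken at step t.
    """
    returns = []
    g = 0.0
    for r in reversed(rewards):  # walk from the last step back to the first
        g = r + gamma * g
        returns.append(g)
    returns.reverse()  # restore chronological order
    return returns

# A 5-step CartPole episode earns +1 per step; undiscounted returns count remaining steps
print(discounted_returns([1.0] * 5, gamma=1.0))  # [5.0, 4.0, 3.0, 2.0, 1.0]
```

Longer episodes (pole balanced longer) produce larger returns at every step, which is what the policy gradient pushes toward.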
Unlike Q-learning, this method does not learn a value function (such as a Q-table) and then derive actions from it. Instead, each update adjusts the policy's parameters directly, so the improvement shows up immediately in the next iteration without first having to compute and approximate values for every new action.
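To make "adjusting the policy directly" concrete, here is a minimal sketch of one REINFORCE update, assuming a linear softmax policy $\pi(a \mid s) = \mathrm{softmax}(s\,\theta)$ over the state features. This is a simplification: the course agent uses a small neural network, and `reinforce_update` is a hypothetical helper. Note that there is no Q-table; the episode returns only scale the gradient of $\log \pi$.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, states, actions, returns, lr=0.01):
    """One REINFORCE step for a linear softmax policy pi(a|s) = softmax(s @ theta).

    theta has shape (state_dim, n_actions) and is updated directly (no value table).
    """
    grad = np.zeros_like(theta)
    for s, a, g in zip(states, actions, returns):
        probs = softmax(s @ theta)
        # Gradient of log softmax w.r.t. the logits is one_hot(a) - probs
        dlogits = -probs
        dlogits[a] += 1.0
        # Chain rule through logits = s @ theta, weighted by the return G_t
        grad += np.outer(s, dlogits) * g
    return theta + lr * grad  # gradient ascent on J(theta)

# Hypothetical toy data: CartPole observations have 4 values and there are 2 actions
theta = np.zeros((4, 2))
states = np.array([[1.0, 0.0, 0.0, 0.0]])
actions = [0]
returns = [1.0]
new_theta = reinforce_update(theta, states, actions, returns, lr=0.1)
print(softmax(states[0] @ new_theta))  # probability of action 0 rises above 0.5
```

Because the taken action had a positive return, its probability increases right after a single update, which is the "immediate improvement" described above.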
This specific CartPole model was trained for only 500 timesteps, while 1000 is the typical setting. That is why the cart struggles so much to balance the pole in the video: it has not trained long enough.
A model trained for 1000 timesteps balances the pole successfully. In general, the more training steps a model gets, the better its policy becomes, much like replaying a really hard level in a video game until it eventually gets easier.
However, more timesteps also mean longer training and rendering: 1000 timesteps take 10-15 minutes to finish, and the time keeps growing as the number of training timesteps increases.
Here -https...- is a video of the agent trained with 1000 timesteps, and here -https...- is one trained with 2000 *(links will be inserted soon)*.