Saraaaaaaaaa/Reinforce-Unit4-1

---
tags:
- CartPole-v1
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-Unit4-1
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: CartPole-v1
      type: CartPole-v1
    metrics:
    - type: mean_reward
      value: 95.00 +/- 14.54
      name: mean_reward
      verified: false
---

# Reinforce Agent playing CartPole-v1
This is a trained model of a Reinforce agent playing CartPole-v1.
To learn how to use this model and train your own, check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction


# *Project Information*


Policy-based methods directly approximate the policy π without having to learn a value function. The objective is then to maximize the performance of the parameterized policy using gradient ascent on its parameters.
TL;DR: the cart learns to balance the pole by optimizing π directly for the best outcome, which is the pole not falling over.
Unlike Q-learning, which first learns a value function (a Q-table or Q-network) and then derives its actions from those estimates, this method skips the value function and adjusts the policy itself. Each update therefore directly changes which actions the agent is likely to take in the next iteration, with no table of action values to compute and approximate first.
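To make "gradient ascent on the policy" concrete, here is a minimal sketch of the REINFORCE update θ ← θ + α · G · ∇θ log π(a). It is not the code used to train this model: CartPole and the neural-network policy are swapped for a hypothetical two-action, one-step task (action 1 pays reward 1, action 0 pays 0) with a softmax policy, so it runs with the Python standard library alone. The names `reinforce_bandit`, `n_episodes` and `lr` are illustrative only.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(n_episodes=500, lr=0.1, seed=0):
    """Minimal REINFORCE on a hypothetical 2-action, 1-step task.

    Action 1 yields reward 1 and action 0 yields reward 0, so the return G
    of each one-step episode is just its reward.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # policy parameters: one logit per action
    for _ in range(n_episodes):
        probs = softmax(theta)
        # Sample an action from the current stochastic policy pi_theta.
        action = 0 if rng.random() < probs[0] else 1
        ret = 1.0 if action == 1 else 0.0
        for i in range(len(theta)):
            # d/d(logit_i) of log pi(action) is onehot(action)[i] - probs[i].
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            # Gradient ASCENT: step in the direction that increases expected return.
            theta[i] += lr * ret * grad_log_pi
    return softmax(theta)


probs = reinforce_bandit()
print(f"P(action 0) = {probs[0]:.3f}, P(action 1) = {probs[1]:.3f}")
```

Because each episode here is a single step, the return equals the reward. In CartPole an episode has many steps, and in the usual REINFORCE variant each step's log-probability gradient is weighted by the return from that step onward.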
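Without a value function, REINFORCE still needs a signal for how good an action was. It uses the return actually observed after that step in a finished episode, G_t = r_t + γ · G_{t+1}. Below is a minimal sketch of that computation, assuming the episode's rewards are available as a plain list. `discounted_returns` and its `gamma` default are illustrative choices, not taken from this model's training code.

```python
def discounted_returns(rewards, gamma=1.0):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode.

    REINFORCE weights each log-probability gradient by this sampled return
    instead of querying a learned value function.
    """
    returns = []
    g = 0.0
    # Walk backwards so each G_t reuses the already-computed G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


# CartPole-v1 gives a +1 reward for every step taken.
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Working backwards makes this a single pass over the episode, rather than re-summing the remaining rewards for every step.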


This specific CartPole model was trained for only 500 timesteps, half of the usual 1000. That is why its mean reward is only 95.00 +/- 14.54 and why the cart struggles to balance the pole in the video: it has not been trained long enough.
A model trained for 1000 timesteps balances the pole successfully. In general, the more training steps a model gets, the better it performs, much like replaying a really hard video game level over and over until it eventually gets easier.
However, more timesteps also mean longer training and rendering: 1000 timesteps take 10-15 minutes to run, and the time keeps growing as the number of training timesteps increases.

Here (https...) is a video of the model working with 1000 timesteps, and here (https...) is one with 2000 (links will be inserted soon).