---
library_name: stable-baselines3
tags:
- PandaReachDense-v3
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: A2C
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: PandaReachDense-v3
      type: PandaReachDense-v3
    metrics:
    - type: mean_reward
      value: -0.19 +/- 0.10
      name: mean_reward
      verified: false
---
|
|
|
# **A2C** Agent playing **PandaReachDense-v3**

This is a trained model of an **A2C** agent playing **PandaReachDense-v3**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
|
|
|
## Usage (with Stable-baselines3)

This project, from Unit 6 of the Deep Reinforcement Learning course, uses an Advantage Actor-Critic (A2C) agent to train a robotic arm
to move to the right position. The aim is a result of >= -3.5, where result = mean_reward - std of reward. My result was a
mean reward of -0.14 +/- 0.09, i.e. result = -0.14 - 0.09 = -0.23, well above the threshold. I used the A2C Robotic Arm code as a baseline
and made two main changes to improve the learning results: hyperparameters and the implementation of a callback function.
|
|
|
1) Hyperparameters:

Using the A2C documentation (https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html), I determined the hyperparameters
I wanted to tune: gamma, gae_lambda, learning_rate and verbose.
I then used the source code at https://stable-baselines3.readthedocs.io/en/master/_modules/stable_baselines3/a2c/a2c.html#A2C to find
the default values of each of these hyperparameters. I kept learning_rate and gamma at their defaults. Changing verbose only prints
more information during training (it does not change the training process itself). As for gae_lambda, setting it slightly below 1
trades a small amount of bias for lower-variance advantage estimates, which allowed faster learning (to pick a specific value
between 0.9 and 0.99, I worked with AI to find the best parameter). A sketch of this configuration is shown below.
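
As a concrete illustration, here is a minimal sketch of how the model could be configured with these hyperparameters. The `gae_lambda` value of 0.95 and `n_envs=4` are illustrative assumptions (the tuned gae_lambda was only stated to lie between 0.9 and 0.99); `learning_rate` and `gamma` are the SB3 defaults, kept unchanged.

```python
import panda_gym  # noqa: F401 -- registers the PandaReachDense-v3 environment
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env

# Vectorized environment for the dense-reward reaching task
env = make_vec_env("PandaReachDense-v3", n_envs=4)  # n_envs is illustrative

model = A2C(
    policy="MultiInputPolicy",  # Panda observations are dict-based
    env=env,
    learning_rate=7e-4,         # SB3 default, kept unchanged
    gamma=0.99,                 # SB3 default, kept unchanged
    gae_lambda=0.95,            # illustrative; tuned somewhere between 0.9 and 0.99
    verbose=1,                  # print extra information during training
)
```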
|
|
|
2) Callback Function:

A callback function can greatly improve results, especially when training for a longer period of time. I used a callback that checked
the mean reward every 1000 steps. Starting from the code provided in the callbacks guide
(https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html), I wrote a basic version and worked with AI to adapt it to the
intended purpose. With this more refined code, I was able to monitor training progress throughout and ensure that the model was
improving with each iteration. This let me adjust the code and parameters, if needed, before training ended, to fine-tune a better
model overall. A sketch of this kind of callback follows below.
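
The original callback is not reproduced here; the following is a minimal sketch in the spirit of the description above, built on SB3's `BaseCallback` API. The class name, the `check_freq` default, and the `total_timesteps` value are illustrative, and `model` refers to the agent constructed in the previous sketch.

```python
import numpy as np

from stable_baselines3.common.callbacks import BaseCallback


class MeanRewardCallback(BaseCallback):
    """Print the rolling mean episode reward every `check_freq` steps."""

    def __init__(self, check_freq: int = 1000, verbose: int = 1):
        super().__init__(verbose)
        self.check_freq = check_freq

    def _on_step(self) -> bool:
        if self.n_calls % self.check_freq == 0 and len(self.model.ep_info_buffer) > 0:
            # ep_info_buffer holds recent episode stats; "r" is the episode reward
            mean_reward = np.mean([ep_info["r"] for ep_info in self.model.ep_info_buffer])
            if self.verbose:
                print(f"Step {self.num_timesteps}: mean episode reward = {mean_reward:.2f}")
        return True  # returning False would stop training early


# Illustrative usage: monitor the mean reward every 1000 steps while training
model.learn(total_timesteps=100_000, callback=MeanRewardCallback(check_freq=1000))
```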
|
|
|
|
|
To load the trained agent from the Hugging Face Hub (the `repo_id` and `filename` below are placeholders; replace them with this model's actual repository and checkpoint name):

```python
from stable_baselines3 import A2C
from huggingface_sb3 import load_from_hub

# Placeholder repo_id/filename -- point these at the model's Hub repository
checkpoint = load_from_hub(repo_id="<user>/a2c-PandaReachDense-v3", filename="a2c-PandaReachDense-v3.zip")
model = A2C.load(checkpoint)
```
|
|