---
library_name: stable-baselines3
tags:
- PandaReachDense-v3
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: A2C
results:
- task:
type: reinforcement-learning
name: reinforcement-learning
dataset:
name: PandaReachDense-v3
type: PandaReachDense-v3
metrics:
- type: mean_reward
value: -0.19 +/- 0.10
name: mean_reward
verified: false
---
# **A2C** Agent playing **PandaReachDense-v3**
This is a trained model of an **A2C** agent playing **PandaReachDense-v3**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
## Usage (with Stable-baselines3)
This project, from Unit 6 of the Deep Reinforcement Learning course, trains an Advantage Actor-Critic (A2C) agent to move a robotic arm to the target position.
The goal is to reach a result of >= -3.5, where the result = mean_reward - std of reward. My result was a mean reward of -0.14 +/- 0.09 (i.e. -0.23 to -0.05).
I used the A2C Robotic Arm code as a baseline and made two main changes to improve the learning results: tuning the hyperparameters and implementing a callback function.
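For context, here is a minimal sketch of how such a score can be computed with Stable-Baselines3's `evaluate_policy`; the environment import and the saved-model name are illustrative assumptions, not necessarily the exact ones I used:

```python
import gymnasium as gym
import panda_gym  # registers the PandaReachDense-v3 environment (assumed dependency)
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy

# "a2c-PandaReachDense-v3" is a hypothetical saved-model name
env = gym.make("PandaReachDense-v3")
model = A2C.load("a2c-PandaReachDense-v3", env=env)

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, deterministic=True)
# The score used above is the mean reward minus its standard deviation
print(f"result = {mean_reward - std_reward:.2f} ({mean_reward:.2f} +/- {std_reward:.2f})")
```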
1) Hyperparameters:
Using https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html, I determined which hyperparameters I wanted to tune: gamma, gae_lambda, learning_rate and verbose.
I then used the sample code at https://stable-baselines3.readthedocs.io/en/master/_modules/stable_baselines3/a2c/a2c.html#A2C to find the default values of each of these
hyperparameters. I kept learning_rate and gamma the same. Changing verbose only prints more information during training and does not change the training process itself.
As for gae_lambda, setting it slightly below 1 trades a small amount of bias for lower variance, which sped up learning (to pick a specific value between 0.9 and 0.99,
I worked with AI to find the best parameter). A sketch of the resulting setup is shown after this list.
2) Callback Function:
A callback function can greatly improve results, especially when training for a longer period of time. I used a callback that checked the mean reward every 1000 steps.
Starting from the code provided in the callbacks guide (https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html), I wrote a basic version and worked with
AI to adapt it for the intended purpose. With the refined callback I could monitor training progress throughout and verify that the model was improving over time, which
let me adjust the code and parameters before training ended and fine-tune a better model overall. A sketch of such a callback is shown below.
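As a rough sketch of the setup described in 1), the snippet below builds an A2C model with the default learning_rate and gamma, a slightly lowered gae_lambda, and verbose enabled; the exact gae_lambda value, number of environments, and timestep count are illustrative assumptions:

```python
import panda_gym  # registers PandaReachDense-v3 (assumed dependency)
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env

# Vectorized environments speed up A2C rollout collection
env = make_vec_env("PandaReachDense-v3", n_envs=4)

model = A2C(
    policy="MultiInputPolicy",  # the Panda environments use dict observations
    env=env,
    learning_rate=7e-4,         # library default, kept unchanged
    gamma=0.99,                 # library default, kept unchanged
    gae_lambda=0.95,            # slightly below 1.0 (illustrative value)
    verbose=1,                  # print extra information during training
)

model.learn(total_timesteps=1_000_000)
model.save("a2c-PandaReachDense-v3")
```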
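And here is a minimal sketch of a callback along the lines of 2), following the BaseCallback pattern from the SB3 callbacks guide; the class name and logging details are assumptions for illustration:

```python
import numpy as np
from stable_baselines3.common.callbacks import BaseCallback

class MeanRewardCallback(BaseCallback):
    """Every `check_freq` steps, print the mean reward of recently finished episodes."""

    def __init__(self, check_freq: int = 1000, verbose: int = 1):
        super().__init__(verbose)
        self.check_freq = check_freq

    def _on_step(self) -> bool:
        if self.n_calls % self.check_freq == 0:
            # ep_info_buffer holds stats ("r" = episode reward) for the most recent episodes
            rewards = [ep_info["r"] for ep_info in self.model.ep_info_buffer]
            if rewards:
                print(f"step {self.num_timesteps}: mean reward = {np.mean(rewards):.2f}")
        return True  # returning False would stop training early

# Hypothetical usage:
# model.learn(total_timesteps=1_000_000, callback=MeanRewardCallback(check_freq=1000))
```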
```python
from stable_baselines3 import A2C
from huggingface_sb3 import load_from_hub

# Download the trained checkpoint from the Hugging Face Hub
# (the repo_id and filename below are placeholders for this model's repo)
checkpoint = load_from_hub(repo_id="nandinitatiwala/a2c-PandaReachDense-v3", filename="a2c-PandaReachDense-v3.zip")
model = A2C.load(checkpoint)
```