---
library_name: stable-baselines3
tags:
- PandaReachDense-v3
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: A2C
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: PandaReachDense-v3
      type: PandaReachDense-v3
    metrics:
    - type: mean_reward
      value: -0.19 +/- 0.10
      name: mean_reward
      verified: false
---
|
|
|
# **A2C** Agent playing **PandaReachDense-v3**

This is a trained model of an **A2C** agent playing **PandaReachDense-v3**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
|
|
|
## Usage (with Stable-baselines3)

This project, from Unit 6 of the Deep Reinforcement Learning course, uses an Advantage Actor-Critic (A2C) agent to train a robotic arm
to move to the right position. The aim is a result of >= -3.5, where result = mean_reward - std of reward. My result was a
mean reward of -0.14 +/- 0.09, i.e. result = -0.14 - 0.09 = -0.23, well above the threshold. I used the A2C Robotic Arm code as a baseline
and made two main changes to improve the learning results: hyperparameters and the implementation of a callback function.
|
|
|
1) Hyperparameters:

Using the A2C documentation (https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html), I determined the hyperparameters
I wanted to tune: gamma, gae_lambda, learning_rate and verbose.
I then used the source code at https://stable-baselines3.readthedocs.io/en/master/_modules/stable_baselines3/a2c/a2c.html#A2C to find
the default values of each of these hyperparameters. I kept learning_rate and gamma at their defaults. Changing verbose only prints
more information during training (it does not change the training process itself). As for gae_lambda, setting it slightly below 1
trades a small amount of bias for lower-variance advantage estimates, which allowed faster learning (to pick a specific value
between 0.9 and 0.99, I worked with AI to find the best parameter). A sketch of this configuration is shown below.
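
As a concrete illustration, here is a minimal sketch of how the model could be configured with these hyperparameters. The `gae_lambda` value of 0.95 and `n_envs=4` are illustrative assumptions (the tuned gae_lambda was only stated to lie between 0.9 and 0.99); `learning_rate` and `gamma` are the SB3 defaults, kept unchanged.

```python
import panda_gym  # noqa: F401 -- registers the PandaReachDense-v3 environment
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env

# Vectorized environment for the dense-reward reaching task
env = make_vec_env("PandaReachDense-v3", n_envs=4)  # n_envs is illustrative

model = A2C(
    policy="MultiInputPolicy",  # Panda observations are dict-based
    env=env,
    learning_rate=7e-4,         # SB3 default, kept unchanged
    gamma=0.99,                 # SB3 default, kept unchanged
    gae_lambda=0.95,            # illustrative; tuned somewhere between 0.9 and 0.99
    verbose=1,                  # print extra information during training
)
```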
|
|
|
2) Callback Function:

A callback function can greatly improve results, especially when training for a longer period of time. I used a callback that checked
the mean reward every 1000 steps. Starting from the code provided in the callbacks guide
(https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html), I wrote a basic version and worked with AI to adapt it to the
intended purpose. With this more refined code, I was able to monitor training progress throughout and ensure that the model was
improving with each iteration. This let me adjust the code and parameters, if needed, before training ended, to fine-tune a better
model overall. A sketch of this kind of callback follows below.
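
The original callback is not reproduced here; the following is a minimal sketch in the spirit of the description above, built on SB3's `BaseCallback` API. The class name, the `check_freq` default, and the `total_timesteps` value are illustrative, and `model` refers to the agent constructed in the previous sketch.

```python
import numpy as np

from stable_baselines3.common.callbacks import BaseCallback


class MeanRewardCallback(BaseCallback):
    """Print the rolling mean episode reward every `check_freq` steps."""

    def __init__(self, check_freq: int = 1000, verbose: int = 1):
        super().__init__(verbose)
        self.check_freq = check_freq

    def _on_step(self) -> bool:
        if self.n_calls % self.check_freq == 0 and len(self.model.ep_info_buffer) > 0:
            # ep_info_buffer holds recent episode stats; "r" is the episode reward
            mean_reward = np.mean([ep_info["r"] for ep_info in self.model.ep_info_buffer])
            if self.verbose:
                print(f"Step {self.num_timesteps}: mean episode reward = {mean_reward:.2f}")
        return True  # returning False would stop training early


# Illustrative usage: monitor the mean reward every 1000 steps while training
model.learn(total_timesteps=100_000, callback=MeanRewardCallback(check_freq=1000))
```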
|
|
|
|
|
To load the trained agent from the Hugging Face Hub (the `repo_id` and `filename` below are placeholders; replace them with this model's actual repository and checkpoint name):

```python
from stable_baselines3 import A2C
from huggingface_sb3 import load_from_hub

# Placeholder repo_id/filename -- point these at the model's Hub repository
checkpoint = load_from_hub(repo_id="<user>/a2c-PandaReachDense-v3", filename="a2c-PandaReachDense-v3.zip")
model = A2C.load(checkpoint)
```
|
|