A2C Agent playing PandaReachDense-v3

This is a trained model of an A2C agent playing PandaReachDense-v3 using the stable-baselines3 library.

Usage (with Stable-baselines3)

Working from Unit 6 of the Deep Reinforcement Learning course, this project uses an Advantage Actor-Critic (A2C) agent to train a robotic arm to move to the right position. The aim is a result of >= -3.5, where result = mean_reward - std of reward. My result was mean reward = -0.14 +/- 0.09 (i.e. -0.23 to -0.05), which gives a result of -0.23 and clears the threshold. I used the A2C Robotic Arm code as a baseline and changed 2 main things to improve the learning results: the hyperparameters and the addition of a callback function.
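
The result calculation above can be checked with a few lines of Python. This is only a sketch of the arithmetic: `compute_result` and `THRESHOLD` are hypothetical names chosen for illustration, and the numbers are the ones reported above.

```python
def compute_result(mean_reward: float, std_reward: float) -> float:
    """Return mean_reward - std_reward, the result defined above."""
    return mean_reward - std_reward


THRESHOLD = -3.5  # minimum result to reach, as stated above

# Values reported for this model: mean reward -0.14 +/- 0.09
result = compute_result(mean_reward=-0.14, std_reward=0.09)
passed = result >= THRESHOLD

print(f"result = {result:.2f}, passes threshold: {passed}")
```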

Hyperparameters: Using the A2C documentation (https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html), I determined the hyperparameters that I wanted to tune: gamma, gae_lambda, learning_rate and verbose. I then used the A2C source code (https://stable-baselines3.readthedocs.io/en/master/_modules/stable_baselines3/a2c/a2c.html#A2C) to find the values of each of these hyperparameters. I kept learning_rate and gamma the same. Changing verbose only prints more information during training; it does not change the training process itself. As for gae_lambda, setting it slightly lower than 1 reduced the variance of the advantage estimates (at the cost of a small amount of bias) and led to faster learning. To pick a specific number from 0.9 to 0.99, I worked with AI to find the best parameter.
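
To illustrate what gae_lambda controls, here is a minimal, library-independent sketch of Generalized Advantage Estimation over a single rollout. It is not the stable-baselines3 implementation: `compute_gae` is a hypothetical helper, the rollout is assumed to contain no episode boundaries, and the rewards, value estimates and gae_lambda values are made-up illustrative numbers, not the ones used to train this model.

```python
def compute_gae(rewards, values, last_value, gamma, gae_lambda):
    """Compute GAE advantages for one rollout with no episode ends.

    rewards[t] and values[t] are the reward and value estimate at step t;
    last_value is the value estimate of the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# Illustrative rollout (hypothetical numbers).
rewards = [-0.3, -0.2, -0.1]
values = [-0.5, -0.4, -0.2]

# gae_lambda = 1.0 sums every future TD error; a lower value discounts
# later TD errors more strongly, which reduces variance but adds bias.
print(compute_gae(rewards, values, last_value=0.0, gamma=0.99, gae_lambda=1.0))
print(compute_gae(rewards, values, last_value=0.0, gamma=0.99, gae_lambda=0.9))
```

With gae_lambda = 0 the estimate collapses to the one-step TD error at each step, and with gae_lambda = 1 it uses the full discounted sum of TD errors; values just below 1 sit between the two.
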
Callback Function: A callback function can greatly improve results, especially when training for a longer period of time. I used a callback function that checked the mean reward every 1000 steps. Starting from the code in the stable-baselines3 callbacks guide (https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html), I wrote a basic version and worked with AI to modify it for the intended purpose. With the refined version, I was able to monitor the training progress throughout and check that the model was improving as training went on. This allowed me to modify the code and parameters if needed before the training ended, and so fine-tune a better model overall.
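
The monitoring logic described above can be sketched without any library dependencies. This is not the callback used for this model: `MeanRewardMonitor`, its `window` parameter and the toy loop are hypothetical and only show the idea of checking the mean episode reward every fixed number of steps. In stable-baselines3, the equivalent logic would typically live in the `_on_step` method of a `BaseCallback` subclass, which likewise returns a boolean to say whether training should continue.

```python
from collections import deque


class MeanRewardMonitor:
    """Track episode rewards and report their mean every `check_freq` steps."""

    def __init__(self, check_freq=1000, window=100):
        self.check_freq = check_freq
        self.episode_rewards = deque(maxlen=window)  # most recent finished episodes
        self.current_reward = 0.0
        self.num_steps = 0
        self.history = []  # (step, mean episode reward) recorded at each check

    def on_step(self, reward, done):
        """Record one environment step; return True to keep training."""
        self.num_steps += 1
        self.current_reward += reward
        if done:
            self.episode_rewards.append(self.current_reward)
            self.current_reward = 0.0
        # Report only at the check interval and once an episode has finished.
        if self.num_steps % self.check_freq == 0 and self.episode_rewards:
            mean_reward = sum(self.episode_rewards) / len(self.episode_rewards)
            self.history.append((self.num_steps, mean_reward))
            print(f"step {self.num_steps}: mean episode reward = {mean_reward:.2f}")
        return True


# Toy run: episodes of 2 steps with a reward of -0.5 per step, checked every 4 steps.
monitor = MeanRewardMonitor(check_freq=4)
for step in range(8):
    monitor.on_step(reward=-0.5, done=(step % 2 == 1))
```

Keeping the recorded means in `history` makes it possible to compare checkpoints after the run and decide whether the parameters need adjusting.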

from stable_baselines3 import ...
from huggingface_sb3 import load_from_hub

...

nandinitatiwala/a2c-PandaReachDense-v3
