---

# **A2C** Agent playing **PandaReachDense-v3**

## General information about the project:
This is a trained model of an **A2C** agent playing **PandaReachDense-v3**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3). It controls a robotic arm, learning to move its gripper to a target position (the task is reaching, not grasping).
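
If you want to try the agent yourself, here is a minimal sketch of running a saved A2C policy in this environment; the checkpoint file name is a placeholder, and it assumes the `panda-gym` package (which registers the Panda environments) is installed:
```
import gymnasium as gym
import panda_gym  # registers PandaReachDense-v3
from stable_baselines3 import A2C

model = A2C.load("a2c-PandaReachDense-v3")  # placeholder name for the downloaded checkpoint

env = gym.make("PandaReachDense-v3")
obs, info = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```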

### What I did:
Manually tuned the hyperparameters by passing `learning_rate=0.0007`, `n_steps=5`, `gamma=0.99`, and `gae_lambda=0.95` to the A2C constructor:
```
model = A2C(policy="MultiInputPolicy",  # the Panda envs use dict observations, so MultiInputPolicy is required
            env=env,
            learning_rate=0.0007,
            n_steps=5,
            gamma=0.99,
            gae_lambda=0.95,
            verbose=1)
```
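
For context, here is a minimal end-to-end sketch of training with these hyperparameters; the number of parallel environments, the timestep budget, and the file name are illustrative choices of mine, not necessarily what was used for this checkpoint:
```
import panda_gym  # registers the Panda environments
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env

env = make_vec_env("PandaReachDense-v3", n_envs=4)  # 4 parallel envs is an illustrative choice
model = A2C(policy="MultiInputPolicy", env=env,
            learning_rate=0.0007, n_steps=5, gamma=0.99, gae_lambda=0.95,
            verbose=1)
model.learn(total_timesteps=1_000_000)  # illustrative budget
model.save("a2c-PandaReachDense-v3")    # illustrative file name
```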

## Links to relevant resources such as tutorials.
Reinforcement Learning Tips and Tricks: https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html

A GitHub training framework: https://github.com/DLR-RM/rl-baselines3-zoo

Poe (MrProgrammer Bot):
I tried to follow its suggestions, but I had a hard time understanding them; its code is below.
```
import gym
from stable_baselines3 import A2C
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize  # note: DummyVecEnv lives in vec_env, not common.envs
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.env_checker import check_env  # handy for validating custom environments
```

### Next, load and prepare your environment:
```
env = gym.make('your_environment_name')  # Replace with the name of your environment
env = DummyVecEnv([lambda: env])         # SB3 expects a vectorized environment
env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.)  # normalize observations
```
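
One caveat worth adding here (my note, not the bot's): `VecNormalize` keeps running observation statistics, and those statistics must be saved alongside the model and reloaded for evaluation, otherwise the policy sees differently scaled inputs:
```
env.save("vec_normalize.pkl")  # persist the running mean/std alongside the model

# Later, wrap the evaluation env with the saved statistics and freeze them
eval_env = DummyVecEnv([lambda: gym.make('your_environment_name')])
eval_env = VecNormalize.load("vec_normalize.pkl", eval_env)
eval_env.training = False     # do not update statistics during evaluation
eval_env.norm_reward = False  # report raw, unnormalized rewards
```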

### Now, define a function to train and evaluate your A2C agent:
```
def train_and_evaluate(hyperparameters):
    # For dict-observation envs such as PandaReachDense-v3, use "MultiInputPolicy" instead
    model = A2C("MlpPolicy", env, verbose=0, **hyperparameters)

    eval_env = gym.make('your_evaluation_environment_name')  # Replace with the name of your evaluation environment
    eval_env = DummyVecEnv([lambda: eval_env])
    eval_env = VecNormalize(eval_env, norm_obs=True, norm_reward=False, clip_obs=10.)

    eval_callback = EvalCallback(eval_env, best_model_save_path='./logs/',
                                 log_path='./logs/', eval_freq=10000,
                                 deterministic=True, render=False)

    model.learn(total_timesteps=int(1e5), callback=eval_callback)

    mean_reward, _ = evaluate_policy(model, eval_env, n_eval_episodes=10)
    return mean_reward
```
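
Before launching the full grid, it can help to sanity-check the function on a single setting (my example, using the same keys as the grid below):
```
reward = train_and_evaluate({'gamma': 0.99, 'learning_rate': 0.0007, 'ent_coef': 0.01})
print("Mean reward for this setting:", reward)
```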

### Now, we can define the hyperparameter grid and start the grid search:
```
import itertools

hyperparameters_grid = {
    'gamma': [0.99, 0.95],
    'learning_rate': [0.001, 0.0001],
    'ent_coef': [0.01, 0.1],
    # Add other hyperparameters of interest
}

best_reward = float('-inf')
best_hyperparameters = {}

# Iterate over every combination of values; looping over the dict directly
# would yield only its keys, not full hyperparameter settings.
keys = list(hyperparameters_grid)
for values in itertools.product(*hyperparameters_grid.values()):
    hyperparameters = dict(zip(keys, values))
    mean_reward = train_and_evaluate(hyperparameters)

    if mean_reward > best_reward:
        best_reward = mean_reward
        best_hyperparameters = hyperparameters

print("Best hyperparameters:", best_hyperparameters)
```
In this grid search, we specify a range of values for each hyperparameter of interest. The `train_and_evaluate` function trains the A2C agent with each combination of hyperparameters and evaluates its performance; whenever a combination achieves a higher mean reward, it becomes the new best.
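
One follow-up worth knowing (my note): the `EvalCallback` used above already saves the best checkpoint it sees as `best_model.zip` under `./logs/`, so after the search you can reload that model instead of retraining:
```
eval_env = DummyVecEnv([lambda: gym.make('your_evaluation_environment_name')])
eval_env = VecNormalize(eval_env, norm_obs=True, norm_reward=False, clip_obs=10.)

best_model = A2C.load("./logs/best_model")  # EvalCallback saves the best checkpoint under this name
mean_reward, std_reward = evaluate_policy(best_model, eval_env, n_eval_episodes=10)
print(f"Best saved model: {mean_reward:.2f} +/- {std_reward:.2f}")
```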