File size: 5,207 Bytes
e8df7ab 026cee9 e8df7ab b00d034 e8df7ab b00d034 e8df7ab 4531624 e8df7ab d6e7bb2 e8df7ab d6e7bb2 b00d034 d6e7bb2 e8df7ab d6e7bb2 b00d034 4531624 b00d034 4531624 b00d034 d6e7bb2 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 |
---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
results:
- task:
type: reinforcement-learning
name: reinforcement-learning
dataset:
name: LunarLander-v2
type: LunarLander-v2
metrics:
- type: mean_reward
value: 298.48 +/- 12.14
name: mean_reward
verified: false
---
# Train your first Deep Reinforcement Learning Agent 🤖
![Cover](https://github.com/huggingface/deep-rl-class/blob/main/unit1/assets/img/thumbnail.png?raw=true)
This is a trained model of a **PPO** agent playing **LunarLander-v2**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
## 1. Install package
```python
from IPython.display import clear_output
!apt install swig cmake
!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt
!sudo apt-get update
!apt install python-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay
clear_output()
```
Restart notebook
```
import os
os.kill(os.getpid(), 9)
```
## 2. Use model
```python
from huggingface_sb3 import load_from_hub
repo_id = "thien1892/LunarLander-v2-ppo-5m"
filename = "ppo-LunarLander-v2-5m.zip" # The model filename.zip
# When the model was trained on Python 3.8 the pickle protocol is 5
# But Python 3.6, 3.7 use protocol 4
# In order to get compatibility we need to:
# 1. Install pickle5 (we done it at the beginning of the colab)
# 2. Create a custom empty object we pass as parameter to PPO.load()
custom_objects = {
"learning_rate": 0.0,
"lr_schedule": lambda _: 0.0,
"clip_range": lambda _: 0.0,
}
checkpoint = load_from_hub(repo_id, filename)
model = PPO.load(checkpoint, custom_objects=custom_objects, print_system_info=True)
```
Evaluate
```python
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
```
## 3. Re-train model (Optional 1)
```python
# Load a saved LunarLander model from the Hub and retrain
import gym
from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import DummyVecEnv
repo_id = "thien1892/LunarLander-v2-ppo-v5"
filename = "ppo-LunarLander-v2.zip" # The model filename.zip
checkpoint = load_from_hub(repo_id, filename)
myenv = make_vec_env('LunarLander-v2', n_envs=16)
custom_objects = {
"learning_rate": 1e-5,
"clip_range": lambda _: 0.15,
}
model = PPO.load(checkpoint, reset_num_timesteps=True, print_system_info=True,custom_objects = custom_objects, env = myenv)
# Train it for 1,000,000 timesteps
model.learn(total_timesteps=1000000)
# Save the model
model_name = "ppo-LunarLander-v2-5m"
model.save(model_name)
# Evaluate
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
```
Pust to HF hub
```python
notebook_login()
!git config --global credential.helper store
```
```
## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2
repo_id = "thien1892/LunarLander-v2-ppo-5m"
# TODO: Define the name of the environment
env_id = "LunarLander-v2"
# Create the evaluation env
eval_env = DummyVecEnv([lambda: gym.make(env_id)])
# TODO: Define the model architecture we used
model_architecture = "PPO"
## TODO: Define the commit message
commit_message = "Upload PPO LunarLander-v2 trained agent"
# method save, evaluate, generate a model card and record a replay video of your agent before pushing the repo to the hub
package_to_hub(model=model, # Our trained model
model_name=model_name, # The name of our trained model
model_architecture=model_architecture, # The model architecture we used: in our case PPO
env_id=env_id, # Name of the environment
eval_env=eval_env, # Evaluation Environment
repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2
commit_message=commit_message)
```
## 4. Re-train model (Optional 2)
- Change `--repo_id` become your repo id :)
- `--id_retrain` and `--filename_retrain` in order to load my trained model, you can change to your trained model
```python
!python train_and_push.py --repo_id "thien1892/LunarLander-v2-ppo-v3" \
--commit_message "retrain model from hub 5m" \
--id_retrain "thien1892/LunarLander-v2-ppo-v5" \
--filename_retrain "ppo-LunarLander-v2.zip" \
--total_timesteps 2000000 \
--learning_rate 3e-5 \
--n_envs 64
``` |