---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: 261.85 +/- 46.42
      name: mean_reward
      verified: false
---
## My First RL Project
# **PPO** Agent playing **LunarLander-v2**
This is a trained model of a **PPO** agent playing **LunarLander-v2**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
## Usage
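The code below uses a Gymnasium environment and the stable-baselines3 library. It is written for a Colab notebook, so lines starting with `!` are shell commands.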
```python
# Install system and Python dependencies (Colab shell commands)
!apt install swig cmake
!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt
!sudo apt-get update
!apt install python3-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay
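# Restart the Colab runtime so the newly installed packages are loaded.
# Run this in its own cell: it kills the current process, then continue below.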
import os
os.kill(os.getpid(), 9)
# Start a virtual display so the environment can render on a headless machine
from pyvirtualdisplay import Display
virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()
# import libraries
import gymnasium as gym
from huggingface_sb3 import load_from_hub, package_to_hub
from huggingface_hub import notebook_login
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor
# Create environment
env = gym.make('LunarLander-v2')
# Define PPO
model = PPO(
    policy="MlpPolicy",
    env=env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    verbose=1,
)
# Train the agent
model.learn(total_timesteps=1000000)
# Save the model
model_name = "ppo-LunarLander-v2"
model.save(model_name)
# Evaluate the model on a separate, monitored environment
eval_env = Monitor(gym.make("LunarLander-v2"))
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
# Create a video (for Colab)
import gymnasium as gym
from stable_baselines3 import PPO
from IPython.display import Video, display
import os
# RecordVideo needs frames, so the environment must render to rgb_array
env = gym.make('LunarLander-v2', render_mode="rgb_array")
model_name = "ppo-LunarLander-v2"
model = PPO.load(model_name)
def record_video(env, model, video_length=500, prefix="ppo-lunarlander"):
    # Record only the first episode into the `prefix` folder
    env = gym.wrappers.RecordVideo(env, video_folder=prefix, episode_trigger=lambda x: x == 0)
    obs, _ = env.reset()  # Gymnasium reset returns (obs, info)
    for _ in range(video_length):
        action, _ = model.predict(obs)
        # Gymnasium step returns (obs, reward, terminated, truncated, info)
        obs, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()
    env.close()
record_video(env, model, video_length=500, prefix="ppo-lunarlander")
video_path = "ppo-lunarlander/rl-video-episode-0.mp4"
display(Video(video_path))
...
```
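The `mean_reward` figure in the metadata and in the `evaluate_policy` printout has the form `mean +/- std`, computed over the total return of each evaluation episode. The sketch below reproduces that summary with the standard library only. The episode returns are made-up values for illustration, not results from this model, and the sketch assumes a population standard deviation (the default of NumPy's `np.std`).

```python
from statistics import fmean, pstdev

# Hypothetical per-episode returns, for illustration only
# (these are NOT the returns of the model in this repository).
episode_returns = [250.0, 270.0, 290.0, 230.0, 260.0]

# Mean return over the evaluation episodes
mean_reward = fmean(episode_returns)

# Population standard deviation (divides by N, not N - 1);
# this is an assumption about how the spread is reported.
std_reward = pstdev(episode_returns)

summary = f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}"
print(summary)
```

With only 10 evaluation episodes, as in the code above, the standard deviation can change noticeably between runs, so a larger `n_eval_episodes` gives a steadier estimate.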