ppo-LunarLander-v2 / README.md
prashanthgowni's picture
Update README.md (#1)
b38b25d
|
raw
history blame
1.87 kB
---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
results:
- task:
type: reinforcement-learning
name: reinforcement-learning
dataset:
name: LunarLander-v2
type: LunarLander-v2
metrics:
- type: mean_reward
value: 277.82 +/- 22.28
name: mean_reward
verified: false
language:
- en
---
# **PPO** Agent playing **LunarLander-v2**
This is a trained model of a **PPO** agent playing **LunarLander-v2**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
## Usage (with Stable-baselines3)
```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from huggingface_sb3 import load_from_hub
# Download the model checkpoint
model_checkpoint = load_from_hub("prashanthgowni/ppo-LunarLander-v2", "ppo-LunarLander-v2")
# Create a vectorized environment
env = make_vec_env("LunarLander-v2", n_envs=1)
# Load the model
model = PPO.load(model_checkpoint, env=env)
# Evaluate
print("Evaluating model")
mean_reward, std_reward = evaluate_policy(
model,
env,
n_eval_episodes=30,
deterministic=True,
)
print(f"Mean reward = {mean_reward:.2f} +/- {std_reward}")
# Start a new episode
obs = env.reset()
try:
while True:
action, state = model.predict(obs, deterministic=True)
obs, reward, done, info = env.step(action)
env.render()
except KeyboardInterrupt:
pass
```
# Conclusion
The above steps ensure that the traind Agent is downloaded.
You may need to download and install required libraries and packages specific to your operating system to resume training from the providied checkpoint and fine tune the Agent further.