🚀 PPO Agent playing LunarLander-v3

This GitHub repository contains a trained Proximal Policy Optimization (PPO) agent for the Box2D task LunarLander-v3 from Gymnasium.
The model was implemented and trained with the Stable-Baselines3 library.

📊 Performance

Environment: LunarLander-v3
Algorithm: PPO
Mean Reward: 289.24 ± 12.88 (self-reported)
Training Steps: 2.5M
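
A score of the form "mean ± std" is usually computed over a set of evaluation episodes. The sketch below shows one way such a number could be reproduced with Stable-Baselines3's `evaluate_policy`. The helper names `summarize_returns` and `evaluate_agent`, the choice of 10 evaluation episodes, and deterministic actions are assumptions for illustration; the settings behind the reported score are not stated here.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, std) of per-episode returns, using the population std."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return mean, std


def evaluate_agent(model, n_eval_episodes=10):
    """Evaluate a loaded SB3 model on LunarLander-v3.

    The episode count and deterministic actions are assumptions, not the
    settings used for the reported score.
    """
    # Imported lazily so summarize_returns works without the RL dependencies.
    import gymnasium as gym
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.monitor import Monitor

    # Monitor records the unmodified episode returns for evaluate_policy.
    env = Monitor(gym.make("LunarLander-v3"))
    try:
        episode_returns, _episode_lengths = evaluate_policy(
            model,
            env,
            n_eval_episodes=n_eval_episodes,
            deterministic=True,
            return_episode_rewards=True,
        )
    finally:
        env.close()
    return summarize_returns(episode_returns)
```

Calling `evaluate_agent(model)` with a loaded PPO model returns a `(mean, std)` pair. Results vary with the number of episodes and the random seeds, so a rerun is unlikely to match the figures above exactly.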

🧑‍💻 Training

You can run the training pipeline locally or in Colab.

Run in Colab

Click below to open the training notebook:
👉 Open Notebook in Colab

Run Locally

# Clone the repository
git clone https://github.com/AminVilan/RL-PPO-LunarLander-v3.git
cd RL-PPO-LunarLander-v3

# Open the notebook
jupyter notebook src/ppo_lunarlander_training.ipynb
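
For orientation, a PPO training run with Stable-Baselines3 could look roughly like the sketch below. It is not the notebook's code: the function name `train_ppo`, the `"MlpPolicy"` network, the default hyperparameters, and the save path are assumptions, and only the 2.5M step budget comes from the Performance section. LunarLander also needs the Box2D extra (`pip install "gymnasium[box2d]"`).

```python
def train_ppo(total_timesteps=2_500_000, save_path="ppo-LunarLander-v3"):
    """Train PPO on LunarLander-v3 with SB3 defaults and save the model."""
    # Lazy imports keep this function definable without the RL stack installed.
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make("LunarLander-v3")
    # "MlpPolicy" with default hyperparameters is an assumption, not the
    # notebook's confirmed configuration.
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=total_timesteps)
    model.save(save_path)  # written as a zip archive
    env.close()
    return model
```

A short smoke test such as `train_ppo(total_timesteps=10_000)` is a quick way to check the installation before committing to the full run.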

Using the Trained Model

The trained model is available on the Hugging Face Hub. You can load and run it directly:

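import gymnasium as gym
from stable_baselines3 import PPO
from huggingface_sb3 import load_from_hub

# Download the checkpoint from the Hugging Face Hub (returns a local file path)
repo_id = "AminVilan/ppo-LunarLander-v3"
filename = "v01-ppo-LunarLanderV3.zip"
checkpoint = load_from_hub(repo_id, filename)

# Load the PPO model from the downloaded checkpoint
model = PPO.load(checkpoint)

# Create the environment; render_mode="human" opens a window and renders every step
env = gym.make("LunarLander-v3", render_mode="human")

obs, info = env.reset()
terminated, truncated = False, False

# Run one episode until the lander terminates or the time limit truncates it
while not (terminated or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)

env.close()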
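
To score an episode instead of just watching it, the same loop can be wrapped in a small helper that sums the rewards. This is a minimal sketch: `run_episode` and the stand-in `CountdownEnv` are hypothetical names introduced here, and the stand-in only mimics the Gymnasium `reset`/`step` return shapes so the helper can be tried without Box2D installed.

```python
def run_episode(env, predict, seed=None):
    """Run one episode and return its total reward.

    `predict` maps an observation to an action, for example
    lambda obs: model.predict(obs, deterministic=True)[0]
    """
    obs, _info = env.reset(seed=seed)
    total_reward = 0.0
    terminated = truncated = False
    # Stop on either signal: terminated (landed or crashed) or truncated (time limit).
    while not (terminated or truncated):
        action = predict(obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
    return total_reward


class CountdownEnv:
    """Hypothetical stand-in env: pays +1 per step and terminates after 3 steps."""

    def reset(self, seed=None):
        self.steps_left = 3
        return 0, {}

    def step(self, action):
        self.steps_left -= 1
        return 0, 1.0, self.steps_left == 0, False, {}
```

With the real agent, pass `gym.make("LunarLander-v3")` as `env` and the lambda from the docstring as `predict`.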

📚 References

Proximal Policy Optimization Algorithms, arXiv:1707.06347, published Jul 20, 2017

🙌 If you find this useful, please ⭐ it on GitHub 🤗
