PPO Agent playing LunarLander-v3

This is a trained model of a PPO agent playing LunarLander-v3 using the stable-baselines3 library.

Usage (with Stable-Baselines3)

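To test this model on Google Colab or your local machine, install the dependencies and run the script below. It downloads the trained agent from the Hugging Face Hub, evaluates it over 10 episodes, and records a video of the landing. The leading "!" on the pip line and the inline video display assume a notebook environment; in a plain terminal, run the pip command without the "!".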

1. Install the required packages and run the agent (download and evaluation)

!pip install "stable-baselines3[extra]" "gymnasium[box2d]" huggingface_hub

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import DummyVecEnv, VecVideoRecorder
from huggingface_hub import hf_hub_download
import base64
from IPython import display
import os

# Repository details
REPO_ID = "Srgreen/ppo-LunarLander-v3"
FILENAME = "ppo-LunarLander-v3.zip"

print("Downloading the trained model from the Hugging Face Hub...")
checkpoint_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# 1. Create a vectorized environment (required for the video recorder)
video_folder = "./videos"
env = DummyVecEnv([lambda: Monitor(gym.make("LunarLander-v3", render_mode="rgb_array"))])

# 2. Wrap the environment to record the video of the simulation
env = VecVideoRecorder(
    env,
    video_folder,
    record_video_trigger=lambda step: step == 0,  # Start recording at step 0
    video_length=1000,  # Record roughly the first 1000 steps
    name_prefix="lunar-lander-eval"
)

# 3. Load the trained PPO model
print("Loading model weights into the PPO agent...")
model = PPO.load(checkpoint_path, env=env)

# 4. Evaluate the agent over 10 episodes to estimate its mean reward
print("Evaluating the agent over 10 episodes...")
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, deterministic=True)

print("-" * 40)
print(f"Mean Reward: {mean_reward:.2f} +/- {std_reward:.2f}")
print("-" * 40)
if mean_reward >= 200:
    print("Result: Successful pilot! Perfect landing on the Moon surface. 🌛🥳")
else:
    print("Result: The agent could use more training steps.")
print("-" * 40)

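# 5. Run one more episode with the trained policy.
# Note: the trigger above starts recording at step 0, so the saved video should
# cover roughly the first 1000 steps of the evaluation run, not this episode.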
print("Preparing the video playback...")
obs = env.reset()
done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, rewards, dones, infos = env.step(action)
    done = dones[0]

# Close the environment to save the video file properly
env.close()

# 6. Helper function to render the recorded MP4 video inside Google Colab
def show_video(directory):
    html = []
    for filename in os.listdir(directory):
        if filename.endswith(".mp4"):
            video_path = os.path.join(directory, filename)
            # Read the file inside a context manager so the handle is closed
            with open(video_path, "rb") as f:
                video_b64 = base64.b64encode(f.read()).decode("ascii")
            html.append(f'''
                <video controls width="600" autoplay loop muted>
                    <source src="data:video/mp4;base64,{video_b64}" type="video/mp4" />
                </video>
            ''')
    return "".join(html)

# Display the video in the notebook
print("Here is your agent's landing simulation:")
display.display(display.HTML(show_video(video_folder)))
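
The helper above embeds every .mp4 file found in the folder, so videos from earlier runs also show up if the folder is reused. As a minimal sketch, assuming you only want the newest recording, the hypothetical function latest_video below (not part of Stable-Baselines3 or of the original script) returns the most recently modified video file using only the standard library:

```python
from pathlib import Path
from typing import Optional


def latest_video(directory: str, suffix: str = ".mp4") -> Optional[Path]:
    """Return the most recently modified file with the given suffix, or None.

    Hypothetical helper for illustration; the name and signature are
    assumptions, not part of any library.
    """
    folder = Path(directory)
    if not folder.is_dir():
        return None
    # Keep only regular files whose extension matches the requested suffix
    candidates = [p for p in folder.iterdir() if p.is_file() and p.suffix == suffix]
    if not candidates:
        return None
    # Pick the file with the largest modification time
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path could then be read and base64-encoded the same way show_video does for each file.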

Evaluation results

mean_reward on LunarLander-v3 (self-reported): 244.59 +/- 36.50
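
A figure of this form is a mean and a standard deviation taken over per-episode returns. The sketch below shows how such a summary can be computed and formatted. The episode returns are made-up illustration values, not results from this model, and the function names are hypothetical. It uses the population standard deviation, which, to my understanding, matches the NumPy default that evaluate_policy relies on.

```python
from statistics import fmean, pstdev


def summarize_returns(episode_returns):
    """Return (mean, population std) of a list of per-episode returns."""
    return fmean(episode_returns), pstdev(episode_returns)


def format_result(mean, std):
    """Format the summary in the same 'mean +/- std' style as the model card."""
    return f"{mean:.2f} +/- {std:.2f}"


# Hypothetical returns from five evaluation episodes (illustration only)
returns = [250.0, 210.0, 280.0, 190.0, 270.0]
mean, std = summarize_returns(returns)
print(format_result(mean, std))  # prints "240.00 +/- 34.64"
```

Applied to the self-reported figure, the mean minus one standard deviation is 244.59 - 36.50 = 208.09, which is above the 200 threshold the evaluation script checks against.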