---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: 261.85 +/- 46.42
      name: mean_reward
      verified: false
---

# **PPO** Agent playing **LunarLander-v2**
This is a trained model of a **PPO** agent playing **LunarLander-v2** using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

## Usage
The code below uses a Gym (Gymnasium) environment and the stable-baselines3 library, and is meant to be run cell by cell in a Colab notebook.

```python
# Install system and Python dependencies (notebook shell commands)
!apt install swig cmake
!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt
!sudo apt-get update
!apt install python3-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay

# Restart the Colab runtime so the new packages are picked up.
# This kills the current process; run the remaining code in a new cell.
import os
os.kill(os.getpid(), 9)

# Start a virtual display so the environment can render headlessly
from pyvirtualdisplay import Display
virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

# Import libraries
import gymnasium as gym
from huggingface_sb3 import load_from_hub, package_to_hub
from huggingface_hub import notebook_login
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# Create the environment
env = gym.make("LunarLander-v2")

# Define the PPO agent
model = PPO(
    policy="MlpPolicy",
    env=env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    verbose=1,
)

# Train the agent
model.learn(total_timesteps=1000000)

# Save the model
model_name = "ppo-LunarLander-v2"
model.save(model_name)

# Evaluate the model on a separate, monitored environment
eval_env = Monitor(gym.make("LunarLander-v2"))
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")

# Create a video (for Colab)
import gymnasium as gym
from stable_baselines3 import PPO
from IPython.display import Video, display
import os

# RecordVideo needs frames as arrays, so request the rgb_array render mode
env = gym.make("LunarLander-v2", render_mode="rgb_array")
model_name = "ppo-LunarLander-v2"
model = PPO.load(model_name)

def record_video(env, model, video_length=500, prefix="ppo-lunarlander"):
    # Record only the first episode into the `prefix` folder
    env = gym.wrappers.RecordVideo(env, video_folder=prefix, episode_trigger=lambda x: x == 0)
    obs, _ = env.reset()
    for _ in range(video_length):
        action, _ = model.predict(obs)
        obs, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()
    env.close()

record_video(env, model, video_length=500, prefix="ppo-lunarlander")
video_path = "ppo-lunarlander/rl-video-episode-0.mp4"
# embed=True inlines the local file so it can be played in the notebook
display(Video(video_path, embed=True))
...
```
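
The `mean_reward` metric above is reported as a mean plus or minus a standard deviation over evaluation episodes. As a rough illustration of how such a summary can be derived from per-episode returns, here is a minimal sketch. It assumes the population standard deviation (what `numpy.std` computes by default); the helper name `summarize_returns` and the example returns are hypothetical and are not this model's actual evaluation data.

```python
from statistics import fmean, pstdev


def summarize_returns(episode_returns):
    """Return (mean, std) of per-episode returns, using the population std."""
    mean_reward = fmean(episode_returns)
    std_reward = pstdev(episode_returns)
    return mean_reward, std_reward


# Hypothetical per-episode returns, for illustration only
# (these are NOT the results of this model).
example_returns = [250.0, 270.0, 230.0, 290.0]

mean_reward, std_reward = summarize_returns(example_returns)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

With only 10 evaluation episodes, as in the usage code, the standard deviation can shift noticeably between runs, so evaluating over more episodes gives a steadier estimate.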