	---
	library_name: stable-baselines3
	tags:
	- LunarLander-v2
	- deep-reinforcement-learning
	- reinforcement-learning
	- stable-baselines3
	model-index:
	- name: PPO
	  results:
	  - task:
	      type: reinforcement-learning
	      name: reinforcement-learning
	    dataset:
	      name: LunarLander-v2
	      type: LunarLander-v2
	    metrics:
	    - type: mean_reward
	      value: 277.82 +/- 22.28
	      name: mean_reward
	      verified: false
	language:
	- en
	---

	# PPO Agent playing LunarLander-v2
	This is a trained model of a PPO agent playing LunarLander-v2
	using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

	## Usage (with Stable-baselines3)

	```python
	from stable_baselines3 import PPO
	from stable_baselines3.common.env_util import make_vec_env
	from stable_baselines3.common.evaluation import evaluate_policy

	from huggingface_sb3 import load_from_hub


	# Download the model checkpoint
	model_checkpoint = load_from_hub("prashanthgowni/ppo-LunarLander-v2", "ppo-LunarLander-v2")
	# Create a vectorized environment
	env = make_vec_env("LunarLander-v2", n_envs=1)

	# Load the model
	model = PPO.load(model_checkpoint, env=env)

	# Evaluate
	print("Evaluating model")
	mean_reward, std_reward = evaluate_policy(
	    model,
	    env,
	    n_eval_episodes=30,
	    deterministic=True,
	)
	print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")

	# Start a new episode
	obs = env.reset()

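	try:
	    # The vectorized env resets itself when an episode ends,
	    # so this loop keeps rendering until interrupted with Ctrl+C
	    while True:
	        action, state = model.predict(obs, deterministic=True)
	        obs, reward, done, info = env.step(action)
	        env.render()
	except KeyboardInterrupt:
	    pass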

	```
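
The `mean_reward` and `std_reward` values printed above summarize the per-episode returns collected during evaluation. The sketch below shows that aggregation in plain Python, assuming the standard deviation is the population form (what NumPy's default `np.std` computes). The helper name `summarize_returns` and the sample returns are hypothetical and are not results from this model.

```python
from statistics import fmean, pstdev


def summarize_returns(episode_returns):
    """Return (mean, std) of per-episode returns, using the population standard deviation."""
    if not episode_returns:
        raise ValueError("need at least one episode return")
    return fmean(episode_returns), pstdev(episode_returns)


# Hypothetical returns from five evaluation episodes (illustrative values only)
example_returns = [250.0, 280.0, 300.0, 260.0, 290.0]
mean_reward, std_reward = summarize_returns(example_returns)
print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")
```

A small standard deviation relative to the mean suggests the agent lands consistently across episodes rather than scoring well only occasionally.
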
	## Conclusion
	The steps above download the trained agent from the Hub, evaluate it over 30 episodes, and render it in the environment.
	To resume training from the provided checkpoint and fine-tune the agent further, you may need to install additional libraries and packages specific to your operating system.
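
Resuming training from the checkpoint could look roughly like the sketch below. It assumes `model` is an already-loaded stable-baselines3 algorithm with an environment attached (for example the result of `PPO.load(model_checkpoint, env=env)` from the usage section). The helper name `continue_training` is hypothetical; `learn` and `save` are the stable-baselines3 methods it relies on.

```python
def continue_training(model, extra_timesteps, save_path):
    """Train an already-loaded model for more steps, then save it.

    `model` is expected to expose `learn` and `save`, as stable-baselines3
    algorithms such as PPO do. This helper is an illustrative wrapper only.
    """
    # reset_num_timesteps=False keeps the existing timestep counter
    # instead of restarting it from zero
    model.learn(total_timesteps=extra_timesteps, reset_num_timesteps=False)
    # Write the fine-tuned checkpoint to disk
    model.save(save_path)
    return model
```

For instance, `continue_training(model, 100_000, "ppo-LunarLander-v2-finetuned")` would train for a further 100,000 steps and save the result; both the step count and the output name are placeholders to adjust.
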