kalmufti committed on
Commit 1cad660
1 Parent(s): 7f7658d

Update README.md

Add install and usage instructions.

Files changed (1)
  1. README.md +45 -4
README.md CHANGED
@@ -20,9 +20,50 @@ model-index:
  type: LunarLander-v2
  ---

- # **PPO** Agent playing **LunarLander-v2**
+ # **PPO** Agent Playing **LunarLander-v2**
  This is a trained model of a **PPO** agent playing **LunarLander-v2** using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

- ## Usage (with Stable-baselines3)
- TODO: Add your code
-
+ ## Usage (with Stable-baselines3, and huggingface_sb3)
+ To use this model make sure you are running Python version 3.7.13. You can use [pyenv](https://github.com/pyenv/pyenv) to manage multiple versions of Python on your system.
+
+ ### Install required packages:
+ ```bash
+ pip install stable-baselines3
+ pip install huggingface_sb3
+ pip install pickle5
+ pip install Box2D
+ pip install pyglet
+ ```
+
+ You can use this simple script as a base to evaluate and run the model:
+ ```python
+ import gym
+ from stable_baselines3 import PPO
+ from huggingface_sb3 import load_from_hub
+ from stable_baselines3.common.evaluation import evaluate_policy
+
+ # Download the model from the huggingface hub
+ checkpoint = load_from_hub(
+     repo_id="kalmufti/PPO-LunarLander-v2",
+     filename="ppo-LunarLander-v2.zip",
+ )
+ # Load the policy
+ model = PPO.load(checkpoint)
+ # Create an environment
+ env = gym.make("LunarLander-v2")
+ # Optional - evaluate the agent means
+ mean_reward, std_reward = evaluate_policy(
+     model, env, render=False, n_eval_episodes=5, deterministic=True, warn=False
+ )
+ print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
+
+ # Watch the agent playing the environment
+ obs = env.reset()
+ for i in range(1000):
+     action, _state = model.predict(obs)
+     obs, reward, done, info = env.step(action)
+     env.render()
+     if done:
+         obs = env.reset()
+ env.close()
+ ```
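
As a note on the evaluation step in the added script: the `mean_reward` and `std_reward` values printed there are summary statistics over the per-episode returns, and to my understanding stable-baselines3 computes them as the mean and the population standard deviation. The sketch below illustrates that summary step with the standard library only. `summarize_returns` is a hypothetical helper (not part of stable-baselines3 or huggingface_sb3), and the episode returns are made-up numbers, not results from this model.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population std) of a list of per-episode returns.

    Hypothetical helper for illustration; it is not a stable-baselines3 API.
    """
    if not episode_returns:
        raise ValueError("need at least one episode return")
    mean_reward = statistics.mean(episode_returns)
    # Population standard deviation (divides by N, not N - 1).
    std_reward = statistics.pstdev(episode_returns)
    return mean_reward, std_reward


# Made-up returns for 5 evaluation episodes (illustrative only).
episode_returns = [250.0, 260.0, 240.0, 255.0, 245.0]
mean_reward, std_reward = summarize_returns(episode_returns)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

With only `n_eval_episodes=5`, as in the script, the standard deviation is a rough estimate; raising the episode count would give a steadier figure at the cost of a longer evaluation.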