ThomasSimonini HF staff committed on
Commit
4db621e
1 Parent(s): c750bdb

Update README.md

Files changed (1)
  1. README.md +42 -1
README.md CHANGED
@@ -4,4 +4,45 @@ tags:
  - reinforcement-learning
  - stable-baselines3
  ---
- # TODO: Fill this model card
+ # ppo-LunarLander-v2
+
+ This is a pre-trained PPO agent that plays LunarLander-v2, trained with the [stable-baselines3](https://github.com/DLR-RM/stable-baselines3) library.
+
+ ### Usage (with Stable-Baselines3)
+ To use this model, install stable-baselines3 and huggingface_sb3:
+
+ ```
+ pip install stable-baselines3
+ pip install huggingface_sb3
+ ```
+
+ Then, you can use the model like this:
+
+ ```python
+ import gym
+
+ from huggingface_sb3 import load_from_hub
+ from stable_baselines3 import PPO
+ from stable_baselines3.common.evaluation import evaluate_policy
+
+ # Retrieve the model from the hub
+ ## repo_id = id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name})
+ ## filename = name of the model zip file from the repository
+ checkpoint = load_from_hub(repo_id="ThomasSimonini/ppo-LunarLander-v2", filename="LunarLander-v2")
+ model = PPO.load(checkpoint)
+
+ # Evaluate the agent
+ eval_env = gym.make('LunarLander-v2')
+ mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
+ print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
+
+ # Watch the agent play
+ obs = eval_env.reset()
+ for i in range(1000):
+     action, _state = model.predict(obs)
+     obs, reward, done, info = eval_env.step(action)
+     eval_env.render()
+     if done:
+         obs = eval_env.reset()
+ eval_env.close()
+ ```
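
Reviewer note (not part of the commit): the README snippet needs the trained checkpoint and a working LunarLander install, so here is a minimal, dependency-free sketch of what the evaluation step reports. It assumes the usual convention of summing rewards per episode and then taking the mean and population standard deviation across episodes. `CoinFlipEnv`, `run_episodes`, and `summarize` are hypothetical names made up for illustration; they are not part of stable-baselines3 or gym.

```python
import random
import statistics


class CoinFlipEnv:
    """Hypothetical stand-in environment using the classic Gym 4-tuple step API."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # dummy observation

    def step(self, action):
        self.steps += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.steps >= self.episode_length
        return 0, reward, done, {}


def run_episodes(env, policy, n_episodes):
    """Play n_episodes to completion and return each episode's total reward."""
    totals = []
    for _ in range(n_episodes):
        obs = env.reset()
        done = False
        total = 0.0
        while not done:
            action = policy(obs)
            obs, reward, done, _info = env.step(action)
            total += reward
        totals.append(total)
    return totals


def summarize(episode_rewards):
    """Mean and population standard deviation of per-episode rewards."""
    return statistics.fmean(episode_rewards), statistics.pstdev(episode_rewards)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs behave the same
    rewards = run_episodes(CoinFlipEnv(), lambda obs: rng.randint(0, 1), n_episodes=10)
    mean_reward, std_reward = summarize(rewards)
    print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

The loop has the same shape as the "Watch the agent play" section of the README: act, step, and reset once `done` is returned. Swapping the lambda for `model.predict` and the stub for a real environment would be the obvious next step, under the same 4-tuple API assumption.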