Parth673 committed on
Commit
cd7b548
1 Parent(s): 897a059
Files changed (1)
  1. README.md +50 -1
README.md CHANGED
@@ -1,3 +1,52 @@
  ---
- license: cc-by-4.0
  ---
  ---
+ library_name: stable-baselines3
+ tags:
+ - LunarLander-v2
+ - deep-reinforcement-learning
+ - reinforcement-learning
+ - stable-baselines3
+ model-index:
+ - name: PPO
+   results:
+   - task:
+       type: reinforcement-learning
+       name: reinforcement-learning
+     dataset:
+       name: LunarLander-v2
+       type: LunarLander-v2
+     metrics:
+     - type: mean_reward
+       value: 263.46 +/- 13.81
+       name: mean_reward
+       verified: false
  ---
+
+ # **PPO** Agent playing **LunarLander-v2**
+ This is a trained model of a **PPO** agent playing **LunarLander-v2**
+ using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
+
+ ## Usage (with Stable-Baselines3)
+ This is my first DL agent. Feel free to use it for whatever lunar landings are required.
+
+
32
+ ```python
33
+ # To load it and watch it land (on your computer NOT collab! You have to ditch render-mode="human" to run it in a notebook without visuals)
34
+ import gym
35
+
36
+ from huggingface_sb3 import load_from_hub
37
+ from stable_baselines3 import PPO
38
+ from stable_baselines3.common.evaluation import evaluate_policy
39
+
40
+ # Retrieve the model from the hub
41
+ ## repo_id = id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name})
42
+ ## filename = name of the model zip file from the repository
43
+ checkpoint = load_from_hub(repo_id="MattStammers/ppo-LunarLander-v2", filename="ppo-LunarLander-v2.zip")
44
+ model = PPO.load(checkpoint)
45
+
46
+ # Evaluate the agent and watch it land!
47
+ eval_env = gym.make('LunarLander-v2', render_mode="human")
48
+ mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
49
+ print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
50
+
51
+ ...
52
+ ```
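
The `mean_reward` value in the card metadata is stored as a single `mean +/- std` string, which is the same pair that `evaluate_policy` returns. As a rough sketch of how a reader might interpret it, the snippet below splits that string and computes the conservative lower bound `mean - std`. The helper name `parse_mean_std` and the constant `SOLVED_THRESHOLD` are hypothetical and not part of this commit. The value 200 is an assumption taken from the usual convention that a LunarLander episode scoring at least 200 points counts as solved.

```python
import re
from typing import Tuple


def parse_mean_std(value: str) -> Tuple[float, float]:
    """Split a 'mean +/- std' string (as in the card metadata) into two floats."""
    pattern = r"\s*(-?\d+(?:\.\d+)?)\s*\+/-\s*(\d+(?:\.\d+)?)\s*"
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError(f"unexpected metric format: {value!r}")
    return float(match.group(1)), float(match.group(2))


# Assumption: 200 points is the conventional "solved" score for LunarLander.
SOLVED_THRESHOLD = 200.0

# Metric string copied from the model-index block of this card.
mean_reward, std_reward = parse_mean_std("263.46 +/- 13.81")

# Conservative summary: one standard deviation below the mean.
lower_bound = mean_reward - std_reward

print(f"lower bound = {lower_bound:.2f}")
print("clears threshold:", lower_bound >= SOLVED_THRESHOLD)
```

Subtracting the standard deviation penalizes agents whose returns vary a lot between episodes, so this is a stricter reading of the reported score than the mean alone.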