yogeshkulkarni committed
Commit ee08b0e
1 Parent(s): 8161987

Added usage code

Files changed (1)
  1. README.md +24 -5
README.md CHANGED
@@ -25,12 +25,31 @@ This is a trained model of a **PPO** agent playing **LunarLander-v2**
 using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
 
 ## Usage (with Stable-baselines3)
-TODO: Add your code
-
-
 ```python
-from stable_baselines3 import ...
+import gym
+from stable_baselines3 import PPO
+from stable_baselines3.common.evaluation import evaluate_policy
 from huggingface_sb3 import load_from_hub
 
-...
+repo_id = "yogeshkulkarni/ppo-LunarLander-v2"  # The Hub repo id
+filename = "ppo-LunarLander-v2.zip"  # The model filename (.zip)
+
+# When the model was trained on Python 3.8, the pickle protocol is 5,
+# but Python 3.6 and 3.7 use protocol 4.
+# To get compatibility we need to:
+# 1. Install pickle5 (done at the beginning of the Colab)
+# 2. Create custom objects to pass as a parameter to PPO.load()
+custom_objects = {
+    "learning_rate": 0.0,
+    "lr_schedule": lambda _: 0.0,
+    "clip_range": lambda _: 0.0,
+}
+
+checkpoint = load_from_hub(repo_id, filename)
+model = PPO.load(checkpoint, custom_objects=custom_objects, print_system_info=True)
+
+# Evaluate this model
+eval_env = gym.make("LunarLander-v2")
+mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
+print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
 ```
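As a quick follow-up to the usage snippet above, here is a minimal sketch of watching the loaded agent play one episode. It is not part of this commit: it reuses the repo id and `custom_objects` from the snippet and assumes the classic pre-0.26 Gym API (where `env.reset()` returns only the observation and `env.step()` returns a 4-tuple), matching the `gym` usage above.

```python
import gym
from stable_baselines3 import PPO
from huggingface_sb3 import load_from_hub

# Load the checkpoint exactly as in the commit above
checkpoint = load_from_hub("yogeshkulkarni/ppo-LunarLander-v2", "ppo-LunarLander-v2.zip")
model = PPO.load(checkpoint, custom_objects={
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,
})

# Roll out a single episode with rendering (classic Gym API)
env = gym.make("LunarLander-v2")
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    env.render()
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(f"episode reward: {total_reward:.2f}")
env.close()
```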