kingabzpro committed
Commit 53c82d9
1 Parent(s): 277e232

Update README.md

Files changed (1)
  1. README.md +30 -33
README.md CHANGED
@@ -24,42 +24,39 @@ model-index:
 This is a trained model of a **PPO** agent playing **LunarLander-v2** using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

 ## Usage (with Stable-baselines3)
- ```python
- import gym
- from stable_baselines3 import PPO
- from stable_baselines3.common.evaluation import evaluate_policy
- from stable_baselines3.common.env_util import make_vec_env
-
- # Create a vectorized environment of 64 parallel environments
- env = make_vec_env("LunarLander-v2", n_envs=64)
-
- # Optimized hyperparameters
- model = PPO(
-     "MlpPolicy",
-     env=env,
-     n_steps=1024,
-     batch_size=32,
-     n_epochs=10,
-     gamma=0.997,
-     gae_lambda=0.98,
-     ent_coef=0.01,
-     verbose=1,
- )
-
- # Train it for 1,000,000 timesteps
- model.learn(total_timesteps=int(1e6))
-
- # Create a new environment for evaluation
- eval_env = gym.make("LunarLander-v2")
-
- # Evaluate the model with 10 evaluation episodes and deterministic=True
- mean_reward, std_reward = evaluate_policy(
-     model, eval_env, n_eval_episodes=10, deterministic=True
- )
-
- # Print the results
+
+ Using this model becomes easy when you have stable-baselines3 and huggingface_sb3 installed:
+ ```
+ pip install stable-baselines3
+ pip install huggingface_sb3
+ ```
+ Then, you can use the model like this:
+ ```python
+ import gym
+
+ from huggingface_sb3 import load_from_hub
+ from stable_baselines3 import PPO
+ from stable_baselines3.common.evaluation import evaluate_policy
+
+ # Retrieve the model from the hub
+ ## repo_id = id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name})
+ ## filename = name of the model zip file from the repository
+ checkpoint = load_from_hub(repo_id="kingabzpro/Moonman-Lunar-Landing-v2", filename="Moonman-Lunar-Landing-v2.zip")
+ model = PPO.load(checkpoint)
+
+ # Evaluate the agent
+ eval_env = gym.make('LunarLander-v2')
+ mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
 print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
-
- # >>> mean_reward=261.42 +/- 18.69168514436243
- ```
+
+ # Watch the agent play
+ obs = eval_env.reset()
+ for i in range(1000):
+     action, _state = model.predict(obs)
+     obs, reward, done, info = eval_env.step(action)
+     eval_env.render()
+     if done:
+         obs = eval_env.reset()
+ eval_env.close()
+ ```
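
The new usage snippet targets the classic `gym` API, where `env.reset()` returns only the observation and `env.step()` returns four values. If you run it against `gymnasium` with a recent Stable-Baselines3 (≥ 2.0) instead, the reset/step signatures change. The following is a minimal sketch under that assumption; it further assumes the Box2D extra (`pip install "gymnasium[box2d]"`) is installed and that the saved checkpoint loads cleanly in the newer version:

```python
# Sketch only: same load-evaluate-render workflow, but with gymnasium instead of classic gym.
# Assumes stable-baselines3 >= 2.0, huggingface_sb3, and gymnasium[box2d] are installed.
import gymnasium as gym

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Download the checkpoint from the Hub (same repo_id/filename as in the README)
checkpoint = load_from_hub(
    repo_id="kingabzpro/Moonman-Lunar-Landing-v2",
    filename="Moonman-Lunar-Landing-v2.zip",
)
model = PPO.load(checkpoint)

# render_mode="human" makes gymnasium render on every step(); no explicit render() call needed
eval_env = gym.make("LunarLander-v2", render_mode="human")

mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=10, deterministic=True
)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

# gymnasium: reset() returns (obs, info); step() returns (obs, reward, terminated, truncated, info)
obs, info = eval_env.reset()
for _ in range(1000):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = eval_env.step(action)
    if terminated or truncated:
        obs, info = eval_env.reset()
eval_env.close()
```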