MorganWKen/ppo-LunarLander-v3

Reinforcement Learning

stable-baselines3

deep-reinforcement-learning


MorganWKen committed on May 5, 2024

Commit c1dd658 · verified · 1 Parent(s): 46390c1

Cleaning up

Files changed (1)

README.md +5 -4


@@ -30,9 +30,6 @@ using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines
 ```python
-from stable_baselines3 import ...
-from huggingface_sb3 import load_from_hub
 # Defining model
 model = PPO('MlpPolicy', env, n_steps = 512, batch_size = 64, n_epochs = 4, gamma = 0.999, gae_lambda = 0.98, ent_coef = 0.01, verbose=1)
@@ -53,7 +50,11 @@ mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, d
 # Print the results
 print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
-# 284.84514090000005, "std_reward": 18.270698037778157
 # mean_reward=284.85 +/- 18.270698037778157
 ...
 ```

+## Diffs
+* Dropped `n_steps` down to 512
+* Bumped `total_timesteps` up to 2,000,000
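
To put the two changes above in perspective, here is a rough sketch of the training budget they imply. It uses only the hyperparameters shown in the README (`n_steps`, `batch_size`, `n_epochs`) plus the 2,000,000-step budget. The helper `ppo_update_budget` is hypothetical, and `n_envs=1` is an assumption, since the number of parallel environments is not shown in this diff. It also assumes training keeps collecting full rollouts until the step budget is reached.

```python
import math


def ppo_update_budget(total_timesteps, n_steps, batch_size, n_epochs, n_envs=1):
    """Estimate rollout and gradient-step counts for a PPO run (hypothetical helper)."""
    # One rollout collects n_steps transitions from each environment.
    rollout_size = n_steps * n_envs
    # Assumption: collection continues until the budget is met,
    # so the final rollout may overshoot total_timesteps.
    n_rollouts = math.ceil(total_timesteps / rollout_size)
    # Each epoch sweeps the rollout buffer once in minibatches of batch_size.
    minibatches_per_epoch = math.ceil(rollout_size / batch_size)
    updates_per_rollout = minibatches_per_epoch * n_epochs
    return {
        "rollout_size": rollout_size,
        "n_rollouts": n_rollouts,
        "updates_per_rollout": updates_per_rollout,
        "total_gradient_steps": n_rollouts * updates_per_rollout,
        "timesteps_collected": n_rollouts * rollout_size,
    }


# Values from the README; n_envs=1 is assumed, not taken from the source.
budget = ppo_update_budget(
    total_timesteps=2_000_000, n_steps=512, batch_size=64, n_epochs=4
)
for name, value in budget.items():
    print(f"{name}: {value}")
```

If the actual run used several parallel environments, pass that count as `n_envs`; the rollout size grows and the number of rollouts shrinks accordingly.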