Initial commit

Files changed (4)

README.md +43 -38
model.pt +1 -1
replay.mp4 +0 -0
results.json +1 -1

README.md CHANGED

@@ -1,11 +1,13 @@
 ---
-library_name: sample-factory
 tags:
 - deep-reinforcement-learning
 - reinforcement-learning
-- sample-factory
 model-index:
-- name: APPO
   results:
   - task:
       type: reinforcement-learning
@@ -15,42 +17,45 @@ model-index:
       type: LunarLander-v2
     metrics:
     - type: mean_reward
-      value: 205.87 +/- 83.70
       name: mean_reward
       verified: false
 ---
-A(n) **APPO** model trained on the **LunarLander-v2** environment.
-This model was trained using Sample-Factory 2.0: https://github.com/alex-petrenko/sample-factory.
-Documentation for how to use Sample-Factory can be found at https://www.samplefactory.dev/
-## Downloading the model
-After installing Sample-Factory, download the model with:
-```
-python -m sample_factory.huggingface.load_from_hub -r NicolasYn/ppo8-LunarLander-v2
-```
-## Using the model
-To run the model after download, use the `enjoy` script corresponding to this environment:
-```
-python -m <path.to.enjoy.module> --algo=APPO --env=LunarLander-v2 --train_dir=./train_dir --experiment=ppo8-LunarLander-v2
-```
-You can also upload models to the Hugging Face Hub using the same script with the `--push_to_hub` flag.
-See https://www.samplefactory.dev/10-huggingface/huggingface/ for more details
-## Training with this model
-To continue training with this model, use the `train` script corresponding to this environment:
-```
-python -m <path.to.train.module> --algo=APPO --env=LunarLander-v2 --train_dir=./train_dir --experiment=ppo8-LunarLander-v2 --restart_behavior=resume --train_for_env_steps=10000000000
-```
-Note, you may have to adjust `--train_for_env_steps` to a suitably high number as the experiment will resume at the number of steps it concluded at.

 ---
 tags:
+- LunarLander-v2
+- ppo
 - deep-reinforcement-learning
 - reinforcement-learning
+- custom-implementation
+- deep-rl-course
 model-index:
+- name: PPO
   results:
   - task:
       type: reinforcement-learning
       type: LunarLander-v2
     metrics:
     - type: mean_reward
+      value: -179.93 +/- 111.55
       name: mean_reward
       verified: false
 ---
+  # PPO Agent Playing LunarLander-v2
+  This is a trained model of a PPO agent playing LunarLander-v2.
+  # Hyperparameters
+  ```python
+  {'exp_name': 'unit8_ppo1'
+'seed': 1
+'torch_deterministic': True
+'cuda': True
+'track': False
+'wandb_project_name': 'cleanRL'
+'wandb_entity': None
+'capture_video': False
+'env_id': 'LunarLander-v2'
+'total_timesteps': 50
+'learning_rate': 0.00025
+'num_envs': 4
+'num_steps': 128
+'anneal_lr': True
+'gae': True
+'gamma': 0.99
+'gae_lambda': 0.95
+'num_minibatches': 4
+'update_epochs': 4
+'norm_adv': True
+'clip_coef': 0.2
+'clip_vloss': True
+'ent_coef': 0.01
+'vf_coef': 0.5
+'max_grad_norm': 0.5
+'target_kl': None
+'repo_id': 'NicolasYn/ppo8-LunarLander-v2'
+'batch_size': 512
+'minibatch_size': 128}
+  ```
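
The `batch_size` and `minibatch_size` values listed above are consistent with being derived from `num_envs`, `num_steps`, and `num_minibatches`, as CleanRL-style PPO scripts commonly do. The sketch below reproduces that arithmetic under that assumption. The `num_updates` name and its formula are illustrative and are not confirmed by this commit. If the training script does compute its update count this way, a `total_timesteps` of 50 would yield zero policy updates, which may be one reason for the low mean reward reported in this commit.

```python
# Values copied from the hyperparameter listing in the README diff.
num_envs = 4
num_steps = 128
num_minibatches = 4
total_timesteps = 50

# Assumed CleanRL-style derivation (not confirmed by this commit).
batch_size = num_envs * num_steps                # transitions collected per rollout
minibatch_size = batch_size // num_minibatches   # transitions per gradient step
num_updates = total_timesteps // batch_size      # hypothetical: full rollouts that fit in the budget

print(batch_size, minibatch_size, num_updates)   # 512 128 0
```

The first two results match the `'batch_size': 512` and `'minibatch_size': 128` entries in the listing.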

model.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3a495681ed0715e3e84e02ed2970c8ac4c7eba5781e4319e364a1f58d8afd5a5
 size 43026

 version https://git-lfs.github.com/spec/v1
+oid sha256:2a69b7bcd30a64cd0e3f59fb043b7222208fabadead1211be73df47a6003ac3e
 size 43026

replay.mp4 CHANGED

Binary files a/replay.mp4 and b/replay.mp4 differ

results.json CHANGED

@@ -1 +1 @@
-{"env_id": "LunarLander-v2", "mean_reward": -64.00075579713192, "std_reward": 19.541288934197084, "n_evaluation_episodes": 10, "eval_datetime": "2024-04-03T19:59:15.945122"}

+{"env_id": "LunarLander-v2", "mean_reward": -179.93470284522232, "std_reward": 111.5488000959182, "n_evaluation_episodes": 10, "eval_datetime": "2024-04-06T20:12:44.691942"}
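
The `mean_reward` metric in the README front matter (`-179.93 +/- 111.55`) appears to be the new `results.json` mean and standard deviation rounded to two decimals. A minimal sketch of that formatting, assuming the card value is produced this way; the variable names are illustrative and not taken from the commit:

```python
import json

# The new results.json content, copied from the added line of this diff.
results_line = (
    '{"env_id": "LunarLander-v2", "mean_reward": -179.93470284522232, '
    '"std_reward": 111.5488000959182, "n_evaluation_episodes": 10, '
    '"eval_datetime": "2024-04-06T20:12:44.691942"}'
)

results = json.loads(results_line)

# Format as "mean +/- std" with two decimals, matching the model-card metric.
metric = f"{results['mean_reward']:.2f} +/- {results['std_reward']:.2f}"
print(metric)  # -179.93 +/- 111.55
```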