Antonio Serrano Muñoz committed on
Commit 3646a02
1 Parent(s): 2104d06

Add README

  1. README.md +68 -0
README.md ADDED
@@ -0,0 +1,68 @@
---
library_name: skrl
tags:
- deep-reinforcement-learning
- reinforcement-learning
- skrl
model-index:
- name: PPO
  results:
  - metrics:
    - type: mean_reward
      value: 9.1 +/- 0.05
      name: Total reward (mean)
    task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Isaac-Reach-Franka-v0
      type: Isaac-Reach-Franka-v0
---

# IsaacOrbit-Isaac-Reach-Franka-v0-PPO

Trained agent model for the [NVIDIA Isaac Orbit](https://github.com/NVIDIA-Omniverse/Orbit) environment.

- **Task:** Isaac-Reach-Franka-v0
- **Agent:** [PPO](https://skrl.readthedocs.io/en/latest/modules/skrl.agents.ppo.html)

# Usage (with skrl)

```python
from skrl.utils.huggingface import download_model_from_huggingface

# assumes an already-instantiated skrl agent named `agent`
# download the checkpoint from the Hugging Face Hub and load it into the agent
path = download_model_from_huggingface("skrl/IsaacOrbit-Isaac-Reach-Franka-v0-PPO")
agent.load(path)
```

# Hyperparameters

```python
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.ppo.html#configuration-and-hyperparameters
# import paths follow the skrl PyTorch API; adjust them if your skrl version differs
from skrl.agents.torch.ppo import PPO_DEFAULT_CONFIG
from skrl.resources.preprocessors.torch import RunningStandardScaler
from skrl.resources.schedulers.torch import KLAdaptiveRL

# assumes `env` (the wrapped environment) and `device` are already defined
cfg_ppo = PPO_DEFAULT_CONFIG.copy()
cfg_ppo["rollouts"] = 16  # memory_size
cfg_ppo["learning_epochs"] = 8
cfg_ppo["mini_batches"] = 8  # 16 * 2048 / 4096
cfg_ppo["discount_factor"] = 0.99
cfg_ppo["lambda"] = 0.95
cfg_ppo["learning_rate"] = 3e-4
cfg_ppo["learning_rate_scheduler"] = KLAdaptiveRL
cfg_ppo["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.008}
cfg_ppo["random_timesteps"] = 0
cfg_ppo["learning_starts"] = 0
cfg_ppo["grad_norm_clip"] = 1.0
cfg_ppo["ratio_clip"] = 0.2
cfg_ppo["value_clip"] = 0.2
cfg_ppo["clip_predicted_values"] = True
cfg_ppo["entropy_loss_scale"] = 0.0
cfg_ppo["value_loss_scale"] = 2.0
cfg_ppo["kl_threshold"] = 0
# scale rewards by 0.01 before they are used for learning
cfg_ppo["rewards_shaper"] = lambda rewards, timestep, timesteps: rewards * 0.01
cfg_ppo["state_preprocessor"] = RunningStandardScaler
cfg_ppo["state_preprocessor_kwargs"] = {"size": env.observation_space, "device": device}
cfg_ppo["value_preprocessor"] = RunningStandardScaler
cfg_ppo["value_preprocessor_kwargs"] = {"size": 1, "device": device}
# logging to TensorBoard and writing checkpoints
cfg_ppo["experiment"]["write_interval"] = 40
cfg_ppo["experiment"]["checkpoint_interval"] = 400
```
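
A few of the values above are easier to read with the arithmetic spelled out. The following is a minimal, library-free sketch, not skrl's implementation. `num_envs` and `mini_batch_size` are assumptions read off the `16 * 2048 / 4096` comment, and `kl_adaptive_lr` is a hypothetical helper illustrating a typical KL-adaptive learning-rate rule; its `kl_factor`, `lr_factor`, `min_lr`, and `max_lr` defaults are illustrative and may not match the scheduler actually used.

```python
# Assumed values, inferred from the "16 * 2048 / 4096" comment in the configuration.
rollouts = 16
num_envs = 2048          # assumption: number of parallel environments
mini_batch_size = 4096   # assumption: samples per mini-batch

# Transitions collected between two consecutive updates, and the resulting mini-batch count.
samples_per_update = rollouts * num_envs
mini_batches = samples_per_update // mini_batch_size

# Same shaper as in the configuration: every reward is multiplied by 0.01.
rewards_shaper = lambda rewards, timestep, timesteps: rewards * 0.01
scaled = rewards_shaper(50.0, 0, 1000)


def kl_adaptive_lr(lr, kl, kl_threshold=0.008, kl_factor=2.0, lr_factor=1.5,
                   min_lr=1e-6, max_lr=1e-2):
    """Hypothetical helper: one step of a KL-adaptive learning-rate rule.

    Shrinks the learning rate when the measured KL divergence is well above
    the threshold, grows it when the KL is well below, and otherwise keeps it.
    """
    if kl > kl_threshold * kl_factor:
        return max(lr / lr_factor, min_lr)
    if kl < kl_threshold / kl_factor:
        return min(lr * lr_factor, max_lr)
    return lr


print(mini_batches)                      # 8
print(kl_adaptive_lr(3e-4, kl=0.02))     # KL too high: learning rate is reduced
print(kl_adaptive_lr(3e-4, kl=0.001))    # KL very low: learning rate is increased
```

Under these assumptions, each update sees 32768 transitions split into 8 mini-batches, which matches `cfg_ppo["mini_batches"] = 8`.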