shivakanthsujit committed on
Commit d08edee
1 Parent(s): ab81668

Upload with huggingface_hub

Files changed (5)
  1. README.md +32 -0
  2. config.yaml +106 -0
  3. model/env_stats.pickle +3 -0
  4. model/model.pth +3 -0
  5. replay.mp4 +0 -0
README.md ADDED
@@ -0,0 +1,32 @@
+ ---
+ library_name: mbrl-lib
+ tags:
+ - mbrl-Hopper-v2
+ - deep-reinforcement-learning
+ - reinforcement-learning
+ - mbrl-lib
+ model-index:
+ - name: OneDTransitionRewardModel w/ SACAgent
+   results:
+   - task:
+       type: reinforcement-learning
+       name: reinforcement-learning
+     dataset:
+       name: mbrl-Hopper-v2
+       type: mbrl-Hopper-v2
+     metrics:
+     - type: mean_reward
+       value: 87.70 +/- 1.10
+       name: mean_reward
+       verified: false
+ ---
+ # **OneDTransitionRewardModel w/ SACAgent** Agent playing **mbrl-Hopper-v2**
+ This is a trained model of a **OneDTransitionRewardModel w/ SACAgent** agent playing **mbrl-Hopper-v2**
+ using [MBRL-Lib](https://github.com/facebookresearch/mbrl-lib).
+
+ ## Usage (with MBRL-Lib)
+ TODO: Add your code
+ ```python
+ from mbrl import ...
+ ...
+ ```
config.yaml ADDED
@@ -0,0 +1,106 @@
+ action_optimizer:
+   _target_: mbrl.planning.CEMOptimizer
+   alpha: 0.1
+   clipped_normal: false
+   device: cpu
+   elite_ratio: 0.1
+   lower_bound: ???
+   num_iterations: 5
+   population_size: 350
+   return_mean_elites: true
+   upper_bound: ???
+ algorithm:
+   agent:
+     _target_: mbrl.third_party.pytorch_sac_pranz24.sac.SAC
+     action_space:
+       _target_: gym.env.Box
+       high:
+       - 1.0
+       - 1.0
+       - 1.0
+       low:
+       - -1.0
+       - -1.0
+       - -1.0
+       shape:
+       - 3
+     args:
+       alpha: 0.2
+       automatic_entropy_tuning: false
+       device: cpu
+       gamma: 0.99
+       hidden_size: 512
+       lr: 0.0003
+       policy: Gaussian
+       target_entropy: 1
+       target_update_interval: 4
+       tau: 0.005
+     num_inputs: 11
+   freq_train_model: 250
+   initial_exploration_steps: 5000
+   learned_rewards: true
+   name: mbpo
+   normalize: true
+   normalize_double_precision: true
+   num_eval_episodes: 1
+   random_initial_explore: false
+   real_data_ratio: 0.0
+   sac_samples_action: true
+   target_is_delta: true
+ debug_mode: false
+ device: cpu
+ dynamics_model:
+   _target_: mbrl.models.GaussianMLP
+   activation_fn_cfg:
+     _target_: torch.nn.SiLU
+   deterministic: false
+   device: cpu
+   ensemble_size: 7
+   hid_size: 200
+   in_size: 14
+   learn_logvar_bounds: false
+   num_layers: 4
+   out_size: 12
+   propagation_method: random_model
+ experiment: default
+ log_frequency_agent: 1000
+ overrides:
+   cem_alpha: 0.1
+   cem_clipped_normal: false
+   cem_elite_ratio: 0.1
+   cem_num_iters: 5
+   cem_population_size: 350
+   effective_model_rollouts_per_step: 400
+   env: gym___Hopper-v2
+   epoch_length: 1000
+   freq_train_model: 250
+   model_batch_size: 256
+   model_lr: 0.001
+   model_wd: 1.0e-05
+   num_elites: 5
+   num_epochs_to_retain_sac_buffer: 1
+   num_sac_updates_per_step: 40
+   num_steps: 125000
+   patience: 5
+   planning_horizon: 15
+   rollout_schedule:
+   - 20
+   - 150
+   - 1
+   - 15
+   sac_alpha: 0.2
+   sac_automatic_entropy_tuning: false
+   sac_batch_size: 256
+   sac_gamma: 0.99
+   sac_hidden_size: 512
+   sac_lr: 0.0003
+   sac_policy: Gaussian
+   sac_target_entropy: 1
+   sac_target_update_interval: 4
+   sac_tau: 0.005
+   sac_updates_every_steps: 1
+   term_fn: hopper
+   validation_ratio: 0.2
+ root_dir: ./logs
+ save_video: false
+ seed: 0
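
The four-element `rollout_schedule` in the config above is compact enough to be easy to misread. A minimal sketch follows, assuming the MBPO-style reading `[min_epoch, max_epoch, min_length, max_length]`, where the model rollout length ramps linearly between the two epochs and is clamped outside them. The helper names `truncated_linear` and `rollout_length_for_epoch` are illustrative and defined here, not imported from MBRL-Lib, and truncating to an integer is also an assumption.

```python
def truncated_linear(min_x, max_x, min_y, max_y, x):
    """Linear ramp from min_y to max_y over [min_x, max_x], clamped outside it."""
    if x <= min_x:
        return float(min_y)
    if x >= max_x:
        return float(max_y)
    return (x - min_x) / (max_x - min_x) * (max_y - min_y) + min_y


def rollout_length_for_epoch(schedule, epoch):
    """Hypothetical helper: rollout length (in model steps) at a given epoch.

    Assumes schedule = [min_epoch, max_epoch, min_length, max_length].
    """
    min_epoch, max_epoch, min_length, max_length = schedule
    # Truncate toward zero so the result is a whole number of model steps.
    return int(truncated_linear(min_epoch, max_epoch, min_length, max_length, epoch))


# Values copied from overrides.rollout_schedule in config.yaml.
ROLLOUT_SCHEDULE = [20, 150, 1, 15]

if __name__ == "__main__":
    for epoch in (1, 20, 85, 125, 150):
        print(epoch, rollout_length_for_epoch(ROLLOUT_SCHEDULE, epoch))
```

Under this reading, `num_steps: 125000` with `epoch_length: 1000` gives 125 epochs, so the ramp would stop around length 12 and never reach the configured maximum of 15.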
model/env_stats.pickle ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1e21598388c4adb0e0782789953c23e9e74ca9e136f19578b67ef7413b699755
+ size 422
model/model.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dfad6ff5aa75a76004692d8b49f1027d8dafb6cc43b248b6c7f48e3c6189925b
+ size 3599717
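
The two `model/` entries above are Git LFS pointer files, not the binaries themselves. Each records the spec version, a SHA-256 object ID, and the byte size of the real file. As a rough sketch, a pointer can be parsed and a downloaded file checked against it using only the standard library. The names `LfsPointer`, `parse_lfs_pointer`, and `matches_pointer` are hypothetical and not part of any Git LFS or Hugging Face API.

```python
import hashlib
from typing import NamedTuple


class LfsPointer(NamedTuple):
    version: str
    hash_algo: str
    digest: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


def matches_pointer(pointer: LfsPointer, data: bytes) -> bool:
    """Return True if data has the size and SHA-256 digest the pointer records."""
    if pointer.hash_algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer.hash_algo}")
    return len(data) == pointer.size and hashlib.sha256(data).hexdigest() == pointer.digest


# Pointer text copied from the model/model.pth entry in this commit.
MODEL_POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:dfad6ff5aa75a76004692d8b49f1027d8dafb6cc43b248b6c7f48e3c6189925b
size 3599717
"""

if __name__ == "__main__":
    print(parse_lfs_pointer(MODEL_POINTER_TEXT))
```

Calling `matches_pointer(pointer, open(path, "rb").read())` on a downloaded `model.pth` would then confirm that the file is the one this commit refers to.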
replay.mp4 ADDED
Binary file (27.7 kB).