---
language: en
library_name: pytorch
license: apache-2.0
pipeline_tag: reinforcement-learning
tags:
- reinforcement-learning
- Generative Model
- GenerativeRL
- LunarLanderContinuous-v2
benchmark_name: Box2d
task_name: LunarLanderContinuous-v2
model-index:
- name: QGPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLanderContinuous-v2
      type: LunarLanderContinuous-v2
    metrics:
    - type: mean_reward
      value: '200.0'
      name: mean_reward
      verified: false
---

# Play **LunarLanderContinuous-v2** with **QGPO** Policy

## Model Description
<!-- Provide a longer summary of what this model is. -->

This implementation applies **QGPO** to the Box2d **LunarLanderContinuous-v2** environment using [GenerativeRL](https://github.com/opendilab/GenerativeRL).

## Model Usage
### Install the Dependencies
<details close>
<summary>(Click for Details)</summary>

```shell
# install GenerativeRL with huggingface support
pip3 install GenerativeRL[huggingface]
# install environment dependencies if needed
pip3 install gym[box2d]==0.23.1
```
</details>
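
Before downloading the model, you can optionally confirm that the Box2d environment builds. This quick check is not part of the original pipeline; it assumes only the `gym[box2d]==0.23.1` install above:

```python
# Optional sanity check (not part of the QGPO pipeline): verify the Box2d
# environment builds under gym 0.23.1 before loading the model.
import gym

env = gym.make("LunarLanderContinuous-v2")
observation = env.reset()
print(observation.shape)       # (8,) state vector
print(env.action_space.shape)  # (2,) continuous thrust actions
env.close()
```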

### Download the Model from Hugging Face and Run It

<details close>
<summary>(Click for Details)</summary>

```shell
# run with the trained model
python3 -u run.py
```
**run.py**
```python
import gym

from grl.algorithms.qgpo import QGPOAlgorithm
from grl.datasets import QGPOCustomizedTensorDictDataset
from grl.utils.huggingface import pull_model_from_hub


def qgpo_pipeline():
    # Pull the trained policy weights and the matching config from the Hub.
    policy_state_dict, config = pull_model_from_hub(
        repo_id="zjowowen/LunarLanderContinuous-v2-QGPO",
    )

    qgpo = QGPOAlgorithm(
        config,
        dataset=QGPOCustomizedTensorDictDataset(
            numpy_data_path="./data.npz",
            action_augment_num=config.train.parameter.action_augment_num,
        ),
    )
    qgpo.model.load_state_dict(policy_state_dict)

    # ---------------------------------------
    # Customized train code ↓
    # ---------------------------------------
    # qgpo.train()
    # ---------------------------------------
    # Customized train code ↑
    # ---------------------------------------

    # ---------------------------------------
    # Customized deploy code ↓
    # ---------------------------------------
    agent = qgpo.deploy()
    env = gym.make(config.deploy.env.env_id)
    observation = env.reset()
    images = [env.render(mode="rgb_array")]
    for _ in range(config.deploy.num_deploy_steps):
        observation, reward, done, _ = env.step(agent.act(observation))
        images.append(env.render(mode="rgb_array"))

    # save the recorded frames as an mp4 replay
    import imageio.v3 as imageio
    import numpy as np

    images = np.array(images)
    imageio.imwrite("replay.mp4", images, fps=30, quality=8)
    # ---------------------------------------
    # Customized deploy code ↑
    # ---------------------------------------


if __name__ == "__main__":
    qgpo_pipeline()
```
</details>
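
One detail worth noting: the configuration below carries `deploy.env.seed` (0), but the deploy loop in `run.py` never seeds the environment. For reproducible replays you can add seeding yourself; a small optional fragment, using the gym 0.23.1 seeding API (not part of the original script):

```python
# Optional, for reproducible replays: seed the env with the value from the
# config before resetting. Fragment meant to replace the env setup in run.py.
env = gym.make(config.deploy.env.env_id)
env.seed(config.deploy.env.seed)  # gym 0.23.1 API; removed in later gym versions
observation = env.reset()
```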

## Model Training

### Train the Model and Push to the Hugging Face Hub

<details close>
<summary>(Click for Details)</summary>

```shell
# train your own agent
python3 -u train.py
```
**train.py**
```python
import gym

from grl.algorithms.qgpo import QGPOAlgorithm
from grl.datasets import QGPOCustomizedTensorDictDataset
from grl.utils.log import log
from grl_pipelines.diffusion_model.configurations.lunarlander_continuous_qgpo import (
    config,
)


def qgpo_pipeline(config):

    qgpo = QGPOAlgorithm(
        config,
        dataset=QGPOCustomizedTensorDictDataset(
            numpy_data_path="./data.npz",
            action_augment_num=config.train.parameter.action_augment_num,
        ),
    )

    # ---------------------------------------
    # Customized train code ↓
    # ---------------------------------------
    qgpo.train()
    # ---------------------------------------
    # Customized train code ↑
    # ---------------------------------------

    # ---------------------------------------
    # Customized deploy code ↓
    # ---------------------------------------
    agent = qgpo.deploy()
    env = gym.make(config.deploy.env.env_id)
    observation = env.reset()
    for _ in range(config.deploy.num_deploy_steps):
        env.render()
        observation, reward, done, _ = env.step(agent.act(observation))
    # ---------------------------------------
    # Customized deploy code ↑
    # ---------------------------------------


if __name__ == "__main__":
    log.info("config: \n{}".format(config))
    qgpo_pipeline(config)
```
</details>
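
Note that `train.py` above only trains and deploys; it does not itself push anything to the Hub. One generic way to publish a trained checkpoint is the `huggingface_hub` client, sketched below. The repo id and folder are placeholders (the folder matches `checkpoint_path` in the configuration), and GenerativeRL may ship its own push helper:

```python
# Hedged sketch: upload the checkpoint directory written to
# config.train.parameter.checkpoint_path using the generic huggingface_hub API.
# "your-name/..." is a placeholder repo id.
from huggingface_hub import HfApi

api = HfApi()
api.create_repo(repo_id="your-name/LunarLanderContinuous-v2-QGPO", exist_ok=True)
api.upload_folder(
    repo_id="your-name/LunarLanderContinuous-v2-QGPO",
    folder_path="./LunarLanderContinuous-v2-QGPO",  # checkpoint_path in the config
)
```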

**Configuration**
<details close>
<summary>(Click for Details)</summary>

```json
{
    "train": {
        "project": "LunarLanderContinuous-v2-QGPO-VPSDE",
        "device": "cuda",
        "wandb": {
            "project": "IQL-LunarLanderContinuous-v2-QGPO-VPSDE"
        },
        "simulator": {
            "type": "GymEnvSimulator",
            "args": {
                "env_id": "LunarLanderContinuous-v2"
            }
        },
        "model": {
            "QGPOPolicy": {
                "device": "cuda",
                "critic": {
                    "device": "cuda",
                    "q_alpha": 1.0,
                    "DoubleQNetwork": {
                        "backbone": {
                            "type": "ConcatenateMLP",
                            "args": {
                                "hidden_sizes": [10, 256, 256],
                                "output_size": 1,
                                "activation": "relu"
                            }
                        }
                    }
                },
                "diffusion_model": {
                    "device": "cuda",
                    "x_size": 2,
                    "alpha": 1.0,
                    "solver": {
                        "type": "DPMSolver",
                        "args": {
                            "order": 2,
                            "device": "cuda",
                            "steps": 17
                        }
                    },
                    "path": {
                        "type": "linear_vp_sde",
                        "beta_0": 0.1,
                        "beta_1": 20.0
                    },
                    "reverse_path": {
                        "type": "linear_vp_sde",
                        "beta_0": 0.1,
                        "beta_1": 20.0
                    },
                    "model": {
                        "type": "noise_function",
                        "args": {
                            "t_encoder": {
                                "type": "GaussianFourierProjectionTimeEncoder",
                                "args": {
                                    "embed_dim": 32,
                                    "scale": 30.0
                                }
                            },
                            "backbone": {
                                "type": "TemporalSpatialResidualNet",
                                "args": {
                                    "hidden_sizes": [512, 256, 128],
                                    "output_dim": 2,
                                    "t_dim": 32,
                                    "condition_dim": 8,
                                    "condition_hidden_dim": 32,
                                    "t_condition_hidden_dim": 128
                                }
                            }
                        }
                    },
                    "energy_guidance": {
                        "t_encoder": {
                            "type": "GaussianFourierProjectionTimeEncoder",
                            "args": {
                                "embed_dim": 32,
                                "scale": 30.0
                            }
                        },
                        "backbone": {
                            "type": "ConcatenateMLP",
                            "args": {
                                "hidden_sizes": [42, 256, 256],
                                "output_size": 1,
                                "activation": "silu"
                            }
                        }
                    }
                }
            }
        },
        "parameter": {
            "behaviour_policy": {
                "batch_size": 1024,
                "learning_rate": 0.0001,
                "epochs": 500
            },
            "action_augment_num": 16,
            "fake_data_t_span": null,
            "energy_guided_policy": {
                "batch_size": 256
            },
            "critic": {
                "stop_training_epochs": 500,
                "learning_rate": 0.0001,
                "discount_factor": 0.99,
                "update_momentum": 0.005
            },
            "energy_guidance": {
                "epochs": 1000,
                "learning_rate": 0.0001
            },
            "evaluation": {
                "evaluation_interval": 50,
                "guidance_scale": [0.0, 1.0, 2.0]
            },
            "checkpoint_path": "./LunarLanderContinuous-v2-QGPO"
        }
    },
    "deploy": {
        "device": "cuda",
        "env": {
            "env_id": "LunarLanderContinuous-v2",
            "seed": 0
        },
        "num_deploy_steps": 1000,
        "t_span": null
    }
}
```

</details>
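
The same configuration ships with the model as `policy_config.json` (see the Configuration link under Model Information below), so individual fields can be checked without importing GenerativeRL; a small sketch:

```python
# Sketch: read deployment settings from the shipped policy_config.json
# (see the Configuration link in Model Information).
import json

with open("policy_config.json") as f:
    cfg = json.load(f)

print(cfg["deploy"]["env"]["env_id"])                             # LunarLanderContinuous-v2
print(cfg["deploy"]["num_deploy_steps"])                          # 1000
print(cfg["train"]["parameter"]["evaluation"]["guidance_scale"])  # [0.0, 1.0, 2.0]
```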

**Training Procedure**
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
- **Weights & Biases (wandb):** [monitor link](https://wandb.ai/zjowowen/IQL-LunarLanderContinuous-v2-QGPO-VPSDE)

## Model Information
<!-- Provide the basic links for the model. -->
- **GitHub Repository:** [repo link](https://github.com/opendilab/GenerativeRL/)
- **Doc:** [Algorithm link](https://opendilab.github.io/GenerativeRL/)
- **Configuration:** [config link](https://huggingface.co/OpenDILabCommunity/LunarLanderContinuous-v2-QGPO/blob/main/policy_config.json)
- **Demo:** [video](https://huggingface.co/OpenDILabCommunity/LunarLanderContinuous-v2-QGPO/blob/main/replay.mp4)
<!-- Provide the size information for the model. -->
- **Parameters total size:** 8799.79 KB
- **Last Update Date:** 2024-12-04

## Environments
<!-- Address questions around what environment the model is intended to be trained and deployed at, including the necessary information needed to be provided for future users. -->
- **Benchmark:** Box2d
- **Task:** LunarLanderContinuous-v2
- **Gym version:** 0.23.1
- **GenerativeRL version:** v0.0.1
- **PyTorch version:** 2.4.1+cu121
- **Doc:** [Environments link](https://www.gymlibrary.dev/environments/box2d/lunar_lander/)
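
If results differ noticeably from the reported mean reward, first confirm the installed versions match the list above; a minimal check:

```python
# Minimal version check against the versions listed above.
import gym
import torch

print(gym.__version__)    # expected: 0.23.1
print(torch.__version__)  # expected: 2.4.1+cu121
```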