Weidong-Huang committed afa68ad (verified; parent: aed1a8d): Update README.md

Files changed (1): README.md +82 -0

---
license: apache-2.0
tags:
- safe reinforcement learning
- world model
- safety control
- model-based reinforcement learning
pipeline_tag: reinforcement-learning
---

# Model Card for SafeDreamer

Official release of SafeDreamer checkpoints for the paper [SafeDreamer: Safe Reinforcement Learning with World Models](https://arxiv.org/abs/2307.07176) by [Weidong Huang](https://scholar.google.com/citations?user=xF14hmcAAAAJ&hl=en)\*, [Jiaming Ji](https://jijiaming.netlify.app/)\*, Borong Zhang, Chunhe Xia, and [Yaodong Yang](https://www.yangyaodong.com/).

**Quick links:** [[Website]](https://sites.google.com/view/safedreamer) [[Paper]](https://arxiv.org/abs/2307.07176)

## Model Details

We release a total of 34 SafeDreamer model checkpoints. We look forward to seeing how the community uses them, and we hope this release encourages other research labs to open-source their checkpoints as well. This section gives further details about the released models.

### Model Description

- **Developed by:** [Weidong Huang](https://scholar.google.com/citations?user=xF14hmcAAAAJ&hl=en)
- **Model type:** SafeDreamer models trained on tasks from Safety-Gymnasium
- **License:** apache-2.0

### Model Sources

- **Repository:** [https://github.com/PKU-Alignment/SafeDreamer](https://github.com/PKU-Alignment/SafeDreamer)
- **Paper:** [https://arxiv.org/abs/2307.07176](https://arxiv.org/abs/2307.07176)

## Uses

Our SafeDreamer checkpoints are among the first substantial public releases of safe reinforcement learning models. We expect them to help researchers train, fine-tune, evaluate, and study models on the 20 safety control tasks covered by this release, and we anticipate that the community will also find new ways to use them.

### Direct Use

You can load the checkpoints with the [official implementation](https://github.com/PKU-Alignment/SafeDreamer) to reproduce our results or to generate trajectories on any task it supports.

### Out-of-Scope Use

We do not expect the checkpoints, in their current form, to generalize well to novel (unseen) tasks. Applying them to a specific target task will most likely require some fine-tuning on data from that task.

## How to Get Started with the Models

See the [official implementation](https://github.com/PKU-Alignment/SafeDreamer) for installation instructions and usage examples.

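This card leaves loading to the official implementation. As a minimal sketch of the step before that, the Python snippet below downloads the released files from the Hugging Face Hub with `huggingface_hub.snapshot_download` and lists the checkpoint files found locally. The repository id `Weidong-Huang/SafeDreamer` and the `.ckpt` file suffix are assumptions that this card does not confirm; adjust both to match the actual repository, then load the files as described in the official implementation's README.

```python
# Sketch: fetch released checkpoints from the Hugging Face Hub and list them.
# Assumptions (not confirmed by this model card):
#   - HYPOTHETICAL_REPO_ID is a guess; replace it with this model repository's id;
#   - checkpoint files end in ".ckpt" (the suffix is an assumption).
from pathlib import Path
from typing import List, Optional, Union

HYPOTHETICAL_REPO_ID = "Weidong-Huang/SafeDreamer"  # hypothetical, check the model page


def download_checkpoints(
    repo_id: str = HYPOTHETICAL_REPO_ID,
    allow_patterns: Optional[Union[str, List[str]]] = None,
) -> Path:
    """Download (or reuse the cached copy of) the repo and return its local folder."""
    # Imported lazily so list_checkpoints() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, allow_patterns=allow_patterns))


def list_checkpoints(root: Path, suffix: str = ".ckpt") -> List[Path]:
    """Return every file under `root` whose name ends with `suffix`, sorted by path."""
    return sorted(p for p in Path(root).rglob(f"*{suffix}") if p.is_file())


# Example usage (requires network access and `pip install huggingface_hub`):
#   root = download_checkpoints()
#   for ckpt in list_checkpoints(root):
#       print(ckpt.relative_to(root))
```

Passing `allow_patterns` (a glob or list of globs) restricts the download to a subset of files, which may help if you only need checkpoints for one task; how the files are organized per task is likewise an assumption here.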
## Training Procedure

We trained our checkpoints with the [official implementation](https://github.com/PKU-Alignment/SafeDreamer) using its default settings. Most models were trained until their performance stopped improving, but a few were not. For a detailed view of how each model performed on each task, see the task-specific graphs in our [paper](https://arxiv.org/abs/2307.07176).

## Environmental Impact

Carbon emissions are estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** NVIDIA GeForce RTX 3090
- **Hours used:** Approx. 50,000
- **Provider:** Private infrastructure
- **Carbon Emitted:** Approx. 7,560 kg CO2eq

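As a rough consistency check of the figures above, the estimate can be reproduced as energy (GPU-hours times power draw) multiplied by grid carbon intensity, which is the power × time × carbon-intensity arithmetic that this kind of calculator is based on. The sketch assumes each RTX 3090 draws its rated 350 W board power for the whole run and uses a carbon intensity of 0.432 kg CO2eq/kWh. That intensity is back-solved from the reported total and is not stated in this card, so the snippet shows that the numbers are consistent, not which inputs were actually used.

```python
# Rough check of the reported carbon estimate.
# Assumptions (not stated in this model card):
#   - every RTX 3090 draws its rated 350 W board power for the whole run;
#   - grid carbon intensity is 0.432 kg CO2eq/kWh, back-solved from the reported total.


def energy_kwh(gpu_hours: float, power_watts: float) -> float:
    """Energy consumed, in kWh: hours * power in kW."""
    return gpu_hours * power_watts / 1000.0


def emissions_kg(gpu_hours: float, power_watts: float, kg_co2_per_kwh: float) -> float:
    """Emissions, in kg CO2eq: energy in kWh * carbon intensity."""
    return energy_kwh(gpu_hours, power_watts) * kg_co2_per_kwh


GPU_HOURS = 50_000         # "Hours used: Approx. 50,000"
RTX_3090_WATTS = 350       # assumed rated board power
ASSUMED_INTENSITY = 0.432  # assumed, back-solved kg CO2eq per kWh

energy = energy_kwh(GPU_HOURS, RTX_3090_WATTS)
emissions = emissions_kg(GPU_HOURS, RTX_3090_WATTS, ASSUMED_INTENSITY)
print(f"{energy:,.0f} kWh -> {emissions:,.0f} kg CO2eq")  # prints: 17,500 kWh -> 7,560 kg CO2eq
```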
## Citation

If you find our work useful, please consider citing the paper as follows:

**BibTeX:**
```bibtex
@inproceedings{safedreamer,
  title={SafeDreamer: Safe Reinforcement Learning with World Models},
  author={Weidong Huang and Jiaming Ji and Borong Zhang and Chunhe Xia and Yaodong Yang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=tsE5HLYtYg}
}
```

## Contact

Correspondence to: [Weidong Huang](https://github.com/hdadong)