---
license: apache-2.0
tags:
- safe reinforcement learning
- world model
- safety control
- model-based reinforcement learning
pipeline_tag: reinforcement-learning
---

# Model Card for SafeDreamer

Official release of SafeDreamer checkpoints for the paper [SafeDreamer: Safe Reinforcement Learning with World Models](https://arxiv.org/abs/2307.07176) by [Weidong Huang](https://scholar.google.com/citations?user=xF14hmcAAAAJ&hl=en)\*, [Jiaming Ji](https://jijiaming.netlify.app/)\*, Borong Zhang, Chunhe Xia, and [Yaodong Yang](https://www.yangyaodong.com/).

**Quick links:** [[Website]](https://sites.google.com/view/safedreamer) [[Paper]](https://arxiv.org/abs/2307.07176)

## Model Details

We open-source a total of 34 SafeDreamer model checkpoints. We are excited to see what the community will do with these models, and we hope this release encourages other research labs to open-source their checkpoints as well. This section provides further details about the released models.

### Model Description

- **Developed by:** [Weidong Huang](https://scholar.google.com/citations?user=xF14hmcAAAAJ&hl=en)
- **Model type:** SafeDreamer models trained on tasks from Safety-Gymnasium.
- **License:** apache-2.0

### Model Sources

- **Repository:** [https://github.com/PKU-Alignment/SafeDreamer](https://github.com/PKU-Alignment/SafeDreamer)
- **Paper:** [https://arxiv.org/abs/2307.07176](https://arxiv.org/abs/2307.07176)

## Uses

Our SafeDreamer checkpoints are among the first substantial public releases of model checkpoints for safe reinforcement learning. We believe they will help researchers train, fine-tune, evaluate, and study models across the 20 safety control tasks covered by this release, and we expect the community to find further uses for them.
### Direct Use

You can load the model checkpoints with the [official implementation](https://github.com/PKU-Alignment/SafeDreamer) to reproduce our results or to run the trained agents on any task the implementation supports.

### Out-of-Scope Use

We expect that the checkpoints, in their current form, will not generalize well to novel (unseen) tasks. Applying them to a specific target task will most likely require some fine-tuning on data from that task.

## How to Get Started with the Models

See the [official implementation](https://github.com/PKU-Alignment/SafeDreamer) for installation instructions and usage examples.

### Training Procedure

We trained the checkpoints using the [official implementation](https://github.com/PKU-Alignment/SafeDreamer) with its standard settings. Most models were trained until their performance stopped improving, but a few were not. For per-task performance, see the task-specific training curves in our [paper](https://arxiv.org/abs/2307.07176).

## Environmental Impact

Carbon emissions are estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** NVIDIA GeForce RTX 3090
- **Hours used:** Approx. 50,000
- **Provider:** Private infrastructure
- **Carbon Emitted:** Approx. 7560 kg CO2eq

## Citation

If you find our work useful, please consider citing the paper as follows:

**BibTeX:**

```
@inproceedings{safedreamer,
  title={SafeDreamer: Safe Reinforcement Learning with World Models},
  author={Weidong Huang and Jiaming Ji and Borong Zhang and Chunhe Xia and Yaodong Yang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=tsE5HLYtYg}
}
```

## Contact

Correspondence to: [Weidong Huang](https://github.com/hdadong)