Model Card for SafeDreamer

Official release of the SafeDreamer checkpoints for the paper

SafeDreamer: Safe Reinforcement Learning with World Models

by Weidong Huang, Jiaming Ji, Chunhe Xia, Borong Zhang, Yaodong Yang

Model Details

We open-source more than 80 SafeDreamer model checkpoints. We are excited to see what the community will do with them, and we hope our release encourages other research labs to open-source their checkpoints as well. This section provides further details about the released models. Each checkpoint file is named using the format date_algorithm_environment_task_seedId, for example 20240228-145125_bsrp_lag_safetygym_SafetyRacecarButton1-v0_0.ckpt.
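Because the algorithm field can itself contain underscores (bsrp_lag in the example name), splitting a checkpoint name on "_" does not map one-to-one onto the five fields. The sketch below is a hypothetical helper (parse_checkpoint_name and CheckpointInfo are illustrative names, not part of the official implementation). It reads the timestamp from the front and the environment, task, and seed from the back, and treats everything in between as the algorithm. It assumes that environment and task names never contain underscores and that the date field follows the YYYYMMDD-HHMMSS pattern seen in the example.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass(frozen=True)
class CheckpointInfo:
    """Fields encoded in a checkpoint file name (hypothetical helper type)."""
    timestamp: datetime
    algorithm: str
    environment: str
    task: str
    seed: int


def parse_checkpoint_name(filename: str) -> CheckpointInfo:
    """Split <date>_<algorithm>_<environment>_<task>_<seedId>.ckpt into fields."""
    name = Path(filename).name  # drop any leading directories
    if name.endswith(".ckpt"):
        name = name[: -len(".ckpt")]
    parts = name.split("_")
    if len(parts) < 5:
        raise ValueError(f"unexpected checkpoint name: {filename!r}")
    # Fixed-position fields: timestamp first; environment, task and seed last.
    timestamp = datetime.strptime(parts[0], "%Y%m%d-%H%M%S")
    environment, task, seed = parts[-3], parts[-2], int(parts[-1])
    # Whatever lies in between is the algorithm, which may contain underscores.
    algorithm = "_".join(parts[1:-3])
    return CheckpointInfo(timestamp, algorithm, environment, task, seed)


if __name__ == "__main__":
    info = parse_checkpoint_name(
        "20240228-145125_bsrp_lag_safetygym_SafetyRacecarButton1-v0_0.ckpt"
    )
    print(info.algorithm, info.task, info.seed)  # prints: bsrp_lag SafetyRacecarButton1-v0 0
```

Anchoring the fixed fields at both ends keeps the parser correct for any algorithm name, at the cost of assuming the other fields are underscore-free.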

Model Description

Developed by: Weidong Huang
Model type: SafeDreamer models trained on tasks from Safety-Gymnasium.
License: apache-2.0.

Model Sources

Repository: https://github.com/PKU-Alignment/SafeDreamer
Paper: https://arxiv.org/abs/2307.07176

Uses

Our SafeDreamer checkpoints are among the first substantial public releases of safe reinforcement learning models. We believe they will help researchers train, fine-tune, evaluate, and study models across the 20 safety control tasks for which we provide checkpoints, and we expect the community to find further uses for them as well.

Direct Use

You can load the model checkpoints with the official implementation and use them to reproduce our results or to generate trajectories for any task it supports.

Out-of-Scope Use

We do not expect the checkpoints, in their current form, to generalize well to new (unseen) tasks. Applying them to a specific target task will most likely require some fine-tuning on data from that task.

How to Get Started with the Models

See the official implementation for installation instructions and usage examples.

Training Procedure

All checkpoints were trained with the official implementation using its default settings. Most models were trained until their performance stopped improving, but a few were stopped before that point. For the performance of each model on each task, see the per-task plots in our paper.

Environmental Impact

Carbon emissions are estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: NVIDIA GeForce RTX 3090
Hours used: Approx. 50,000
Provider: Private infrastructure
Carbon Emitted: Approx. 7560 kg CO2eq
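The card does not state the power draw or grid carbon intensity behind this estimate. As a rough sanity check, the sketch below assumes that each of the 50,000 hours ran one RTX 3090 at its rated 350 W board power, and that the carbon intensity is 0.432 kg CO2eq/kWh, a value the ML Impact calculator commonly reports for private infrastructure. Both inputs are assumptions. Under them, emissions = hours × power × intensity = 50,000 h × 0.35 kW × 0.432 kg CO2eq/kWh, which reproduces the reported figure.

```python
def estimate_emissions_kg(hours: float, power_kw: float, carbon_intensity: float) -> float:
    """Energy used (kWh) multiplied by carbon intensity (kg CO2eq per kWh)."""
    energy_kwh = hours * power_kw
    return energy_kwh * carbon_intensity


# Assumed inputs (not stated in this card): RTX 3090 rated board power and the
# calculator's typical carbon intensity for private infrastructure.
HOURS = 50_000
RTX_3090_POWER_KW = 0.350
CARBON_INTENSITY_KG_PER_KWH = 0.432

if __name__ == "__main__":
    estimate = estimate_emissions_kg(HOURS, RTX_3090_POWER_KW, CARBON_INTENSITY_KG_PER_KWH)
    print(f"{estimate:.0f} kg CO2eq")  # prints: 7560 kg CO2eq
```

Real draw varies with utilization, so this is an upper-bound style estimate rather than a measurement.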

Citation

If you find our work useful, please consider citing the paper as follows:

BibTeX:

@inproceedings{
safedreamer,
title={SafeDreamer: Safe Reinforcement Learning with World Models},
author={Weidong Huang and Jiaming Ji and Borong Zhang and Chunhe Xia and Yaodong Yang},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=tsE5HLYtYg}
}

Contact

Correspondence to: Weidong Huang