---
license: other
tags:
- stable-diffusion
- text-to-image
inference: false
---
# Stable Diffusion
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.
This model card gives an overview of all available model checkpoints. For more detailed model cards, see the model repositories listed under [Model Access](#model-access).
## Stable Diffusion Version 1
For the first version, four model checkpoints are released.
*Higher* versions have been trained for longer and are thus usually better in terms of image generation quality than *lower* versions. More specifically:
- **stable-diffusion-v1-1**: The checkpoint is randomly initialized and has been trained for 237,000 steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
194,000 steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
- **stable-diffusion-v1-2** (https://huggingface.co/CompVis/stable-diffusion-v1-2): The checkpoint resumed training from `stable-diffusion-v1-1`.
515,000 steps at resolution `512x512` on "laion-improved-aesthetics" (a subset of laion2B-en,
filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`; the watermark estimate is from the LAION-5B metadata, and the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
- **stable-diffusion-v1-3** (https://huggingface.co/CompVis/stable-diffusion-v1-3): The checkpoint resumed training from `stable-diffusion-v1-2`. 195,000 steps at resolution `512x512` on "laion-improved-aesthetics" and 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- **stable-diffusion-v1-4** (https://huggingface.co/CompVis/stable-diffusion-v1-4): The checkpoint is resumed training.
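
The text-conditioning dropout mentioned for `stable-diffusion-v1-3` can be illustrated with a small sketch. It is not the training code used for these checkpoints: the function names, the empty-string "null" prompt, and the list-based noise predictions are all assumptions made for illustration. The first function replaces a fraction of prompts with a null prompt during training, and the second shows one common way the conditional and unconditional predictions are combined at sampling time.

```python
import random


def drop_conditioning(prompts, drop_prob=0.1, rng=None, null_prompt=""):
    """Replace each prompt with a null prompt with probability `drop_prob`.

    Assumption: the "dropped" conditioning is represented by an empty
    string; the model card only states that 10% of the text-conditioning
    is dropped.
    """
    rng = rng or random.Random()
    return [null_prompt if rng.random() < drop_prob else p for p in prompts]


def guided_prediction(eps_uncond, eps_cond, guidance_scale):
    """Combine unconditional and conditional noise predictions.

    Uses the form eps_uncond + scale * (eps_cond - eps_uncond), a common
    way to write classifier-free guidance. Plain Python lists stand in
    for the tensors a real sampler would use.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

With `guidance_scale` set to `1.0` the result equals the conditional prediction, and larger values push the sample further toward the prompt. Training with some prompts dropped is what lets a single model provide both predictions.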
### Model Access
Each checkpoint can be used either with Hugging Face's [D🧨ffusers library](https://github.com/huggingface/diffusers) or with the original [Stable Diffusion GitHub repository](https://github.com/CompVis/stable-diffusion). Note that you have to *"click-request"* them on each respective model repository.
| **[🤗's D🧨ffusers library](https://github.com/huggingface/diffusers)** | **[Stable Diffusion GitHub repository](https://github.com/CompVis/stable-diffusion)** |
| ----------- | ----------- |
| [`stable-diffusion-v1-1`](https://huggingface.co/CompVis/stable-diffusion-v1-1) | [`stable-diffusion-v-1-1-original`](https://huggingface.co/CompVis/stable-diffusion-v-1-1-original) |
| [`stable-diffusion-v1-2`](https://huggingface.co/CompVis/stable-diffusion-v1-2) | [`stable-diffusion-v-1-2-original`](https://huggingface.co/CompVis/stable-diffusion-v-1-2-original) |
| [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3) | [`stable-diffusion-v-1-3-original`](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original) |
| [`stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) | [`stable-diffusion-v-1-4-original`](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original) |
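
As a minimal sketch of how the table maps to code, the helpers below turn a checkpoint name into the repository ids listed above and load the Diffusers variant. The helper names are hypothetical, and the sketch assumes the `diffusers` package is installed and that access to the repository has already been requested and granted; depending on your setup, you may also need to be logged in to the Hugging Face Hub.

```python
# Checkpoint names as they appear in the table above.
CHECKPOINTS = (
    "stable-diffusion-v1-1",
    "stable-diffusion-v1-2",
    "stable-diffusion-v1-3",
    "stable-diffusion-v1-4",
)


def diffusers_repo_id(name):
    """Return the Hub repository id of the Diffusers variant (hypothetical helper)."""
    if name not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint: {name}")
    return f"CompVis/{name}"


def original_repo_id(name):
    """Return the Hub repository id of the original-format variant (hypothetical helper)."""
    if name not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint: {name}")
    version = name.rsplit("v", 1)[1]  # e.g. "1-4"
    return f"CompVis/stable-diffusion-v-{version}-original"


def load_pipeline(name):
    """Load a checkpoint with Diffusers; downloads weights on first use."""
    # Imported lazily so the helpers above work without diffusers installed.
    from diffusers import StableDiffusionPipeline

    return StableDiffusionPipeline.from_pretrained(diffusers_repo_id(name))


# Example usage (requires network access and an accepted model license):
#   pipe = load_pipeline("stable-diffusion-v1-4")
#   image = pipe("a photograph of an astronaut riding a horse").images[0]
#   image.save("astronaut.png")
```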
### Demo
To quickly try out the model, use the [(TODO) Stable Diffusion Space]( ).
## Citation
```bibtex
@InProceedings{Rombach_2022_CVPR,
author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
title = {High-Resolution Image Synthesis With Latent Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {10684-10695}
}
```
*This model card was written by Robin Rombach and Patrick Esser and is based on the [DALL-E Mini model card](https://huggingface.co/dalle-mini/dalle-mini).*