---
title: sdui
emoji: 🐢
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
duplicated_from: atikur-rabbi/sdui
---

# Stable Diffusion in Docker

Run the official [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion)
releases on [Huggingface](https://huggingface.co/) in a GPU-accelerated Docker
container.

```sh
./build.sh run 'An impressionist painting of a parakeet eating spaghetti in the desert'
```

![An impressionist painting of a parakeet eating spaghetti in the desert 1](https://raw.githubusercontent.com/fboulnois/repository-assets/main/assets/stable-diffusion-docker/An_impressionist_painting_of_a_parakeet_eating_spaghetti_in_the_desert_s1.png)
![An impressionist painting of a parakeet eating spaghetti in the desert 2](https://raw.githubusercontent.com/fboulnois/repository-assets/main/assets/stable-diffusion-docker/An_impressionist_painting_of_a_parakeet_eating_spaghetti_in_the_desert_s2.png)

```sh
./build.sh run --image parakeet_eating_spaghetti.png --strength 0.6 'Bouquet of roses'
```

![Bouquet of roses 1](https://raw.githubusercontent.com/fboulnois/repository-assets/main/assets/stable-diffusion-docker/Bouquet_of_roses_s1.png)
![Bouquet of roses 2](https://raw.githubusercontent.com/fboulnois/repository-assets/main/assets/stable-diffusion-docker/Bouquet_of_roses_s2.png)

## Before you start

### Minimum requirements

By default, the pipeline uses the full model and weights, which require a
CUDA-capable GPU with 8GB+ of VRAM. It should take a few seconds to create one
image. On less powerful GPUs you may need to modify some of the options; see the
[Examples](#examples) section for more details. If you lack a suitable GPU, you
can use the options `--device cpu` and `--onnx` instead.
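
If you are not sure whether a CUDA GPU is available, a small shell helper can choose between the two setups. This is a sketch, not part of `build.sh`: the function name `sd_device_opts` is hypothetical, and it assumes that a missing `nvidia-smi` utility is a reasonable sign that no NVIDIA GPU is usable. That heuristic does not check VRAM or whether Docker can actually see the GPU.

```sh
# Hypothetical helper: print the extra options needed when no NVIDIA GPU is found.
# Heuristic (assumption): a missing `nvidia-smi` binary means "no usable CUDA GPU".
sd_device_opts() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # GPU tooling found: print nothing and keep the defaults (--device cuda)
    return 0
  fi
  # No GPU tooling found: fall back to the CPU and the ONNX runtime
  printf '%s\n' '--device cpu --onnx'
}
```

With the helper defined, `./build.sh run $(sd_device_opts) 'abstract art'` adds the CPU options only when `nvidia-smi` cannot be found. The substitution is left unquoted on purpose so that it splits into separate options, or into nothing at all.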

### Huggingface token

Since it uses the official model, you will need to create a [user access token](https://huggingface.co/docs/hub/security-tokens)
in your [Huggingface account](https://huggingface.co/settings/tokens). Save the
user access token in a file called `token.txt` and make sure it is available
when building the container. The token should begin with `hf_`.
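
Before building, it can help to sanity-check the token file. The following sketch uses a hypothetical helper named `check_token_file`; it only verifies that the file exists, is non-empty, and starts with `hf_`, and it cannot tell whether Huggingface will actually accept the token.

```sh
# Save your token first, for example (replace the placeholder with your token):
#   printf '%s' 'hf_...' > token.txt

# Hypothetical helper: succeed only if the token file (default: token.txt)
# exists, is non-empty, and its content starts with hf_
check_token_file() {
  set -- "${1:-token.txt}"
  if [ ! -s "$1" ]; then
    echo "missing or empty: $1" >&2
    return 1
  fi
  case "$(cat "$1")" in
    hf_*) return 0 ;;
    *) echo "$1 does not start with hf_" >&2; return 1 ;;
  esac
}
```

Running `check_token_file && ./build.sh build` stops before the build when the file looks wrong.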

## Quickstart

The pipeline is managed using a single [`build.sh`](build.sh) script.

Pull the latest version of `stable-diffusion-docker` using `./build.sh pull`.
You will need to use the option `--token` to specify a valid [user access token](#huggingface-token)
when using [`./build.sh run`](#run).

Alternatively, build the image locally before running it.

## Build

Make sure your [user access token](#huggingface-token) is saved in a file called
`token.txt`.

To build:

```sh
./build.sh build  # or just ./build.sh
```

## Run

### Text-to-Image (`txt2img`)

Create an image from a text prompt.

To run:

```sh
./build.sh run 'Andromeda galaxy in a bottle'
```

### Image-to-Image (`img2img`)

Create an image from an existing image and a text prompt.

First, copy an image to the `input` folder. Next, to run:

```sh
./build.sh run --image image.png 'Andromeda galaxy in a bottle'
```
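
Since the examples copy the image into the `input` folder and then pass only its file name to `--image`, copying and naming the file can be combined in one step. A minimal sketch, assuming `build.sh` is run from the directory that contains (or should contain) `input`; the helper name `stage_input` is hypothetical:

```sh
# Hypothetical helper: copy a local file into ./input and print the file name
# that should then be passed to --image (or --mask)
stage_input() {
  mkdir -p input || return 1
  cp -- "$1" input/ || return 1
  basename -- "$1"
}
```

For example, `./build.sh run --image "$(stage_input ~/Pictures/photo.png)" 'Andromeda galaxy in a bottle'` copies a (hypothetical) photo and passes its file name to `--image`. The same helper applies to the other modes below that read from `input`, including the mask for inpainting.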

### Depth-Guided Diffusion (`depth2img`)

Modify an existing image with its depth map and a text prompt.

First, copy an image to the `input` folder. Next, to run:

```sh
./build.sh run --model 'stabilityai/stable-diffusion-2-depth' \
  --image image.png 'A detailed description of the objects to change'
```

### Instruct Pix2Pix (`pix2pix`)

Modify an existing image with a text prompt.

First, copy an image to the `input` folder. Next, to run:

```sh
./build.sh run --model 'timbrooks/instruct-pix2pix' \
  --image image.png 'A detailed description of the objects to change'
```

### Image Upscaling (`upscale4x`)

Create a high resolution image from an existing image with a text prompt.

First, copy an image to the `input` folder. Next, to run:

```sh
./build.sh run --model 'stabilityai/stable-diffusion-x4-upscaler' \
  --image image.png 'Andromeda galaxy in a bottle'
```

### Diffusion Inpainting (`inpaint`)

Modify specific areas of an existing image with an image mask and a text prompt.

First, copy an image and an image mask to the `input` folder. White areas of the
mask will be diffused and black areas will be kept untouched. Next, to run:

```sh
./build.sh run --model 'runwayml/stable-diffusion-inpainting' \
  --image image.png --mask mask.png 'Andromeda galaxy in a bottle'
```

## Options

The following are the most common options:

* `--prompt [PROMPT]`: the prompt to render into an image
* `--model [MODEL]`: the model used to render images (default is
`CompVis/stable-diffusion-v1-4`)
* `--height [HEIGHT]`: image height in pixels (default 512, must be divisible by 64)
* `--width [WIDTH]`: image width in pixels (default 512, must be divisible by 64)
* `--iters [ITERS]`: number of times to run pipeline (default 1)
* `--samples [SAMPLES]`: number of images to create per run (default 1)
* `--scale [SCALE]`: how closely the image should follow the prompt (default 7.5)
* `--scheduler [SCHEDULER]`: override the scheduler used to denoise the image
(default `None`)
* `--seed [SEED]`: RNG seed for repeatability (default is a random seed)
* `--steps [STEPS]`: number of sampling steps (default 50)
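
Because `--height` and `--width` must be divisible by 64, a target size that is not a multiple of 64 has to be adjusted first. A minimal sketch using shell arithmetic; the name `round64` is hypothetical, and rounding down (rather than to the nearest multiple) is a choice made here so the adjusted image never uses more memory than the requested one.

```sh
# Hypothetical helper: round a pixel dimension down to a multiple of 64,
# never going below 64
round64() {
  if [ "$1" -lt 64 ]; then
    echo 64
  else
    # Integer division truncates, so this rounds down
    echo $(( $1 / 64 * 64 ))
  fi
}
```

For example, `./build.sh run --height "$(round64 720)" --width "$(round64 1280)" 'abstract art'` requests a 704x1280 image.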

Other options:

* `--attention-slicing`: use less memory but decrease inference speed (default
is no attention slicing)
* `--device [DEVICE]`: the cpu or cuda device to use to render images (default
`cuda`)
* `--half`: use float16 tensors instead of float32 (default `float32`)
* `--image [IMAGE]`: the input image to use for image-to-image diffusion
(default `None`)
* `--image-scale [IMAGE_SCALE]`: how closely the image should follow the
original image (default `None`)
* `--mask [MASK]`: the input mask to use for diffusion inpainting (default
`None`)
* `--negative-prompt [NEGATIVE_PROMPT]`: the prompt to not render into an image
(default `None`)
* `--onnx`: use the onnx runtime for inference (default is off)
* `--skip`: skip the safety checker (default is to run the safety checker)
* `--strength [STRENGTH]`: diffusion strength to apply to the input image
(default 0.75)
* `--token [TOKEN]`: specify a Huggingface user access token at the command line
instead of reading it from a file (default is to read it from a file)
* `--vae-tiling`: use less memory when generating ultra-high resolution images
but massively decrease inference speed (default is no tiling)
* `--xformers-memory-efficient-attention`: use less memory but require the
xformers library (default is that xformers is not required)

Some of the original `txt2img.py` options [have been renamed](https://github.com/fboulnois/stable-diffusion-docker/issues/49)
for ease of use and compatibility with other pipelines:

| txt2img | stable-diffusion-docker |
|---------|-------------------------|
| `--H` | `--height` |
| `--W` | `--width` |
| `--n_iter` | `--iters` |
| `--n_samples` | `--samples` |
| `--ddim_steps` | `--steps` |
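
If you have existing scripts written for `txt2img.py`, the renames in the table can be applied mechanically. A sketch with a hypothetical `txt2img_flag` helper that maps only the five flags listed above and passes every other argument through unchanged:

```sh
# Hypothetical helper: translate one original txt2img.py flag to its
# stable-diffusion-docker name; any other argument is printed unchanged
txt2img_flag() {
  case "$1" in
    --H) printf '%s\n' --height ;;
    --W) printf '%s\n' --width ;;
    --n_iter) printf '%s\n' --iters ;;
    --n_samples) printf '%s\n' --samples ;;
    --ddim_steps) printf '%s\n' --steps ;;
    *) printf '%s\n' "$1" ;;
  esac
}

# Example wrapper body that rewrites every argument, then calls build.sh:
#   for arg do
#     shift
#     set -- "$@" "$(txt2img_flag "$arg")"
#   done
#   ./build.sh run "$@"
```

The wrapper loop uses the POSIX `for arg do` idiom, which iterates over the original arguments while `set --` rebuilds the list with the translated ones, so prompts containing spaces stay intact.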

## Examples

These two commands are identical:

```sh
./build.sh run 'abstract art'
./build.sh run --prompt 'abstract art'
```

Set the seed to 42:

```sh
./build.sh run --seed 42 'abstract art'
```
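
When `--seed` is left at its random default, an image you like is hard to recreate unless its seed was recorded somewhere. One workaround is to pick the seed yourself and log it before rendering. In this sketch, the helper name `random_seed`, the log file `seeds.log`, and the use of an unsigned 32-bit value read from `/dev/urandom` are assumptions, not behavior of `build.sh`.

```sh
# Hypothetical helper: print a random unsigned 32-bit seed
random_seed() {
  # Read 4 random bytes as one unsigned integer and keep only the digits
  od -An -N4 -tu4 /dev/urandom | tr -dc '0-9'
  echo
}

# Usage: choose a seed, record it next to the prompt, then render
#   seed=$(random_seed)
#   printf '%s\t%s\n' "$seed" 'abstract art' >> seeds.log
#   ./build.sh run --seed "$seed" 'abstract art'
```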

Options can be combined:

```sh
./build.sh run --scale 7.0 --seed 42 'abstract art'
```

Many popular models are supported out-of-the-box:

| Model Name | Option using `--model` |
|------------|------------------------|
| [Stable Diffusion 1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4) | `'CompVis/stable-diffusion-v1-4'` |
| [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) | `'runwayml/stable-diffusion-v1-5'` |
| [Stable Diffusion 2.0](https://huggingface.co/stabilityai/stable-diffusion-2) | `'stabilityai/stable-diffusion-2'` |
| [Stable Diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1) | `'stabilityai/stable-diffusion-2-1'` |
| [OpenJourney 1.0](https://huggingface.co/prompthero/openjourney) | `'prompthero/openjourney'` |
| [Dreamlike Diffusion 1.0](https://huggingface.co/dreamlike-art/dreamlike-diffusion-1.0) | `'dreamlike-art/dreamlike-diffusion-1.0'` |
| [and more!](https://huggingface.co/models?other=stable-diffusion&sort=likes) | ... |

```sh
./build.sh run --model 'prompthero/openjourney' --prompt 'abstract art'
```
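
To compare several models from the table on the same prompt, a loop can hold the seed fixed across runs. A sketch with a hypothetical `compare_models` function; the three model names come from the table above, and the idea that a shared seed makes the comparison fairer is an assumption about how the pipeline uses the seed.

```sh
# Hypothetical helper: render one prompt with several models and the same seed
compare_models() {
  for model in 'CompVis/stable-diffusion-v1-4' \
               'runwayml/stable-diffusion-v1-5' \
               'prompthero/openjourney'; do
    # Stop at the first failing run instead of continuing silently
    ./build.sh run --model "$model" --seed 42 --prompt "$1" || return 1
  done
}
```

Calling `compare_models 'abstract art'` from the repository root runs `build.sh` once per model, so the first run of each model also downloads its weights.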

On systems without enough GPU VRAM, you can try mixing and matching options:

* Give Docker Desktop more resources by increasing the CPU, memory, and swap in
the Settings -> Resources section if the container is terminated
* Make images smaller than 512x512 using `--height` and `--width` to decrease
memory use and increase image creation speed
* Use `--half` to decrease memory use but slightly decrease image quality
* Use `--attention-slicing` to decrease memory use but also decrease image
creation speed
* Use `--xformers-memory-efficient-attention` to decrease memory use if the
pipeline and the hardware support the option
* Decrease the number of samples and increase the number of iterations with
`--samples` and `--iters` to decrease overall memory use
* Skip the safety checker with `--skip` to run less code

```sh
./build.sh run --height 256 --width 256 --half \
  --attention-slicing --xformers-memory-efficient-attention \
  --samples 1 --iters 1 --skip --prompt 'abstract art'
```
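
Since `--iters` sets how many times the pipeline runs and `--samples` sets how many images each run creates, the total number of images is `iters × samples`. To reach a target number of images while keeping each batch small, the number of iterations can be computed with ceiling division. A sketch; the function name `iters_for` is hypothetical:

```sh
# Hypothetical helper: print the --iters value needed to create at least
# TOTAL ($1) images when each run creates SAMPLES ($2) images
iters_for() {
  # Ceiling division: (total + samples - 1) / samples
  echo $(( ($1 + $2 - 1) / $2 ))
}
```

For example, `./build.sh run --samples 1 --iters "$(iters_for 8 1)" 'abstract art'` creates eight images one at a time, while `iters_for 8 3` gives 3 iterations, which yields nine images.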

On Windows, if you aren't using WSL2 and instead use MSYS, MinGW, or Git Bash,
prefix your commands with `MSYS_NO_PATHCONV=1` (or export it beforehand):

```sh
MSYS_NO_PATHCONV=1 ./build.sh run --half --prompt 'abstract art'
```
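
To avoid typing the prefix on every command, a shell profile can set the variable only when it detects one of these environments. This sketch relies on the usual `uname -s` output of MSYS2 (`MSYS_NT-...`) and of Git Bash or MinGW (`MINGW64_NT-...` or `MINGW32_NT-...`); the helper name `is_msys_shell` is hypothetical.

```sh
# Hypothetical helper: succeed when the given system name (default: `uname -s`)
# looks like MSYS, MinGW, or Git Bash
is_msys_shell() {
  case "${1:-$(uname -s)}" in
    MSYS_NT*|MINGW*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only disable path conversion where it is needed
if is_msys_shell; then
  export MSYS_NO_PATHCONV=1
fi
```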

## Outputs

### Model

The model and other files are cached in a volume called `huggingface`. The
models are stored in `<volume>/diffusers/<model>/snapshots/<githash>/unet/<weights>`.
Checkpoint files (`ckpt`s) are unofficial versions of the official models, and
so these are not part of the official release.

### Images

The images are saved as PNGs in the `output` folder, with file names based on
the prompt text. The `build.sh` script creates and mounts this folder as a
volume in the container.
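
To find the newest images after a run, a short helper can sort the `output` folder by modification time. A sketch, assuming it is run from the directory where `build.sh` created `output`; the name `latest_outputs` is hypothetical, and parsing `ls` output like this is only suitable for display, not for file names containing newlines.

```sh
# Hypothetical helper: list the N (default 1) most recently written PNGs in ./output
latest_outputs() {
  ls -t output/*.png 2>/dev/null | head -n "${1:-1}"
}
```

For example, `latest_outputs 4` lists the four most recent images, newest first.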