---
pipeline_tag: text-to-image
inference: false
---
# Model summary
This Stable Diffusion Turbo model has been optimized to work with WebNN. This model is licensed under the [STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE AGREEMENT](https://huggingface.co/stabilityai/sd-turbo/blob/main/LICENSE). For terms of use, please visit the [Acceptable Use Policy](https://stability.ai/use-policy). If you comply with the license and terms of use, you have the rights described therein. By using this Model, you accept the terms.
SD-Turbo-WebNN is meant to be used with the corresponding sample [here](https://microsoft.github.io/webnn-developer-preview/) for educational or testing purposes only.
# WebNN changes
The original model is [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo). SD-Turbo-WebNN is an ONNX version of SD-Turbo that is optimized for WebNN: it uses static input shapes and removes operators that are not in use.
# SD-Turbo Model Card
<!-- Provide a quick summary of what the model is/does. -->
SD-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation.
We release SD-Turbo as a research artifact, and to study small, distilled text-to-image models. For increased quality and prompt understanding,
we recommend [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo/).
Please note: For commercial use, please refer to https://stability.ai/membership.
## Model Details
### Model Description
SD-Turbo is a distilled version of [Stable Diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1), trained for real-time synthesis.
SD-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the [technical report](https://stability.ai/research/adversarial-diffusion-distillation)), which allows sampling large-scale foundational
image diffusion models in 1 to 4 steps at high image quality.
This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal and combines this with an
adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.
- **Developed by:** Stability AI
- **Funded by:** Stability AI
- **Model type:** Generative text-to-image model
- **Finetuned from model:** [Stable Diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1)
### Model Sources
For research purposes, we recommend our `generative-models` GitHub repository (https://github.com/Stability-AI/generative-models),
which implements the most popular diffusion frameworks (both training and inference).
- **Repository:** https://github.com/Stability-AI/generative-models
- **Paper:** https://stability.ai/research/adversarial-diffusion-distillation
- **Demo [for the bigger SDXL-Turbo]:** http://clipdrop.co/stable-diffusion-turbo
## Uses
### Direct Use
SD-Turbo is intended for both non-commercial and commercial usage. Possible research areas and tasks include
- Research on generative models.
- Research on real-time applications of generative models.
- Research on the impact of real-time generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
For commercial use, please refer to https://stability.ai/membership.
Excluded uses are described below.
### Diffusers
```
pip install diffusers transformers accelerate --upgrade
```
- **Text-to-image**:
SD-Turbo does not make use of `guidance_scale` or `negative_prompt`, so we disable guidance with `guidance_scale=0.0`.
The model works best at an image size of 512x512, but larger image sizes work as well.
A **single step** is enough to generate high quality images.
```py
from diffusers import AutoPipelineForText2Image
import torch

# Load the fp16 weights and move the pipeline to the GPU.
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

# A single step with guidance disabled (guidance_scale=0.0).
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
```
- **Image-to-image**:
When using SD-Turbo for image-to-image generation, make sure that `num_inference_steps` * `strength` is greater than or equal
to 1. The image-to-image pipeline will run for `int(num_inference_steps * strength)` steps, *e.g.* 2 * 0.5 = 1 step in our example
below.
```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

# Load the fp16 weights and move the pipeline to the GPU.
pipe = AutoPipelineForImage2Image.from_pretrained("stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

# Fetch the starting image and resize it to 512x512.
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png").resize((512, 512))
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

# num_inference_steps * strength = 2 * 0.5 = 1 step, with guidance disabled.
image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
```
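The step rule above can be checked before calling the pipeline. The following is a minimal sketch, assuming only the `int(num_inference_steps * strength)` formula stated in this section; `effective_steps` and `check_img2img_settings` are hypothetical helper names for illustration and are not part of `diffusers`.

```py
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Steps the image-to-image pipeline runs, per the formula quoted above."""
    return int(num_inference_steps * strength)


def check_img2img_settings(num_inference_steps: int, strength: float) -> int:
    """Hypothetical pre-flight check: reject settings that yield zero steps."""
    steps = effective_steps(num_inference_steps, strength)
    if steps < 1:
        raise ValueError(
            f"num_inference_steps * strength = {num_inference_steps * strength} "
            "is below 1, so no denoising step would run"
        )
    return steps


# The settings from the example above: 2 * 0.5 = 1 step.
print(check_img2img_settings(num_inference_steps=2, strength=0.5))
```

With `num_inference_steps=1` and `strength=0.5`, the product is 0.5, which truncates to 0 steps, so the check raises a `ValueError` instead of letting the pipeline run with no denoising.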
### Out-of-Scope Use
The model was not trained to produce factual or true representations of people or events,
and therefore using the model to generate such content is out-of-scope for the abilities of this model.
The model should not be used in any way that violates Stability AI's [Acceptable Use Policy](https://stability.ai/use-policy).
## Limitations and Bias
### Limitations
- The quality and prompt alignment are lower than those of [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo/).
- The generated images are of a fixed resolution (512x512 pixels), and the model does not achieve perfect photorealism.
- The model cannot render legible text.
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.
### Recommendations
The model is intended for both non-commercial and commercial usage.
## How to Get Started with the Model
Check out https://github.com/Stability-AI/generative-models