rromb committed on
Commit
ebd2ba0
1 Parent(s): 0d31a87

Update README.md

  1. README.md +114 -0
README.md CHANGED
@@ -1,3 +1,117 @@
1
  ---
2
  license: creativeml-openrail-m
3
  ---
4
+ # SD-XL 1.0-refiner Model Card
5
+ ![row01](01.png)
6
+
7
+ ## Model
8
+
9
+ ![pipeline](pipeline.png)
10
+
11
+ [SDXL](https://arxiv.org/abs/2307.01952) consists of an ensemble-of-experts pipeline for latent diffusion:
12
+ In a first step, the base model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) is used to generate (noisy) latents,
13
+ which are then further processed with a refinement model specialized for the final denoising steps.
14
+ Note that the base model can be used as a standalone module.
15
+
16
+ Alternatively, we can use a two-stage pipeline as follows:
17
+ First, the base model is used to generate latents of the desired output size.
18
+ In the second step, we use a specialized high-resolution model and apply a technique called SDEdit (https://arxiv.org/abs/2108.01073, also known as "img2img")
19
+ to the latents generated in the first step, using the same prompt. This technique is slightly slower than the first one, as it requires more function evaluations.
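The cost difference between the two pipelines can be made concrete by counting denoiser calls. The sketch below is illustrative only: the function names, the `handoff` fraction, and the `strength` value are assumptions chosen for this example, not parameters taken from the SDXL code base.

```python
from typing import Tuple


def split_schedule(num_steps: int, handoff: float) -> Tuple[int, int]:
    """First pipeline: one denoising schedule shared by both models.

    The base model runs the first `handoff` fraction of the steps and the
    refiner finishes the remaining (final) steps. `handoff` is a
    hypothetical knob for this sketch.
    """
    if not 0.0 <= handoff <= 1.0:
        raise ValueError("handoff must be in [0, 1]")
    base_steps = round(num_steps * handoff)
    return base_steps, num_steps - base_steps


def sdedit_schedule(num_steps: int, strength: float) -> Tuple[int, int]:
    """Second pipeline: the base model runs its full schedule, then the
    refiner re-noises the result and denoises it again (SDEdit / img2img).

    `strength` (assumed here) is the fraction of the schedule the refiner
    repeats.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return num_steps, round(num_steps * strength)


shared = split_schedule(num_steps=40, handoff=0.8)
two_stage = sdedit_schedule(num_steps=40, strength=0.3)

print(shared, sum(shared))        # (32, 8) 40
print(two_stage, sum(two_stage))  # (40, 12) 52
```

With these example values, the second pipeline needs 52 denoiser calls instead of 40, which is the extra cost the text refers to.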
20
+
21
+ Source code is available at https://github.com/Stability-AI/generative-models .
22
+
23
+ ### Model Description
24
+
25
+ - **Developed by:** Stability AI
26
+ - **Model type:** Diffusion-based text-to-image generative model
27
+ - **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/blob/main/LICENSE.md)
28
+ - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses two fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip) and [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main)).
29
+ - **Resources for more information:** Check out our [GitHub Repository](https://github.com/Stability-AI/generative-models) and the [SDXL report on arXiv](https://arxiv.org/abs/2307.01952).
30
+
31
+ ### Model Sources
32
+
33
+ For research purposes, we recommend our `generative-models` GitHub repository (https://github.com/Stability-AI/generative-models), which implements the most popular diffusion frameworks (both training and inference) and to which new functionality such as distillation will be added over time.
34
+ [Clipdrop](https://clipdrop.co/stable-diffusion) provides free SDXL inference.
35
+
36
+ - **Repository:** https://github.com/Stability-AI/generative-models
37
+ - **Demo:** https://clipdrop.co/stable-diffusion
38
+
39
+
40
+ ## Evaluation
41
+ ![comparison](comparison.png)
42
+ The chart above evaluates user preference for SDXL (with and without refinement) over SDXL 0.9 and Stable Diffusion 1.5 and 2.1.
43
+ The SDXL base model performs significantly better than the previous variants, and the model combined with the refinement module achieves the best overall performance.
44
+
45
+
46
+ ### 🧨 Diffusers
47
+
48
+ Make sure to upgrade diffusers to >= 0.18.0:
49
+ ```
50
+ pip install diffusers --upgrade
51
+ ```
52
+
53
+ In addition, make sure to install `transformers`, `safetensors`, and `accelerate`, as well as the invisible watermark package:
54
+ ```
55
+ pip install invisible_watermark transformers accelerate safetensors
56
+ ```
57
+
58
+ You can then use the model as follows:
59
+ ```py
60
+ from diffusers import DiffusionPipeline
61
+ import torch
62
+
63
+ pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
64
+ pipe.to("cuda")
65
+
66
+ # if using torch < 2.0
67
+ # pipe.enable_xformers_memory_efficient_attention()
68
+
69
+ prompt = "An astronaut riding a green horse"
70
+
71
+ image = pipe(prompt=prompt).images[0]
72
+ ```
73
+
74
+ When using `torch >= 2.0`, you can improve the inference speed by 20-30% with `torch.compile`. Simply wrap the UNet with `torch.compile` before running the pipeline:
75
+ ```py
76
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
77
+ ```
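Speedups from compilation depend on hardware and resolution, and the first calls to a compiled module include compilation time. A minimal timing sketch for checking the gain on your own setup follows; `median_latency` and `speedup_percent` are hypothetical helpers written for this example, and reading the 20-30% figure as a throughput gain is an assumption.

```python
import time
from statistics import median
from typing import Callable, List


def median_latency(fn: Callable[[], object], warmup: int = 2, repeats: int = 5) -> float:
    """Return the median wall-clock seconds per call of `fn`.

    The first `warmup` calls are executed but not timed, so one-off costs
    such as compilation do not distort the measurement.
    """
    for _ in range(warmup):
        fn()
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return median(timings)


def speedup_percent(baseline_s: float, optimized_s: float) -> float:
    """Throughput gain of the optimized run over the baseline, in percent."""
    return (baseline_s / optimized_s - 1.0) * 100.0


# Example with made-up latencies: 10 s per image before, 8 s after.
print(speedup_percent(10.0, 8.0))  # 25.0
```

For example, `median_latency(lambda: pipe(prompt=prompt))` could be measured once before and once after wrapping the UNet, and the two results passed to `speedup_percent`.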
78
+
79
+ If you are limited by GPU VRAM, you can enable *CPU offloading* by calling `pipe.enable_model_cpu_offload()`
80
+ instead of `.to("cuda")`:
81
+
82
+ ```diff
83
+ - pipe.to("cuda")
84
+ + pipe.enable_model_cpu_offload()
85
+ ```
86
+
87
+
88
+ ## Uses
89
+
90
+ ### Direct Use
91
+
92
+ The model is intended for research purposes only. Possible research areas and tasks include:
93
+
94
+ - Generation of artworks and use in design and other artistic processes.
95
+ - Applications in educational or creative tools.
96
+ - Research on generative models.
97
+ - Safe deployment of models which have the potential to generate harmful content.
98
+ - Probing and understanding the limitations and biases of generative models.
99
+
100
+ Excluded uses are described below.
101
+
102
+ ### Out-of-Scope Use
103
+
104
+ The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.
105
+
106
+ ## Limitations and Bias
107
+
108
+ ### Limitations
109
+
110
+ - The model does not achieve perfect photorealism.
111
+ - The model cannot render legible text.
112
+ - The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
113
+ - Faces and people in general may not be generated properly.
114
+ - The autoencoding part of the model is lossy.
115
+
116
+ ### Bias
117
+ While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.