valhalla committed on
Commit
842db93
1 Parent(s): 18fa135

Update README.md

Files changed (1)
  1. README.md +76 -1
README.md CHANGED
@@ -9,4 +9,79 @@ tags:
9
  - diffusers
10
  - inpainting
11
  inference: false
12
- ---
12
+ ---
13
+
14
+ # SD-XL Inpainting 0.1 Model Card
15
+
16
+ SD-XL Inpainting 0.1 is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.
17
+
18
+ SD-XL Inpainting 0.1 was initialized with the `stable-diffusion-xl-base-1.0` weights. The model was trained for 40k steps at a resolution of 1024x1024, with the text conditioning dropped 5% of the time to improve classifier-free guidance sampling. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself), whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask everything.
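The channel expansion described above can be illustrated with a minimal PyTorch sketch. This is not the training code used for this model. The helper `expand_conv_in`, the 320 output features, and the concatenation order (latents, mask, masked-image latents) are assumptions made for illustration. The sketch shows that zero-initializing the weights of the 5 new input channels leaves the layer's output identical to the non-inpainting layer at the start of training.

```python3
import torch
from torch import nn


def expand_conv_in(conv: nn.Conv2d, extra_channels: int = 5) -> nn.Conv2d:
    """Hypothetical helper: copy `conv` and add zero-initialized input channels."""
    new_conv = nn.Conv2d(
        conv.in_channels + extra_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight.zero_()  # new channels start at zero
        new_conv.weight[:, : conv.in_channels] = conv.weight  # restore original weights
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


# Stand-in for the first UNet convolution (4 latent channels in; 320 out is illustrative).
base_conv = nn.Conv2d(4, 320, kernel_size=3, padding=1)
inpaint_conv = expand_conv_in(base_conv)  # now takes 4 + 5 = 9 channels

# A 1024x1024 image corresponds to 128x128 latents (the VAE downsamples by 8).
latents = torch.randn(1, 4, 128, 128)
masked_image_latents = torch.randn(1, 4, 128, 128)  # encoded masked image
mask = torch.ones(1, 1, 128, 128)  # 1 = region to inpaint

# Assumed channel order: latents, mask, masked-image latents.
unet_input = torch.cat([latents, mask, masked_image_latents], dim=1)

with torch.no_grad():
    same = torch.allclose(inpaint_conv(unet_input), base_conv(latents), atol=1e-4)
```

Because the extra weights start at zero, the mask and masked-image channels initially have no effect, and the model learns to use them gradually during fine-tuning.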
19
+
20
+
21
+ ## How to use
22
+
23
+ ```python3
24
+ from diffusers import AutoPipelineForInpainting
25
+ from diffusers.utils import load_image
26
+ import torch
27
+
28
+ pipe = AutoPipelineForInpainting.from_pretrained("invokeai-diffusers/stable-diffusion-xl-1.0-inpaint", torch_dtype=torch.float16, variant="fp16").to("cuda")
29
+
30
+ img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
31
+ mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
32
+
33
+ image = load_image(img_url).resize((1024, 1024))
34
+ mask_image = load_image(mask_url).resize((1024, 1024))
35
+
36
+ prompt = "a tiger sitting on a park bench"
37
+ generator = torch.Generator(device="cuda").manual_seed(0)
38
+
39
+ image = pipe(
40
+     prompt=prompt,
41
+     image=image,
42
+     mask_image=mask_image,
43
+     guidance_scale=8.0,
44
+     num_inference_steps=20,  # steps between 15 and 30 work well for us
45
+     strength=0.99,  # make sure to use `strength` below 1.0
46
+     generator=generator,
47
+ ).images[0]
48
+ ```
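If no ready-made mask is available, one can be drawn with Pillow. The following is a minimal sketch. The helper name `rectangle_mask` and the box coordinates are illustrative assumptions, not part of this repository. It follows the diffusers inpainting convention that white pixels are repainted and black pixels are preserved.

```python3
from PIL import Image, ImageDraw


def rectangle_mask(size, box):
    """Hypothetical helper: return a mask that is white inside `box`, black elsewhere."""
    mask = Image.new("L", size, 0)  # black everywhere = keep the original pixels
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # white box = region to repaint
    return mask


# Repaint the central 512x512 region of a 1024x1024 image.
mask_image = rectangle_mask((1024, 1024), (256, 256, 768, 768))
```

The resulting `mask_image` can be passed to the pipeline call above in place of the downloaded mask.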
49
+
50
+ ## Model Description
51
+
52
+ - **Developed by:** The Diffusers team
53
+ - **Model type:** Diffusion-based text-to-image generative model
54
+ - **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
55
+ - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses two fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip) and [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main)).
56
+
57
+
58
+ ## Uses
59
+
60
+ ### Direct Use
61
+
62
+ The model is intended for research purposes only. Possible research areas and tasks include:
63
+
64
+ - Generation of artworks and use in design and other artistic processes.
65
+ - Applications in educational or creative tools.
66
+ - Research on generative models.
67
+ - Safe deployment of models which have the potential to generate harmful content.
68
+ - Probing and understanding the limitations and biases of generative models.
69
+
70
+ Excluded uses are described below.
71
+
72
+ ### Out-of-Scope Use
73
+
74
+ The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.
75
+
76
+ ## Limitations and Bias
77
+
78
+ ### Limitations
79
+
80
+ - The model does not achieve perfect photorealism.
80
+ - The model cannot render legible text.
81
+ - The model struggles with more difficult tasks that involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
83
+ - Faces and people in general may not be generated properly.
84
+ - The autoencoding part of the model is lossy.
85
+
86
+ ### Bias
87
+ While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.