ljp committed
Commit
ae10b8a
1 Parent(s): 0041a8d

Update README.md

Files changed (1)
  1. README.md +89 -1
README.md CHANGED
@@ -6,4 +6,92 @@ language:
  pipeline_tag: text-to-image
  tags:
  - stable diffusion 3
- ---
+ ---
+
+ # SD3 ControlNet Inpainting Model Card
+
+ A ControlNet inpainting model finetuned from SD3-Medium.
+
+ ![SD3](sd3.png)
+ ![bucket_alibaba](bucket_alibaba.png)
+
+ Advantages of this inpainting model:
+ * Thanks to SD3's 16-channel VAE and native 1024x1024 generation, it preserves non-inpainted regions, including text, more faithfully.
+ * It can generate text via inpainting.
+ * It produces better aesthetics for portrait generation.
+
+ A comparison with [SDXL-Inpainting](https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1) is shown in the images above.
+
+ # How to Use
+
+ ```python
+ import torch
+ from diffusers.utils import load_image, check_min_version
+
+ # Local files shipped with this repository, not part of the diffusers package
+ from pipeline_sd3_controlnet_inpainting import StableDiffusion3ControlNetInpaintingPipeline, one_image_and_mask
+ from controlnet_sd3 import SD3ControlNetModel
+
+ check_min_version("0.29.2")
+
+ # Build the ControlNet and the inpainting pipeline
+ controlnet = SD3ControlNetModel.from_pretrained(
+     "alimama-creative/SD3-controlnet-inpaint",
+     use_safetensors=True,
+ )
+ pipe = StableDiffusion3ControlNetInpaintingPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-3-medium-diffusers",
+     controlnet=controlnet,
+     torch_dtype=torch.float16,
+ )
+ pipe.text_encoder.to(torch.float16)
+ pipe.controlnet.to(torch.float16)
+ pipe.to("cuda")
+
+ # Load the source image and the inpainting mask
+ # (use /resolve/ rather than /blob/ so the raw files are fetched)
+ image = load_image(
+     "https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting/resolve/main/prod.png"
+ )
+ mask = load_image(
+     "https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting/resolve/main/mask.jpeg"
+ )
+
+ # Set generation arguments
+ width = 1024
+ height = 1024
+ prompt = "a woman wearing a white jacket, black hat and black pants is standing in a field, the hat writes SD3"
+ generator = torch.Generator(device="cuda").manual_seed(24)
+ input_dict = one_image_and_mask(image, mask, size=(width, height), latent_scale=pipe.vae_scale_factor, invert_mask=True)
+
+ # Inference
+ res_image = pipe(
+     negative_prompt="deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, mutated hands and fingers, disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation, NSFW",
+     prompt=prompt,
+     height=height,
+     width=width,
+     control_image=input_dict["pil_masked_image"],  # H, W, C
+     control_mask=input_dict["mask"] > 0.5,  # B, 1, H, W
+     num_inference_steps=28,
+     generator=generator,
+     controlnet_conditioning_scale=0.95,
+     guidance_scale=7,
+ ).images[0]
+
+ res_image.save("res.png")
+ ```
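+
+ The two imports above (`pipeline_sd3_controlnet_inpainting` and `controlnet_sd3`) are local files rather than part of `diffusers`. Below is a minimal sketch of one way to fetch them before running the snippet; the assumption that both `.py` files sit at the root of the `alimama-creative/SD3-Controlnet-Inpainting` repo is ours, so adjust the repo id and file names to match wherever you obtained this card.
+
+ ```python
+ # Hypothetical setup step: download the local pipeline modules so the
+ # imports in the example above resolve. Repo id and file names are assumptions.
+ from huggingface_hub import hf_hub_download
+
+ for filename in ("pipeline_sd3_controlnet_inpainting.py", "controlnet_sd3.py"):
+     hf_hub_download(
+         repo_id="alimama-creative/SD3-Controlnet-Inpainting",
+         filename=filename,
+         local_dir=".",  # place the files next to your script
+     )
+ ```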
+
+ ## Training Details
+
+ The model was trained for 20k steps at 1024x1024 resolution on 12M images drawn from LAION-2B and internal sources.
+
+ * Mixed precision: FP16
+ * Learning rate: 1e-4
+ * Batch size: 192
+ * Timestep sampling mode: logit_normal (see the sketch after this list)
+ * Loss: flow matching
+
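+ For readers unfamiliar with the two training choices above, here is a minimal, self-contained sketch of logit-normal timestep sampling and a rectified-flow-style flow matching loss. This is our illustration of the general technique, not the repo's actual training code; `model`, `mean`, and `std` are hypothetical placeholders.
+
+ ```python
+ import torch
+
+ def sample_logit_normal_timesteps(batch_size, mean=0.0, std=1.0):
+     # Draw u ~ N(mean, std) and squash it through a sigmoid so t lies in (0, 1);
+     # this concentrates sampled timesteps around the middle of the trajectory.
+     u = torch.randn(batch_size) * std + mean
+     return torch.sigmoid(u)
+
+ def flow_matching_loss(model, x0, cond, t):
+     # Straight-line path from data x0 to Gaussian noise; the network is trained
+     # to predict the constant velocity (noise - x0) along that path.
+     noise = torch.randn_like(x0)
+     t_ = t.view(-1, 1, 1, 1)
+     x_t = (1.0 - t_) * x0 + t_ * noise
+     v_pred = model(x_t, t, cond)
+     return torch.mean((v_pred - (noise - x0)) ** 2)
+
+ t = sample_logit_normal_timesteps(batch_size=4)
+ ```
+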
+ ## Limitations
+
+ Because only 1024x1024 resolution was used during training, inference performs best at that size; other sizes yield suboptimal results. We will run multi-resolution training in the future and open-source the new weights at that time.
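+
+ Until multi-resolution weights are released, a simple workaround (our suggestion, not an official recommendation) is to resize inputs to 1024x1024 before inference and scale the result back afterwards:
+
+ ```python
+ from PIL import Image
+
+ def inpaint_at_1024(run_pipe, image, mask):
+     # run_pipe is a hypothetical wrapper around the pipeline call shown in
+     # "How to Use" that takes an (image, mask) pair and returns a PIL image.
+     orig_size = image.size
+     image_1024 = image.resize((1024, 1024), Image.LANCZOS)  # smooth resample for RGB
+     mask_1024 = mask.resize((1024, 1024), Image.NEAREST)    # nearest keeps the mask hard
+     result = run_pipe(image_1024, mask_1024)
+     return result.resize(orig_size, Image.LANCZOS)
+ ```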