Commit e1ed4cb by sayakpaul (1 parent: b890ef9): Create README.md

Files changed (1): README.md (+127, -0)

README.md ADDED

---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
- text-to-image
- diffusers
- controlnet
inference: false
---

# SDXL-controlnet: Depth

These are ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with depth conditioning. This checkpoint is 7x smaller than the original XL ControlNet checkpoint. You can find some example images below.

prompt: spiderman lecture, photorealistic
![images_0](./spiderman_small.png)

prompt: aerial view, a futuristic research complex in a bright foggy jungle, hard lighting
![images_1](./hf_logo_small.png)

prompt: megatron in an apocalyptic world ground, ruined city in the background, photorealistic
![images_2](./megatron_small.png)

## Usage

Make sure to first install the libraries:

```bash
pip install accelerate transformers safetensors diffusers
```

And then we're ready to go:

```python
import torch
import numpy as np
from PIL import Image

from transformers import DPTFeatureExtractor, DPTForDepthEstimation
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers.utils import load_image


# Depth estimator used to produce the conditioning image.
depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to("cuda")
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-hybrid-midas")

# Load the small depth ControlNet and the fp16-fixed VAE, then assemble the SDXL pipeline.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0-small",
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16,
).to("cuda")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16,
).to("cuda")
pipe.enable_model_cpu_offload()  # offload sub-models to CPU when idle to reduce VRAM usage


def get_depth_map(image):
    # Run monocular depth estimation and normalize the result to [0, 1].
    image = feature_extractor(images=image, return_tensors="pt").pixel_values.to("cuda")
    with torch.no_grad(), torch.autocast("cuda"):
        depth_map = depth_estimator(image).predicted_depth

    depth_map = torch.nn.functional.interpolate(
        depth_map.unsqueeze(1),
        size=(1024, 1024),
        mode="bicubic",
        align_corners=False,
    )
    depth_min = torch.amin(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_max = torch.amax(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_map = (depth_map - depth_min) / (depth_max - depth_min)
    # Replicate the single-channel depth map to three channels and convert to a PIL image.
    image = torch.cat([depth_map] * 3, dim=1)

    image = image.permute(0, 2, 3, 1).cpu().numpy()[0]
    image = Image.fromarray((image * 255.0).clip(0, 255).astype(np.uint8))
    return image


prompt = "stormtrooper lecture, photorealistic"
image = load_image("https://huggingface.co/lllyasviel/sd-controlnet-depth/resolve/main/images/stormtrooper.png")
controlnet_conditioning_scale = 0.5  # recommended for good generalization

depth_image = get_depth_map(image)

images = pipe(
    prompt,
    image=depth_image,
    num_inference_steps=30,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
).images

images[0].save("stormtrooper_grid.png")
```

![](./stormtrooper_grid.png)

For more details, check out the official documentation of [`StableDiffusionXLControlNetPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet_sdxl).
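
The snippet above keeps `controlnet_conditioning_scale` fixed at 0.5. If you want to see how that setting (and `guidance_scale`) affects your results, a minimal sweep like the following can help. It is only a sketch and assumes the `pipe`, `prompt`, and `depth_image` objects from the usage example above are already in scope; the scale values are illustrative, not recommendations.

```python
# Minimal sketch: sweep controlnet_conditioning_scale and guidance_scale to compare outputs.
# Assumes `pipe`, `prompt`, and `depth_image` from the usage example above are already defined.
import torch

generator = torch.Generator(device="cuda")

for cond_scale in (0.3, 0.5, 0.7):
    for guidance in (5.0, 7.5):
        generator.manual_seed(0)  # reuse the same seed for every setting for a fair comparison
        image = pipe(
            prompt,
            image=depth_image,
            num_inference_steps=30,
            controlnet_conditioning_scale=cond_scale,
            guidance_scale=guidance,
            generator=generator,
        ).images[0]
        image.save(f"stormtrooper_cond{cond_scale}_cfg{guidance}.png")
```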

🚨 Please note that this checkpoint is experimental and there's a lot of room for improvement. We encourage the community to build on top of it, improve it, and provide us with feedback. 🚨

### Training

Our training script was built on top of the official training script that we provide [here](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).
You can refer to [this script](https://github.com/huggingface/diffusers/blob/7b93c2a882d8e12209fbaeffa51ee2b599ab5349/examples/research_projects/controlnet/train_controlnet_webdataset.py) for full disclosure.

* This checkpoint does not perform distillation. We simply use a smaller ControlNet initialized from the SDXL UNet. We encourage the community to try and conduct distillation too. [This resource](https://huggingface.co/blog/sd_distillation) might be of help in that regard.
* To learn more about how the ControlNet was initialized, refer to [this code block](https://github.com/huggingface/diffusers/blob/7b93c2a882d8e12209fbaeffa51ee2b599ab5349/examples/research_projects/controlnet/train_controlnet_webdataset.py#L981C1-L999C36).
* It does not have any attention blocks; you can verify this (and the smaller size) by inspecting the checkpoint, as shown in the sketch after this list.
* The model works pretty well on most conditioning images, but for more complex conditionings, the bigger checkpoints might be better. We are still working on improving the quality of this checkpoint and are looking for feedback from the community.
* We recommend playing around with the `controlnet_conditioning_scale` and `guidance_scale` arguments for potentially better image generation quality.
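
As a quick sanity check of the points above, the sketch below loads this checkpoint, prints its block types, and counts parameters. The specific config values you will see are stated here as assumptions based on the notes above, not guarantees; refer to the linked initialization code for the authoritative details.

```python
# Minimal sketch: inspect the small depth ControlNet to see its architecture and size.
import torch
from diffusers import ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0-small",
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16,
)

# For this checkpoint the down blocks are expected to be plain DownBlock2D variants,
# i.e. no CrossAttn* blocks (assumption based on the "no attention blocks" note above).
print(controlnet.config.down_block_types)
print(controlnet.config.block_out_channels)

# Parameter count, to compare against the full-size SDXL ControlNet checkpoints.
num_params = sum(p.numel() for p in controlnet.parameters())
print(f"{num_params / 1e6:.1f}M parameters")
```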

#### Training data
The model was trained on 3M images from the LAION aesthetic 6+ subset, with a batch size of 256 for 50k steps and a constant learning rate of 3e-5.
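
For a rough sense of how much data those numbers imply, a back-of-the-envelope calculation using only the figures quoted above gives about 12.8M image updates, i.e. roughly four passes over the 3M-image subset:

```python
# Back-of-the-envelope pass count over the training subset, using the numbers above.
batch_size = 256
train_steps = 50_000
dataset_size = 3_000_000

images_seen = batch_size * train_steps   # 12,800,000 image updates
epochs = images_seen / dataset_size      # ~4.3 passes over the subset
print(images_seen, round(epochs, 1))
```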

#### Compute
One 8xA100 machine

#### Mixed precision
FP16