sayakpaul HF staff committed on
Commit
0423817
1 Parent(s): 92ed1c0
.gitattributes CHANGED
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ cann-medium-couple.png filter=lfs diff=lfs merge=lfs -text
37
+ cann-medium-hf-ofice.png filter=lfs diff=lfs merge=lfs -text
38
+ cann-medium-megatron.png filter=lfs diff=lfs merge=lfs -text
39
+ cann-medium-woman.png filter=lfs diff=lfs merge=lfs -text
40
+ hug_lab_grid.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,108 @@
1
+ ---
2
+ license: openrail++
3
+ base_model: stabilityai/stable-diffusion-xl-base-1.0
4
+ tags:
5
+ - stable-diffusion-xl
6
+ - stable-diffusion-xl-diffusers
7
+ - text-to-image
8
+ - diffusers
9
+ - controlnet
10
+ inference: false
11
+ ---
12
+
13
+ # Small SDXL-controlnet: Canny
14
+
15
+ These are small controlnet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with canny conditioning. This checkpoint is 7x smaller than the original XL controlnet checkpoint.
16
+ You can find some example images below.
17
+
18
+ prompt: aerial view, a futuristic research complex in a bright foggy jungle, hard lighting
19
+ ![images_0](./cann-medium-hf-ofice.png)
20
+
21
+ prompt: a woman, close up, detailed, beautiful, street photography, photorealistic, detailed, Kodak ektar 100, natural, candid shot
22
+ ![images_1](./cann-medium-woman.png)
23
+
24
+ prompt: megatron in an apocalyptic world ground, runied city in the background, photorealistic
25
+ ![images_2](./cann-medium-megatron.png)
26
+
27
+ prompt: a couple watching sunset, 4k photo
28
+ ![images_3](./cann-medium-couple.png)
29
+
30
+
31
+ ## Usage
32
+
33
+ Make sure to first install the libraries:
34
+
35
+ ```bash
36
+ pip install accelerate transformers safetensors opencv-python diffusers
37
+ ```
38
+
39
+ And then we're ready to go:
40
+
41
+ ```python
42
+ from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
43
+ from diffusers.utils import load_image
44
+ from PIL import Image
45
+ import torch
46
+ import numpy as np
47
+ import cv2
48
+
49
+ prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
50
+ negative_prompt = "low quality, bad quality, sketches"
51
+
52
+ image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")
53
+
54
+ controlnet_conditioning_scale = 0.5 # recommended for good generalization
55
+
56
+ controlnet = ControlNetModel.from_pretrained(
57
+ "diffusers/controlnet-canny-sdxl-1.0-mid",
58
+ torch_dtype=torch.float16
59
+ )
60
+ vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
61
+ pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
62
+ "stabilityai/stable-diffusion-xl-base-1.0",
63
+ controlnet=controlnet,
64
+ vae=vae,
65
+ torch_dtype=torch.float16,
66
+ )
67
+ pipe.enable_model_cpu_offload()
68
+
69
+ image = np.array(image)
70
+ image = cv2.Canny(image, 100, 200)
71
+ image = image[:, :, None]
72
+ image = np.concatenate([image, image, image], axis=2)
73
+ image = Image.fromarray(image)
74
+
75
+ images = pipe(
76
+ prompt, negative_prompt=negative_prompt, image=image, controlnet_conditioning_scale=controlnet_conditioning_scale,
77
+ ).images
78
+
79
+ images[0].save("hug_lab.png")
80
+ ```
81
+
82
+ ![hug_lab_grid](./hug_lab_grid.png)
83
+
84
+ For more details, check out the official documentation of [`StableDiffusionXLControlNetPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet_sdxl).
85
+
86
+ 🚨 Please note that this checkpoint is experimental and there's a lot of room for improvement. We encourage the community to build on top of it, improve it, and provide us with feedback. 🚨
87
+
88
+ ### Training
89
+
90
+ Our training script was built on top of the official training script that we provide [here](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).
91
+ You can refer to [this script](https://github.com/patil-suraj/muse-experiments/blob/f71e7e79af24509ddb4e1b295a1d0ef8d8758dc9/ctrlnet/train_controlnet_webdataset.py) for full disclosure.
92
+
93
+ * This checkpoint does not perform distillation. We just use a smaller ControlNet initialized from the SDXL UNet. We
94
+ encourage the community to try and conduct distillation too. This resource might be of help in [this regard](https://huggingface.co/blog/sd_distillation).
95
+ * To learn more about how the ControlNet was initialized, refer to [this code block](https://github.com/patil-suraj/muse-experiments/blob/f71e7e79af24509ddb4e1b295a1d0ef8d8758dc9/ctrlnet/train_controlnet_webdataset.py#L1020C1-L1042C36).
96
+ * It does not have any attention blocks.
97
+ * The model works well on most conditioning images, but for more complex conditionings the bigger checkpoints might be better. We are still working on improving the quality of this checkpoint and are looking for feedback from the community.
98
+ * We recommend playing around with the `controlnet_conditioning_scale` and `guidance_scale` arguments for potentially better
99
+ image generation quality.
100
+
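The recommendation above to experiment with `controlnet_conditioning_scale` and `guidance_scale` can be organized as a small grid of settings. The following is only a sketch: the helper name `sweep_settings` and the specific values are hypothetical, chosen for illustration (0.5 is the conditioning scale used in the usage example), and the pipeline call is left as a comment because it assumes `pipe`, `prompt`, `negative_prompt`, and `image` from the usage example are already defined.

```python
from itertools import product


def sweep_settings(conditioning_scales, guidance_scales):
    """Return one kwargs dict per (conditioning scale, guidance scale) pair."""
    return [
        {"controlnet_conditioning_scale": c, "guidance_scale": g}
        for c, g in product(conditioning_scales, guidance_scales)
    ]


# Hypothetical values for illustration; they are not tuned recommendations.
settings = sweep_settings([0.3, 0.5, 0.7], [5.0, 7.5])

for kwargs in settings:
    print(kwargs)
    # With the pipeline from the usage example loaded, each dict could be used as:
    # result = pipe(prompt, negative_prompt=negative_prompt, image=image, **kwargs).images[0]
```

Generating one image per setting with a fixed seed would make the outputs easier to compare side by side.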
101
+ #### Training data
102
+ The model was trained on 3M images from the LAION aesthetic 6 plus subset, with a batch size of 256 for 50k steps and a constant learning rate of 3e-5.
103
+
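As a rough sanity check on the training budget stated above, the number of samples seen and the approximate number of passes over the data can be worked out directly. This sketch assumes that 256 is the effective (global) batch size, which the model card does not state explicitly.

```python
num_images = 3_000_000  # "3M images" from the training data description
batch_size = 256        # assumption: global batch size across all devices
train_steps = 50_000    # "50k steps"

# Each step consumes one batch, so total samples is batch size times steps.
samples_seen = batch_size * train_steps
approx_epochs = samples_seen / num_images

print(f"samples seen: {samples_seen:,}")
print(f"approximate passes over the data: {approx_epochs:.2f}")
```

Under that assumption, training covers 12.8M samples, or a little over four passes over the 3M images.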
104
+ #### Compute
105
+ One 8xA100 machine
106
+
107
+ #### Mixed precision
108
+ FP16
cann-medium-couple.png ADDED

Git LFS Details

  • SHA256: ca8723fad4a8a25466006f6c3823575688c100335b507875032e2bd535541d21
  • Pointer size: 132 Bytes
  • Size of remote file: 7.36 MB
cann-medium-hf-ofice.png ADDED

Git LFS Details

  • SHA256: 9098a7eb88934e241ea085825ad22ef2bc29f66c2029ed75710cb21d31a87acc
  • Pointer size: 132 Bytes
  • Size of remote file: 7.35 MB
cann-medium-megatron.png ADDED

Git LFS Details

  • SHA256: 9d9cf0d931c7fda4f57f51a8bd7ccd81880e819bf595320a6eeef2cd097ef24d
  • Pointer size: 132 Bytes
  • Size of remote file: 4.34 MB
cann-medium-woman.png ADDED

Git LFS Details

  • SHA256: 07d2608c9e097ec103af1d6f755e5466c859fe084458130936f7177f747049bc
  • Pointer size: 132 Bytes
  • Size of remote file: 7.03 MB
hug_lab_grid.png ADDED

Git LFS Details

  • SHA256: aba1927620761eff5b6a5821a18f1b7732c5eacd72492a5596bb3fcb45368235
  • Pointer size: 132 Bytes
  • Size of remote file: 1.99 MB