sayakpaul committed
Commit 7f1b698
1 Parent(s): 7a251ef
Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +22 -19
  3. hug_lab_grid.png +3 -0
.gitattributes CHANGED
@@ -37,3 +37,4 @@ cann-small-couple.png filter=lfs diff=lfs merge=lfs -text
  cann-small-hf-ofice.png filter=lfs diff=lfs merge=lfs -text
  cann-small-megatron.png filter=lfs diff=lfs merge=lfs -text
  cann-small-woman.png filter=lfs diff=lfs merge=lfs -text
+ hug_lab_grid.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -10,9 +10,10 @@ tags:
  inference: false
  ---

- # SDXL-controlnet: Canny
+ # Small SDXL-controlnet: Canny

- These are controlnet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with canny conditioning. You can find some example images in the following.
+ These are small controlnet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with canny conditioning. This checkpoint is 7x smaller than the original XL ControlNet checkpoint.
+ You can find some example images below.

  prompt: aerial view, a futuristic research complex in a bright foggy jungle, hard lighting
  ![images_0](./cann-small-hf-ofice.png)
@@ -46,19 +47,19 @@ import numpy as np
  import cv2

  prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
- negative_prompt = 'low quality, bad quality, sketches'
+ negative_prompt = "low quality, bad quality, sketches"

  image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")

  controlnet_conditioning_scale = 0.5  # recommended for good generalization

  controlnet = ControlNetModel.from_pretrained(
-     "diffusers/controlnet-canny-sdxl-1.0",
+     "diffusers/controlnet-canny-sdxl-1.0-small",
      torch_dtype=torch.float16
  )
  vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
  pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
-     "diffusers/controlnet-canny-sdxl-1.0-small",
+     "stabilityai/stable-diffusion-xl-base-1.0",
      controlnet=controlnet,
      vae=vae,
      torch_dtype=torch.float16,
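
Assembled, the README's inference snippet after this commit reads roughly as below. The hunk above swaps the two checkpoint IDs; the Canny preprocessing lines in the middle are unchanged context that the diff does not display, so they are reconstructed here from the standard diffusers ControlNet recipe and the `image = Image.fromarray(image)` anchor in the next hunk header. Treat this as an editor's reconstruction, not a verbatim quote of the file.

```python
import torch
import numpy as np
import cv2
from PIL import Image
from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    StableDiffusionXLControlNetPipeline,
)
from diffusers.utils import load_image

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"

# Conditioning image to be turned into a Canny edge map.
image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)

controlnet_conditioning_scale = 0.5  # recommended for good generalization

# After this commit, the small checkpoint loads here ...
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0-small",
    torch_dtype=torch.float16,
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
# ... and the pipeline loads the SDXL base model.
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")  # or pipe.enable_model_cpu_offload() on low VRAM

# Canny preprocessing (reconstructed context, not shown in the diff):
# single-channel edges, expanded back to three channels for the ControlNet.
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
image = Image.fromarray(image)

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=image,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
).images

images[0].save("hug_lab.png")
```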
@@ -73,33 +74,35 @@ image = Image.fromarray(image)

  images = pipe(
      prompt, negative_prompt=negative_prompt, image=image, controlnet_conditioning_scale=controlnet_conditioning_scale,
  ).images

  images[0].save("hug_lab.png")
  ```

- ![images_10](./out_hug_lab_7.png)
+ ![hug_lab_grid](./hug_lab_grid.png)

  For more details, check out the official documentation of [`StableDiffusionXLControlNetPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet_sdxl).

+ 🚨 Please note that this checkpoint is experimental and there's a lot of room for improvement. We encourage the community to build on top of it, improve it, and provide us with feedback. 🚨
+
  ### Training

  Our training script was built on top of the official training script that we provide [here](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).
+ You can refer to [this script](https://github.com/patil-suraj/muse-experiments/blob/f71e7e79af24509ddb4e1b295a1d0ef8d8758dc9/ctrlnet/train_controlnet_webdataset.py) for full disclosure.
+
+ * This checkpoint does not perform distillation. We just use a smaller ControlNet initialized from the SDXL UNet; a sketch of that mechanism follows this diff. We encourage the community to try distillation too. [This resource](https://huggingface.co/blog/sd_distillation) might be of help in that regard.
+ * To learn more about how the ControlNet was initialized, refer to [this code block](https://github.com/patil-suraj/muse-experiments/blob/f71e7e79af24509ddb4e1b295a1d0ef8d8758dc9/ctrlnet/train_controlnet_webdataset.py#L1020C1-L1042C36).
+ * It does not have any attention blocks.
+ * The model works pretty well on most conditioning images, but the bigger checkpoints might be better for more complex conditionings. We are still working on improving the quality of this checkpoint and looking for feedback from the community.
+ * We recommend playing around with the `controlnet_conditioning_scale` and `guidance_scale` arguments for potentially better image generation quality; see the sweep sketch after this diff.

  #### Training data
- This checkpoint was first trained for 20,000 steps on laion 6a resized to a max minimum dimension of 384.
- It was then further trained for 20,000 steps on laion 6a resized to a max minimum dimension of 1024 and
- then filtered to contain only minimum 1024 images. We found the further high resolution finetuning was
- necessary for image quality.
+ The model was trained on 3M images from the LAION aesthetic 6 plus subset, with a batch size of 256, for 50k steps, with a constant learning rate of 3e-5.

  #### Compute
- one 8xA100 machine
-
- #### Batch size
- Data parallel with a single gpu batch size of 8 for a total batch size of 64.
-
- #### Hyper Parameters
- Constant learning rate of 1e-4 scaled by batch size for total learning rate of 64e-4
+ One 8xA100 machine

  #### Mixed precision
- fp16
+ FP16
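
The bullets added above say the small checkpoint is not distilled but is a smaller ControlNet initialized from the SDXL UNet, with the exact reduced configuration living in the linked code block. As a minimal sketch of that mechanism only: diffusers exposes `ControlNetModel.from_unet`, which mirrors the UNet's down blocks and copies their weights. The size-reducing arguments (including whatever drops the attention blocks) are specific to the linked script and are not reproduced here; with the defaults shown you get a full-size ControlNet.

```python
import torch
from diffusers import ControlNetModel, UNet2DConditionModel

# Load the SDXL base UNet whose encoder weights seed the ControlNet.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
    torch_dtype=torch.float16,
)

# from_unet builds a ControlNet matching the UNet's encoder and copies its
# weights. The linked training script passes extra arguments to shrink the
# network; this default call shows the mechanism, not the small config.
controlnet = ControlNetModel.from_unet(unet)
print(sum(p.numel() for p in controlnet.parameters()) / 1e6, "M params")
```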
 
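The card recommends playing with `controlnet_conditioning_scale` and `guidance_scale`. A small hypothetical grid sweep over those two pipeline arguments, reusing `pipe`, `prompt`, `negative_prompt`, and the Canny `image` from the snippet above, could look like this; the specific values are illustrative, not recommendations from the model card.

```python
# Hypothetical grid over the two knobs the model card suggests tuning.
# Lower conditioning scales follow the edge map more loosely; higher
# guidance scales follow the text prompt more strongly.
for ccs in (0.3, 0.5, 0.8):
    for gs in (5.0, 7.5):
        result = pipe(
            prompt,
            negative_prompt=negative_prompt,
            image=image,
            controlnet_conditioning_scale=ccs,
            guidance_scale=gs,
        ).images[0]
        result.save(f"hug_lab_ccs-{ccs}_gs-{gs}.png")
```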
hug_lab_grid.png ADDED

Git LFS Details

  • SHA256: 0fd1403e21cc4ddd1229361d2878f88858e104875af8f5e91de13c5e3024ecf4
  • Pointer size: 132 Bytes
  • Size of remote file: 2.08 MB