sayakpaul committed
Commit d7ca456
1 Parent(s): 7a251ef

Update README.md

Files changed (1):
  1. README.md +20 -7
README.md CHANGED
@@ -46,19 +46,19 @@ import numpy as np
 import cv2
 
 prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
-negative_prompt = 'low quality, bad quality, sketches'
+negative_prompt = "low quality, bad quality, sketches"
 
 image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")
 
 controlnet_conditioning_scale = 0.5 # recommended for good generalization
 
 controlnet = ControlNetModel.from_pretrained(
-    "diffusers/controlnet-canny-sdxl-1.0",
+    "diffusers/controlnet-canny-sdxl-1.0-small",
     torch_dtype=torch.float16
 )
 vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
 pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
-    "diffusers/controlnet-canny-sdxl-1.0-small",
+    "stabilityai/stable-diffusion-xl-base-1.0",
     controlnet=controlnet,
     vae=vae,
     torch_dtype=torch.float16,
@@ -73,21 +73,25 @@ image = Image.fromarray(image)
 
 images = pipe(
     prompt, negative_prompt=negative_prompt, image=image, controlnet_conditioning_scale=controlnet_conditioning_scale,
-).images
+).images
 
 images[0].save(f"hug_lab.png")
 ```
 
-![images_10)](./out_hug_lab_7.png)
+![hug_lab_grid](./hug_lab_grid.png)
 
 For more details, check out the official documentation of [`StableDiffusionXLControlNetPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet_sdxl).
 
+🚨 Please note that this checkpoint is experimental and should be investigated thoroughly before being deployed. We encourage the community to build on top
+of it and improve it. 🚨
+
 ### Training
 
 Our training script was built on top of the official training script that we provide [here](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).
+You can refer to [this script](https://github.com/patil-suraj/muse-experiments/blob/f71e7e79af24509ddb4e1b295a1d0ef8d8758dc9/ctrlnet/train_controlnet_webdataset.py) for full disclosure.
 
 #### Training data
-This checkpoint was first trained for 20,000 steps on laion 6a resized to a max minimum dimension of 384.
+This checkpoint was first trained for 20,000 steps on LAION 6A resized so that the smaller dimension is at most 384.
 It was then further trained for 20,000 steps on LAION 6A resized so that the smaller dimension is at most 1024 and
 then filtered to contain only images with a minimum dimension of 1024. We found that the further high-resolution fine-tuning was
 necessary for image quality.
@@ -102,4 +106,13 @@ Data parallel with a single gpu batch size of 8 for a total batch size of 64.
 Constant learning rate of 1e-4, scaled by the total batch size of 64 for an effective learning rate of 64e-4.
 
 #### Mixed precision
-fp16
+fp16
+
+#### Additional notes
+
+* This checkpoint does not perform distillation. We simply use a smaller ControlNet initialized from the SDXL UNet. We
+encourage the community to try distillation too, where the smaller ControlNet model would be initialized from
+a bigger ControlNet model. This [resource](https://huggingface.co/blog/sd_distillation) might be of help in this regard.
+* It does not have any attention blocks.
+* It is better suited for simple conditioning images. For conditioning images involving more complex structures, you
+should use the bigger checkpoints.
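For convenience, here is a sketch of the full example as it reads after this change. Only the two hunks above appear in the diff, so the imports and the Canny preprocessing between them (everything around the `image = Image.fromarray(image)` context line) are reconstructed from the surrounding context and the usual diffusers Canny setup; treat those parts as an approximation rather than a verbatim copy of the README.

```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers.utils import load_image
from PIL import Image
import torch
import numpy as np
import cv2

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"

image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")

controlnet_conditioning_scale = 0.5  # recommended for good generalization

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0-small",
    torch_dtype=torch.float16,
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
)
# Assumed: offload submodules to CPU to keep VRAM usage manageable.
pipe.enable_model_cpu_offload()

# Canny preprocessing (assumed): the second hunk starts right after
# `image = Image.fromarray(image)`, so a standard Canny pass is implied;
# the 100/200 thresholds are illustrative.
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
image = Image.fromarray(image)

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=image,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
).images

images[0].save("hug_lab.png")
```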
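On the "Additional notes" above: diffusers exposes `ControlNetModel.from_unet`, which is the natural way to initialize a ControlNet from a UNet's weights. A minimal sketch of that initialization follows; the block shrinking and attention removal that produce this "small" variant happen in the linked training script and are not reproduced here.

```python
import torch
from diffusers import ControlNetModel, UNet2DConditionModel

# Load the SDXL base UNet whose encoder weights seed the ControlNet.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
    torch_dtype=torch.float16,
)

# Initialize a ControlNet from the UNet; diffusers copies the matching
# down/mid-block weights (load_weights_from_unet=True is the default).
controlnet = ControlNetModel.from_unet(unet)
```

Distillation, as suggested in the note, would instead start from a bigger trained ControlNet and train the smaller model to match its outputs.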