---
base_model: runwayml/stable-diffusion-v1-5
library_name: diffusers
license: creativeml-openrail-m
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- controlnet
- control-lora-v3
- diffusers-training
inference: true
---
# ControlLoRA Version 3 Pretrained Models Collection
This is a collection of control-lora-v3 weights trained on runwayml/stable-diffusion-v1-5 and stabilityai/stable-diffusion-xl-base-1.0 with different types of conditioning.
You can find some example images below.
## Stable Diffusion
### Canny
<div style="display: flex; flex-wrap: wrap;">
<img src="./imgs/canny1.png" style="height:256px;" />
<img src="./imgs/canny2.png" style="height:256px;" />
<img src="./imgs/canny3.png" style="height:256px;" />
<img src="./imgs/canny4.png" style="height:256px;" />
<img src="./imgs/canny_vermeer.png" style="height:256px;" />
</div>
### OpenPose + Segmentation
This combination is experimental and does not yet work well.
<div style="display: flex; flex-wrap: wrap;">
<img src="./imgs/pose5_segmentation5.png" style="height:256px;" />
<img src="./imgs/pose6_segmentation6.png" style="height:256px;" />
<img src="./imgs/pose7_segmentation7.png" style="height:256px;" />
<img src="./imgs/pose8_segmentation8.png" style="height:256px;" />
</div>
### Depth
<div style="display: flex; flex-wrap: wrap;">
<img src="./imgs/depth1.png" style="height:256px;" />
<img src="./imgs/depth2.png" style="height:256px;" />
<img src="./imgs/depth3.png" style="height:256px;" />
<img src="./imgs/depth4.png" style="height:256px;" />
</div>
### Normal map
<div style="display: flex; flex-wrap: wrap;">
<img src="./imgs/normal1.png" style="height:256px;" />
<img src="./imgs/normal2.png" style="height:256px;" />
<img src="./imgs/normal3.png" style="height:256px;" />
<img src="./imgs/normal4.png" style="height:256px;" />
</div>
### OpenPose
<div style="display: flex; flex-wrap: wrap;">
<img src="./imgs/pose1.png" style="height:256px;" />
<img src="./imgs/pose2.png" style="height:256px;" />
<img src="./imgs/pose3.png" style="height:256px;" />
<img src="./imgs/pose4.png" style="height:256px;" />
<img src="./imgs/pose5.png" style="height:256px;" />
<img src="./imgs/pose6.png" style="height:256px;" />
<img src="./imgs/pose7.png" style="height:256px;" />
<img src="./imgs/pose8.png" style="height:256px;" />
</div>
### Segmentation
<div style="display: flex; flex-wrap: wrap;">
<img src="./imgs/segmentation1.png" style="height:256px;" />
<img src="./imgs/segmentation2.png" style="height:256px;" />
<img src="./imgs/segmentation3.png" style="height:256px;" />
<img src="./imgs/segmentation4.png" style="height:256px;" />
<img src="./imgs/segmentation5.png" style="height:256px;" />
<img src="./imgs/segmentation6.png" style="height:256px;" />
<img src="./imgs/segmentation7.png" style="height:256px;" />
<img src="./imgs/segmentation8.png" style="height:256px;" />
</div>
### Tile
<div style="display: flex; flex-wrap: wrap;">
<img src="./imgs/tile1.png" style="height:256px;" />
<img src="./imgs/tile2.png" style="height:256px;" />
<img src="./imgs/tile3.png" style="height:256px;" />
<img src="./imgs/tile4.png" style="height:256px;" />
</div>
## Stable Diffusion XL
### Canny
<div style="display: flex;">
<img src="./imgs/sdxl_canny1.png" style="height:256px;" />
<img src="./imgs/sdxl_canny2.png" style="height:256px;" />
<img src="./imgs/sdxl_canny3.png" style="height:256px;" />
<img src="./imgs/sdxl_canny4.png" style="height:256px;" />
<img src="./imgs/sdxl_canny_vermeer.png" style="height:256px;" />
</div>
## Intended uses & limitations
#### How to use
First clone the [control-lora-v3](https://github.com/HighCWu/control-lora-v3) repository and `cd` into it:
```sh
git clone https://github.com/HighCWu/control-lora-v3
cd control-lora-v3
```
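If the examples below complain about missing packages, install the usual dependencies first. This is a minimal setup sketch; the repository may pin its own requirements, so prefer those if present:
```sh
# minimal dependencies for the examples below; adjust to the repository's own requirements if it pins versions
pip install torch diffusers transformers accelerate opencv-python pillow
```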
Then run the Python code below.
For Stable Diffusion, use:
```py
# !pip install opencv-python transformers accelerate
from diffusers import UniPCMultistepScheduler
from diffusers.utils import load_image
from model import UNet2DConditionModelEx
from pipeline import StableDiffusionControlLoraV3Pipeline
import numpy as np
import torch
import cv2
from PIL import Image
# download an image
image = load_image(
"https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
image = np.array(image)
# get canny image
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
# load stable diffusion v1-5 and control-lora-v3
unet: UNet2DConditionModelEx = UNet2DConditionModelEx.from_pretrained(
"runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
)
unet = unet.add_extra_conditions(["canny"])
pipe = StableDiffusionControlLoraV3Pipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5", unet=unet, torch_dtype=torch.float16
)
# load the control-lora-v3 weights
# pipe.load_lora_weights("HighCWu/sd-control-lora-v3-canny")
pipe.load_lora_weights("HighCWu/control-lora-v3", subfolder="sd-control-lora-v3-canny-half_skip_attn-rank16-conv_in-rank64")
# speed up diffusion process with faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# remove the following line if xformers is not installed
pipe.enable_xformers_memory_efficient_attention()
pipe.enable_model_cpu_offload()
# generate image
generator = torch.manual_seed(0)
image = pipe(
"futuristic-looking woman", num_inference_steps=20, generator=generator, image=canny_image
).images[0]
image.show()
```
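The other Stable Diffusion conditions in this collection follow the same pattern; only the extra-condition name, the LoRA subfolder, and the image preprocessing change. As a rough sketch for depth conditioning (the subfolder name below is illustrative, so check this repository's file list for the exact one):
```py
# !pip install opencv-python transformers accelerate
from transformers import pipeline as depth_pipeline
from diffusers import UniPCMultistepScheduler
from diffusers.utils import load_image
from model import UNet2DConditionModelEx
from pipeline import StableDiffusionControlLoraV3Pipeline
import numpy as np
import torch
from PIL import Image

# estimate a depth map with a transformers depth-estimation pipeline
depth_estimator = depth_pipeline("depth-estimation")
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
depth = np.array(depth_estimator(image)["depth"])[:, :, None]
depth_image = Image.fromarray(np.concatenate([depth, depth, depth], axis=2))

# same loading pattern as the canny example, but with "depth" as the extra condition
unet: UNet2DConditionModelEx = UNet2DConditionModelEx.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
)
unet = unet.add_extra_conditions(["depth"])
pipe = StableDiffusionControlLoraV3Pipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", unet=unet, torch_dtype=torch.float16
)
# NOTE: illustrative subfolder name; pick the actual depth folder from this repository
pipe.load_lora_weights("HighCWu/control-lora-v3", subfolder="sd-control-lora-v3-depth-half_skip_attn-rank16-conv_in-rank64")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

image = pipe("a portrait of a woman", num_inference_steps=20, image=depth_image).images[0]
image.show()
```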
For Stable Diffusion XL, use:
```py
# !pip install opencv-python transformers accelerate
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from model import UNet2DConditionModelEx
from pipeline_sdxl import StableDiffusionXLControlLoraV3Pipeline
import numpy as np
import torch
import cv2
from PIL import Image
prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"
# download an image
image = load_image(
"https://hf.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)
# initialize the models and pipeline
unet: UNet2DConditionModelEx = UNet2DConditionModelEx.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", torch_dtype=torch.float16
)
unet = unet.add_extra_conditions(["canny"])
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlLoraV3Pipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", unet=unet, vae=vae, torch_dtype=torch.float16
)
# load the control-lora-v3 weights
# pipe.load_lora_weights("HighCWu/sdxl-control-lora-v3-canny")
pipe.load_lora_weights("HighCWu/control-lora-v3", subfolder="sdxl-control-lora-v3-canny-half_skip_attn-rank16-conv_in-rank64")
pipe.enable_model_cpu_offload()
# get canny image
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
# generate image
image = pipe(
    prompt, negative_prompt=negative_prompt, image=canny_image
).images[0]
image.show()
```
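In both examples, `image.show()` opens the result in the default image viewer; on a headless machine, save it instead with `image.save("output.png")`.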
#### Limitations and bias
[TODO: provide examples of latent issues and potential remediations]
## Training details
[TODO: describe the data used to train the model]