---
base_model: runwayml/stable-diffusion-v1-5
library_name: diffusers
license: creativeml-openrail-m
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- controlnet
- control-lora-v3
- diffusers-training
inference: true
---

# ControlLoRA Version 3 Pretrained Models Collection

This is a collection of control-lora-v3 weights trained on runwayml/stable-diffusion-v1-5 and stabilityai/stable-diffusion-xl-base-1.0 with different types of conditioning.
Example images for each conditioning type are shown below.

## Stable Diffusion

### Canny
<div style="display: flex; flex-wrap: wrap;">
  <img src="./imgs/canny1.png" style="height:256px;" />
  <img src="./imgs/canny2.png" style="height:256px;" />
  <img src="./imgs/canny3.png" style="height:256px;" />
  <img src="./imgs/canny4.png" style="height:256px;" />
  <img src="./imgs/canny_vermeer.png" style="height:256px;" />
</div>

### OpenPose + Segmentation

This combination is experimental and does not work well yet. A hedged sketch of how two conditions might be passed together follows the example images below.

<div style="display: flex; flex-wrap: wrap;">
  <img src="./imgs/pose5_segmentation5.png" style="height:256px;" />
  <img src="./imgs/pose6_segmentation6.png" style="height:256px;" />
  <img src="./imgs/pose7_segmentation7.png" style="height:256px;" />
  <img src="./imgs/pose8_segmentation8.png" style="height:256px;" />
</div>
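
For reference, a multi-condition pipeline would presumably register each condition on the UNet and pass one conditioning image per condition. The sketch below is an untested illustration extrapolated from the single-condition examples in the "How to use" section; the condition names, the subfolder name, and the list-of-images call are assumptions, not documented by this card.

```py
# untested sketch: extrapolated from the single-condition examples below
from diffusers.utils import load_image
from model import UNet2DConditionModelEx
from pipeline import StableDiffusionControlLoraV3Pipeline
import torch

unet: UNet2DConditionModelEx = UNet2DConditionModelEx.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
)
# register both conditions on the unet (condition names are assumed)
unet = unet.add_extra_conditions(["pose", "segmentation"])
pipe = StableDiffusionControlLoraV3Pipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", unet=unet, torch_dtype=torch.float16
)
# hypothetical subfolder name; check the repository for the real one
pipe.load_lora_weights("HighCWu/control-lora-v3", subfolder="sd-control-lora-v3-pose-segmentation")
pipe.enable_model_cpu_offload()

pose_image = load_image("path/to/pose.png")         # OpenPose skeleton image
seg_image = load_image("path/to/segmentation.png")  # segmentation color map

# one conditioning image per registered condition, in registration order (assumed)
image = pipe("a person in a living room", image=[pose_image, seg_image]).images[0]
```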

### Depth
<div style="display: flex; flex-wrap: wrap;">
  <img src="./imgs/depth1.png" style="height:256px;" />
  <img src="./imgs/depth2.png" style="height:256px;" />
  <img src="./imgs/depth3.png" style="height:256px;" />
  <img src="./imgs/depth4.png" style="height:256px;" />
</div>

### Normal map
<div style="display: flex; flex-wrap: wrap;">
  <img src="./imgs/normal1.png" style="height:256px;" />
  <img src="./imgs/normal2.png" style="height:256px;" />
  <img src="./imgs/normal3.png" style="height:256px;" />
  <img src="./imgs/normal4.png" style="height:256px;" />
</div>

### OpenPose
<div style="display: flex; flex-wrap: wrap;">
  <img src="./imgs/pose1.png" style="height:256px;" />
  <img src="./imgs/pose2.png" style="height:256px;" />
  <img src="./imgs/pose3.png" style="height:256px;" />
  <img src="./imgs/pose4.png" style="height:256px;" />
  <img src="./imgs/pose5.png" style="height:256px;" />
  <img src="./imgs/pose6.png" style="height:256px;" />
  <img src="./imgs/pose7.png" style="height:256px;" />
  <img src="./imgs/pose8.png" style="height:256px;" />
</div>

### Segmentation

<div style="display: flex; flex-wrap: wrap;">
  <img src="./imgs/segmentation1.png" style="height:256px;" />
  <img src="./imgs/segmentation2.png" style="height:256px;" />
  <img src="./imgs/segmentation3.png" style="height:256px;" />
  <img src="./imgs/segmentation4.png" style="height:256px;" />
  <img src="./imgs/segmentation5.png" style="height:256px;" />
  <img src="./imgs/segmentation6.png" style="height:256px;" />
  <img src="./imgs/segmentation7.png" style="height:256px;" />
  <img src="./imgs/segmentation8.png" style="height:256px;" />
</div>

### Tile
<div style="display: flex; flex-wrap: wrap;">
  <img src="./imgs/tile1.png" style="height:256px;" />
  <img src="./imgs/tile2.png" style="height:256px;" />
  <img src="./imgs/tile3.png" style="height:256px;" />
  <img src="./imgs/tile4.png" style="height:256px;" />
</div>

## Stable Diffusion XL

### Canny
<div style="display: flex;">
  <img src="./imgs/sdxl_canny1.png" style="height:256px;" />
  <img src="./imgs/sdxl_canny2.png" style="height:256px;" />
  <img src="./imgs/sdxl_canny3.png" style="height:256px;" />
  <img src="./imgs/sdxl_canny4.png" style="height:256px;" />
  <img src="./imgs/sdxl_canny_vermeer.png" style="height:256px;" />
</div>

## Intended uses & limitations

#### How to use

First clone the [control-lora-v3](https://github.com/HighCWu/control-lora-v3) repository and `cd` into it (the examples below import `model`, `pipeline`, and `pipeline_sdxl` from the repository root):
```sh
git clone https://github.com/HighCWu/control-lora-v3
cd control-lora-v3
```

Then run the Python code below. Besides the packages listed in the leading comment, `torch` and `diffusers` must be installed.

For Stable Diffusion, use:

```py
# !pip install opencv-python transformers accelerate
from diffusers import UniPCMultistepScheduler
from diffusers.utils import load_image
from model import UNet2DConditionModelEx
from pipeline import StableDiffusionControlLoraV3Pipeline
import numpy as np
import torch

import cv2
from PIL import Image

# download an image
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
image = np.array(image)

# get canny image
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

# load stable diffusion v1-5 and control-lora-v3 
unet: UNet2DConditionModelEx = UNet2DConditionModelEx.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
)
unet = unet.add_extra_conditions(["canny"])
pipe = StableDiffusionControlLoraV3Pipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", unet=unet, torch_dtype=torch.float16
)
# load control-lora-v3 weights (the subfolder name appears to encode the setup:
# half of the attention layers skipped, LoRA rank 16, conv_in at rank 64)
# pipe.load_lora_weights("HighCWu/sd-control-lora-v3-canny")
pipe.load_lora_weights("HighCWu/control-lora-v3", subfolder="sd-control-lora-v3-canny-half_skip_attn-rank16-conv_in-rank64")

# speed up diffusion process with faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# remove the following line if xformers is not installed
pipe.enable_xformers_memory_efficient_attention()

pipe.enable_model_cpu_offload()

# generate image
generator = torch.manual_seed(0)
image = pipe(
    "futuristic-looking woman", num_inference_steps=20, generator=generator, image=canny_image
).images[0]
image.show()
```
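
In a headless environment (such as a remote server), save the result to disk instead of displaying it:

```py
image.save("canny_result.png")  # any path works; PNG keeps the output lossless
```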

For Stable Diffusion XL, use:

```py
# !pip install opencv-python transformers accelerate
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from model import UNet2DConditionModelEx
from pipeline_sdxl import StableDiffusionXLControlLoraV3Pipeline
import numpy as np
import torch

import cv2
from PIL import Image

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"

# download an image
image = load_image(
    "https://hf.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)

# initialize the models and pipeline
unet: UNet2DConditionModelEx = UNet2DConditionModelEx.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", torch_dtype=torch.float16
)
unet = unet.add_extra_conditions(["canny"])
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlLoraV3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, vae=vae, torch_dtype=torch.float16
)
# load control-lora-v3 weights
# pipe.load_lora_weights("HighCWu/sdxl-control-lora-v3-canny")
pipe.load_lora_weights("HighCWu/control-lora-v3", subfolder="sdxl-control-lora-v3-canny-half_skip_attn-rank16-conv_in-rank64")
pipe.enable_model_cpu_offload()

# get canny image
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

# generate image
image = pipe(
    prompt, image=canny_image
).images[0]
image.show()
```

#### Limitations and bias

[TODO: provide examples of latent issues and potential remediations]

## Training details

[TODO: describe the data used to train the model]