---
license: other
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
- text-to-image
- diffusers
- controlnet
inference: false
---
    
# SDXL-controlnet: OpenPose (v2)

These are ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with OpenPose (v2) conditioning. Some example images are shown below.

prompt: a ballerina, romantic sunset, 4k photo
![Ballerina example](./screenshot_ballerina.png)


### Comfy Workflow
![Ballerina ComfyUI output](./out_ballerina.png)


(The image is from ComfyUI; you can drag and drop it into ComfyUI to load it as a workflow.)

License: refer to the OpenPose license.

### Using in 🧨 diffusers

First, install all the libraries:

```bash
pip install -q controlnet_aux transformers accelerate
pip install -q git+https://github.com/huggingface/diffusers
```

Now, we're ready to make Darth Vader dance:

```python
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
import torch
from controlnet_aux import OpenposeDetector
from diffusers.utils import load_image


# Compute openpose conditioning image.
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/person.png"
)
openpose_image = openpose(image)

# Initialize ControlNet pipeline.
controlnet = ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()


# Infer.
prompt = "Darth vader dancing in a desert, high quality"
negative_prompt = "low quality, bad quality"
images = pipe(
    prompt, 
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    num_images_per_prompt=4,
    image=openpose_image.resize((1024, 1024)),
    generator=torch.manual_seed(97),
).images
images[0]
```

Here are some generated examples:

![](./darth_vader_grid.png)
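
The example above resizes the pose image to a fixed 1024x1024 square, which changes the aspect ratio of non-square inputs. As a minimal sketch (not part of the original example), the hypothetical helper `fit_size` below picks a target size that keeps the aspect ratio, scales the longer side to about 1024, and rounds both sides to a multiple of 8. The default values and the multiple-of-8 rounding are assumptions made for illustration.

```python
def fit_size(width, height, long_side=1024, multiple=8):
    """Scale (width, height) so the longer side is about `long_side`.

    Keeps the aspect ratio and rounds each side to the nearest
    multiple of `multiple`. Hypothetical helper for illustration only.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    scale = long_side / max(width, height)

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width * scale), snap(height * scale)


# A portrait pose image of 768x1152 pixels.
print(fit_size(768, 1152))
```

With the pipeline code above, this could be used as `image=openpose_image.resize(fit_size(*openpose_image.size))` in place of the fixed `(1024, 1024)` resize.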


### Training

Training used the HF🤗 diffusers ControlNet SDXL training script, documented [here](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).

#### Training data
This checkpoint was first trained for 15,000 steps on laion 6a, with images resized to a max minimum dimension of 768.

#### Compute
One 1xA100 machine (thanks a lot to HF🤗 for providing the compute!)

#### Batch size
Data parallel with a single-GPU batch size of 2 and 8 gradient accumulation steps.
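
As a quick sanity check on these numbers, the effective batch size per optimizer update is the per-GPU batch size times the gradient accumulation steps times the number of GPUs. The sketch below assumes one GPU (per the Compute section) and assumes that the 15,000 training steps count optimizer updates rather than forward passes; the card does not state this explicitly.

```python
# Values taken from this model card.
per_gpu_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 1  # one 1xA100 machine

# Samples contributing to each optimizer update.
effective_batch_size = per_gpu_batch_size * gradient_accumulation_steps * num_gpus

# Assumption: the 15,000 steps are optimizer updates.
train_steps = 15_000
samples_seen = effective_batch_size * train_steps

print(f"effective batch size: {effective_batch_size}")
print(f"approximate samples seen: {samples_seen:,}")
```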

#### Hyperparameters
Constant learning rate of 8e-5

#### Mixed precision
fp16