# SD3 Controlnet




| raw | control image | output |
|:-------------------------:|:-------------------------:|:-------------------------:|
|<img src="./raw.jpg" width = "400" />  | <img src="./pose.jpg" width = "400" /> |  <img src="./demo_1.jpg" width = "400" /> |


# Install Diffusers-SD3-Controlnet

SD3 ControlNet support lives in a [diffusers fork](https://github.com/instantX-research/diffusers_sd3_control.git) that has not yet been merged into the official diffusers repository, so install from the fork's `sd3_control` branch:

```cmd
git clone -b sd3_control https://github.com/instantX-research/diffusers_sd3_control.git
cd diffusers_sd3_control
pip install -e .
```
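
After installing, a quick check can confirm that Python picks up the fork rather than a stock diffusers release. This is a minimal sketch: the helper name `has_sd3_controlnet` is hypothetical, and it assumes the fork exposes the `diffusers.models.controlnet_sd3` module that the demo imports.

```python
import importlib.util


def has_sd3_controlnet(module_name="diffusers.models.controlnet_sd3"):
    """Return True if the given module can be located, False otherwise."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. diffusers itself) is missing.
        return False
```

Calling `has_sd3_controlnet()` in the environment used for the demo should return `True` once the editable install has succeeded; `False` suggests a different diffusers installation is being imported.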

# Demo
```python
import sys

import torch
from diffusers.utils import load_image

# Make the community pipeline importable; adjust the path to your local clone
sys.path.append('/path/diffusers/examples/community')
from pipeline_stable_diffusion_3_controlnet import StableDiffusion3CommonPipeline

# load pipeline
base_model = 'stabilityai/stable-diffusion-3-medium-diffusers'
pipe = StableDiffusion3CommonPipeline.from_pretrained(
    base_model, 
    controlnet_list=['InstantX/SD3-Controlnet-Pose']
)
pipe.to('cuda:0', torch.float16)
prompt = 'Anime style illustration of a girl wearing a suit. A moon in sky. In the background we see a big rain approaching. text "InstantX" on image'
n_prompt = 'NSFW, nude, naked, porn, ugly'
# controlnet config
controlnet_conditioning = [
    dict(
        control_index=0,
        control_image=load_image('https://huggingface.co/InstantX/SD3-Controlnet-Pose/resolve/main/pose.jpg'),
        control_weight=0.7,
        control_pooled_projections='zeros'
    )
]
# infer
image = pipe(
    prompt=prompt,
    negative_prompt=n_prompt,
    controlnet_conditioning=controlnet_conditioning,
    num_inference_steps=28,
    guidance_scale=7.0,
    height=1024,
    width=1024,
).images[0]
```
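
The `controlnet_conditioning` list in the demo holds one dict per control input. As a sketch only, the hypothetical helper below builds that list from parallel lists of images and weights. It assumes that `control_index` refers to a position in `controlnet_list`, which the demo suggests but does not state; the dict keys are copied from the demo.

```python
def build_conditioning(control_images, weights, pooled_projections="zeros"):
    """Build a controlnet_conditioning list with one entry per control image.

    Hypothetical helper: assumes control_index is the position of the
    matching ControlNet in controlnet_list.
    """
    if len(control_images) != len(weights):
        raise ValueError("control_images and weights must have the same length")
    return [
        dict(
            control_index=i,
            control_image=image,
            control_weight=weight,
            control_pooled_projections=pooled_projections,
        )
        for i, (image, weight) in enumerate(zip(control_images, weights))
    ]
```

For the single-ControlNet demo this would be called as `build_conditioning([pose_image], [0.7])`, where `pose_image` is the result of `load_image(...)`. The generated image is a PIL image, so it can be written to disk with `image.save('output.jpg')`.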