import gradio as gr
import torch

from diffusers import LMSDiscreteScheduler
from mixdiff import StableDiffusionCanvasPipeline, Text2ImageRegion

article = """
## Usage

In this demo you can use Mixture of Diffusers to configure a canvas made up of 3 diffusion regions. Play around with the prompts and guidance values in each region! You can also increase the overlap between regions if seams appear in the image.

In the full version of Mixture of Diffusers you will find even more freedom to configure the regions in the canvas. Check out the [GitHub repo](https://github.com/albarji/mixture-of-diffusers)!

## Motivation

Current image generation methods, such as Stable Diffusion, struggle to position objects at specific locations. While the content of the generated image (somewhat) reflects the objects present in the prompt, it is difficult to frame the prompt in a way that creates a specific composition. For instance, take a prompt expressing a complex composition such as

> A charming house in the countryside on the left,
> in the center a dirt road in the countryside crossing pastures,
> on the right an old and rusty giant robot lying on a dirt road,
> by jakub rozalski,
> sunset lighting on the left and center, dark sunset lighting on the right,
> elegant, highly detailed, smooth, sharp focus, artstation, stunning masterpiece

Out of a sample of 20 Stable Diffusion generations with different seeds, the generated images that align best with the prompt are the following:

<table>
  <tr>
    <td><img src="https://user-images.githubusercontent.com/9654655/195373001-ad23b7c4-f5b1-4e5b-9aa1-294441ed19ed.png" width="300"></td>
    <td><img src="https://user-images.githubusercontent.com/9654655/195373174-8d85dd96-310e-48fa-b112-d9902685f22e.png" width="300"></td>
    <td><img src="https://user-images.githubusercontent.com/9654655/195373200-59eeec1e-e1b8-464d-b72e-e28a9004d269.png" width="300"></td>
  </tr>
</table>

The method proposed here strives to provide a better tool for image composition by using several diffusion processes in parallel, each configured with a specific prompt and settings, and focused on a particular region of the image. You can try it out in the example above! The mixture of diffusion processes is done in a way that harmonizes the generation process, preventing "seam" effects in the generated image.

Using several diffusion processes in parallel also has practical advantages when generating very large images, as the GPU memory requirements are similar to those of generating an image the size of a single tile.
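
For reference, here is a minimal sketch of the canvas API this demo builds on (the `StableDiffusionCanvasPipeline` and `Text2ImageRegion` classes from the [GitHub repo](https://github.com/albarji/mixture-of-diffusers)); the prompts and region coordinates are illustrative only:

```python
import torch
from diffusers import LMSDiscreteScheduler
from mixdiff import StableDiffusionCanvasPipeline, Text2ImageRegion

# Same setup as in this demo: an LMS scheduler plugged into the canvas pipeline
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
pipeline = StableDiffusionCanvasPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", scheduler=scheduler).to("cuda" if torch.cuda.is_available() else "cpu")

# Two 640px-wide tiles overlapping by 256px form a 1024px-wide canvas,
# while GPU memory use stays close to that of a single 640x640 generation
image = pipeline(
    canvas_height=640,
    canvas_width=1024,
    regions=[
        Text2ImageRegion(0, 640, 0, 640, prompt="a forest at dawn", guidance_scale=8),
        Text2ImageRegion(0, 640, 384, 1024, prompt="a lake at dawn", guidance_scale=8),
    ],
    num_inference_steps=50,
    seed=12345,
)["sample"][0]
image.save("canvas.png")
```

The overlap is what harmonizes neighboring regions: both diffusion processes denoise the shared pixels, so their predictions are blended there instead of meeting at a hard seam.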

## Responsible use

The same recommendations as in Stable Diffusion apply, so please check the corresponding [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4).

More broadly speaking, always bear this in mind: YOU are responsible for the content you create using this tool. Do not shift the blame, credit, or responsibility onto the software.

## Gallery

Here are some illustrations created using this software (after putting quite a few hours into them!).

### Darkness Dawning

![Darkness Dawning](https://images-wixmp-ed30a86b8c4ca887773594c2.wixmp.com/f/cd1358aa-80d5-4c59-b95b-cdfde5dcc4f5/dfidq8n-6da9a886-9f1c-40ae-8341-d77af9552395.png?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1cm46YXBwOjdlMGQxODg5ODIyNjQzNzNhNWYwZDQxNWVhMGQyNmUwIiwiaXNzIjoidXJuOmFwcDo3ZTBkMTg4OTgyMjY0MzczYTVmMGQ0MTVlYTBkMjZlMCIsIm9iaiI6W1t7InBhdGgiOiJcL2ZcL2NkMTM1OGFhLTgwZDUtNGM1OS1iOTViLWNkZmRlNWRjYzRmNVwvZGZpZHE4bi02ZGE5YTg4Ni05ZjFjLTQwYWUtODM0MS1kNzdhZjk1NTIzOTUucG5nIn1dXSwiYXVkIjpbInVybjpzZXJ2aWNlOmZpbGUuZG93bmxvYWQiXX0.ff6XoVBPdUbcTLcuHUpQMPrD2TaXBM_s6HfRhsARDw0)

### Yog-Sothoth

![Yog-Sothoth](https://images-wixmp-ed30a86b8c4ca887773594c2.wixmp.com/f/cd1358aa-80d5-4c59-b95b-cdfde5dcc4f5/dfidsq4-174dd428-2c5a-48f6-a78f-9441fb3cffea.png?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1cm46YXBwOjdlMGQxODg5ODIyNjQzNzNhNWYwZDQxNWVhMGQyNmUwIiwiaXNzIjoidXJuOmFwcDo3ZTBkMTg4OTgyMjY0MzczYTVmMGQ0MTVlYTBkMjZlMCIsIm9iaiI6W1t7InBhdGgiOiJcL2ZcL2NkMTM1OGFhLTgwZDUtNGM1OS1iOTViLWNkZmRlNWRjYzRmNVwvZGZpZHNxNC0xNzRkZDQyOC0yYzVhLTQ4ZjYtYTc4Zi05NDQxZmIzY2ZmZWEucG5nIn1dXSwiYXVkIjpbInVybjpzZXJ2aWNlOmZpbGUuZG93bmxvYWQiXX0.X42zWgsk3lYnYwuEgkifRFRH2km-npHvrdleDN3m6bA)

### Looking through the eyes of giants

![Looking through the eyes of giants](https://user-images.githubusercontent.com/9654655/218307148-95ce88b6-b2a3-458d-b469-daf5bd56e3a7.jpg)

[Follow me on DeviantArt for more!](https://www.deviantart.com/albarji)

## Acknowledgements

First and foremost, my most sincere appreciation for the [Stable Diffusion team](https://stability.ai/blog/stable-diffusion-public-release) for releasing such an awesome model, and for letting me take part in the closed beta. Kudos also to the Hugging Face community and developers for implementing the [Diffusers library](https://github.com/huggingface/diffusers).

Thanks to Hugging Face for providing support and a GPU Space for running this demo. Thanks also to Instituto de Ingeniería del Conocimiento and Grupo de Aprendizaje Automático (Universidad Autónoma de Madrid) for providing GPU resources for testing and experimenting with this library.

Thanks also to the vibrant communities of the Stable Diffusion discord channel and [Lexica](https://lexica.art/), where I have learned about many amazing artists and styles. And to my friend Abril for sharing many tips on cool artists!
"""


# Create scheduler and model (similar to StableDiffusionPipeline)
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
pipeline = StableDiffusionCanvasPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", scheduler=scheduler).to("cuda" if torch.cuda.is_available() else "cpu")

def generate(prompt1, prompt2, prompt3, gc1, gc2, gc3, overlap, steps, seed):
    """Mixture of Diffusers generation"""
    tile_width = 640
    tile_height = 640
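    # Three tiles of width tile_width, each overlapping its neighbor by
    # `overlap` pixels: canvas width = tile_width + 2 * (tile_width - overlap)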
    return pipeline(
        canvas_height=tile_height,
        canvas_width=tile_width + (tile_width - overlap) * 2,
        regions=[
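            # Regions are (row_init, row_end, col_init, col_end) in pixels;
            # each tile starts (tile_width - overlap) pixels after the previous one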
            Text2ImageRegion(0, tile_height, 0, tile_width, guidance_scale=gc1,
                prompt=prompt1),
            Text2ImageRegion(0, tile_height, tile_width - overlap, tile_width - overlap + tile_width, guidance_scale=gc2,
                prompt=prompt2),
            Text2ImageRegion(0, tile_height, (tile_width - overlap) * 2, (tile_width - overlap) * 2 + tile_width, guidance_scale=gc3,
                prompt=prompt3),
        ],
        num_inference_steps=steps,
        seed=seed,
    )["sample"][0]

with gr.Blocks(title="Mixture of Diffusers") as demo:
    gr.HTML(
        """
            <div style="text-align: center; max-width: 700px; margin: 0 auto;">
              <div
                style="
                  display: inline-flex;
                  align-items: center;
                  gap: 0.8rem;
                  font-size: 1.75rem;
                "
              >
                <h1 style="font-weight: 1000; margin-bottom: 7px; line-height: normal;">
                  Mixture of Diffusers
                </h1>
              </div>
              <p style="margin-bottom: 10px; font-size: 94%">
                <a href="https://arxiv.org/abs/2302.02412">[Paper]</a>  <a href="https://github.com/albarji/mixture-of-diffusers">[Code in Github]</a>
              </p>
            </div>
        """
    )
    gr.HTML("""
        <p>For faster inference without waiting in queue, you may duplicate the space and upgrade to GPU in settings.
        <br/>
        <a href="https://huggingface.co/spaces/albarji/mixture-of-diffusers?duplicate=true">
        <img style="margin-top: 0em; margin-bottom: 0em" src="https://bit.ly/3gLdBN6" alt="Duplicate Space"></a>
        </p>
    """)
    with gr.Row():
        with gr.Column(scale=1):
            gr.Markdown("### Left region")
            left_prompt = gr.Textbox(lines=2, label="Prompt (what do you want to see in the left side of the image?)")
            left_gs = gr.Slider(minimum=0, maximum=15, value=8, step=1, label="Guidance scale")
        with gr.Column(scale=1):
            gr.Markdown("### Center region")
            center_prompt = gr.Textbox(lines=2, label="Prompt (what do you want to see in the center of the image?)")
            center_gs = gr.Slider(minimum=0, maximum=15, value=8, step=1, label="Guidance scale")
        with gr.Column(scale=1):
            gr.Markdown("### Right region")
            right_prompt = gr.Textbox(lines=2, label="Prompt (what do you want to see in the right side of the image?)")
            right_gs = gr.Slider(minimum=0, maximum=15, value=8, step=1, label="Guidance scale")
    gr.Markdown("### General parameters")
    with gr.Row():
        with gr.Column(scale=1):
            overlap = gr.Slider(minimum=128, maximum=320, value=256, step=8, label="Overlap between diffusion regions")
        with gr.Column(scale=1):
            steps = gr.Slider(minimum=1, maximum=50, value=15, step=1, label="Number of diffusion steps")
        with gr.Column(scale=1):
            seed = gr.Number(value=12345, precision=0, label="Random seed")
    with gr.Row():
        button = gr.Button(value="Generate")
    with gr.Row():
        output = gr.Image(label="Generated image")
    with gr.Row():
        gr.Examples(
            examples=[
                [
                    "A charming house in the countryside, by jakub rozalski, sunset lighting, elegant, highly detailed, smooth, sharp focus, artstation, stunning masterpiece",
                    "A dirt road in the countryside crossing pastures, by jakub rozalski, sunset lighting, elegant, highly detailed, smooth, sharp focus, artstation, stunning masterpiece",
                    "An old and rusty giant robot lying on a dirt road, by jakub rozalski, dark sunset lighting, elegant, highly detailed, smooth, sharp focus, artstation, stunning masterpiece",
                    8, 8, 8,
                    256,
                    50,
                    7178915308
                ],
                [
                    "Abstract decorative illustration, by joan miro and gustav klimt and marlina vera and loish, elegant, intricate, highly detailed, smooth, sharp focus, vibrant colors, artstation, stunning masterpiece",
                    "Abstract decorative illustration, by joan miro and gustav klimt and marlina vera and loish, elegant, intricate, highly detailed, smooth, sharp focus, vibrant colors, artstation, stunning masterpiece",
                    "Abstract decorative illustration, by joan miro and gustav klimt and marlina vera and loish, elegant, intricate, highly detailed, smooth, sharp focus, vibrant colors, artstation, stunning masterpiece",
                    8, 8, 8,
                    256,
                    35,
                    21156517
                ],
                [
                    "Magical diagrams and runes written with chalk on a blackboard, elegant, intricate, highly detailed, smooth, sharp focus, artstation, stunning masterpiece",
                    "Magical diagrams and runes written with chalk on a blackboard, elegant, intricate, highly detailed, smooth, sharp focus, artstation, stunning masterpiece",
                    "Magical diagrams and runes written with chalk on a blackboard, elegant, intricate, highly detailed, smooth, sharp focus, artstation, stunning masterpiece",
                    12, 12, 12,
                    256,
                    35,
                    12591765619
                ]
            ],
            inputs=[left_prompt, center_prompt, right_prompt, left_gs, center_gs, right_gs, overlap, steps, seed],
            outputs=output,
            fn=generate,
            cache_examples=True
        )
    
    button.click(
        generate,
        inputs=[left_prompt, center_prompt, right_prompt, left_gs, center_gs, right_gs, overlap, steps, seed],
        outputs=output
    )

    with gr.Row():
        gr.Markdown(article)

demo.launch(server_name="0.0.0.0")