0xadamm committed
Commit 8dc2725
1 Parent(s): ad8634a

initial commit
.gitattributes CHANGED
@@ -25,7 +25,6 @@
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
.vscode/settings.json ADDED
@@ -0,0 +1,6 @@
+{
+    "[python]": {
+        "editor.defaultFormatter": "ms-python.black-formatter"
+    },
+    "python.formatting.provider": "none"
+}
README.md ADDED
@@ -0,0 +1,94 @@
+---
+tags:
+- stable-diffusion
+- controlnet
+- image-to-image
+license: openrail++
+language:
+- en
+pipeline_tag: image-to-image
+---
+# QR Code Conditioned ControlNet Models for Stable Diffusion 2.1
+
+![1](https://www.dropbox.com/s/c1kx64v1cpsh2mp/1.png?raw=1)
+
+## Model Description
+
+This repo holds the safetensors & diffusers versions of the QR code conditioned ControlNet for Stable Diffusion v2.1.
+The Stable Diffusion 2.1 version is marginally more effective, as it was developed to address my specific needs. However, a 1.5 version model was also trained on the same dataset for those who are using the older version.
+
+## How to use with diffusers
+
+```bash
+pip -q install diffusers transformers accelerate torch xformers
+```
+
+```python
+import torch
+from PIL import Image
+from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel, DDIMScheduler
+from diffusers.utils import load_image
+
+controlnet = ControlNetModel.from_pretrained("DionTimmer/controlnet_qrcode-control_v11p_sd21",
+                                             torch_dtype=torch.float16)
+
+pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-2-1",
+    controlnet=controlnet,
+    safety_checker=None,
+    torch_dtype=torch.float16
+)
+
+pipe.enable_xformers_memory_efficient_attention()
+pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
+pipe.enable_model_cpu_offload()
+
+def resize_for_condition_image(input_image: Image.Image, resolution: int):
+    input_image = input_image.convert("RGB")
+    W, H = input_image.size
+    k = float(resolution) / min(H, W)
+    H *= k
+    W *= k
+    H = int(round(H / 64.0)) * 64
+    W = int(round(W / 64.0)) * 64
+    img = input_image.resize((W, H), resample=Image.LANCZOS)
+    return img
+
+
+# play with guidance_scale, controlnet_conditioning_scale and strength to make a valid QR Code image
+
+# qr code image
+source_image = load_image("https://s3.amazonaws.com/moonup/production/uploads/6064e095abd8d3692e3e2ed6/A_RqHaAM6YHBodPLwqtjn.png")
+# initial image, anything
+init_image = load_image("https://s3.amazonaws.com/moonup/production/uploads/noauth/KfMBABpOwIuNolv1pe3qX.jpeg")
+condition_image = resize_for_condition_image(source_image, 768)
+init_image = resize_for_condition_image(init_image, 768)
+generator = torch.manual_seed(123121231)
+output = pipe(prompt="a billboard in NYC with a qrcode",
+              negative_prompt="ugly, disfigured, low quality, blurry, nsfw",
+              image=init_image,
+              control_image=condition_image,
+              width=768,
+              height=768,
+              guidance_scale=20,
+              controlnet_conditioning_scale=1.5,
+              generator=generator,
+              strength=0.9,
+              num_inference_steps=150,
+              )
+
+output.images[0]
+
+```
+
+## Performance and Limitations
+
+These models perform quite well in most cases, but please note that they are not 100% accurate. In some instances, the QR code shape might not come through as expected. You can increase the ControlNet weight to emphasize the QR code shape. However, be cautious as this might negatively impact the style of your output. **To optimize for scanning, please generate your QR codes with correction mode 'H' (30%)** (a short generation sketch follows this file).
+
+To balance style against shape, gentle fine-tuning of the control weight may be required for each input and desired output, as well as the right prompt. Some prompts do not work until you increase the weight substantially. Finding the right balance between these factors is part art and part science. For the best results, generate your artwork at a resolution of 768. This allows a higher level of detail in the final product, enhancing the quality and effectiveness of the QR code-based artwork.
+
+## Installation
+
+The simplest way to use this is to place the .safetensors model and its .yaml config file in the folder where your other ControlNet models are installed, which varies per application.
+For usage in auto1111 they can be placed in the webui/models/ControlNet folder. They can be loaded with the ControlNet webui extension, which you can install through the extensions tab in the webui (https://github.com/Mikubill/sd-webui-controlnet). Make sure to enable your ControlNet unit and set your input image to the QR code. Set the model to either the SD2.1 or 1.5 version to match your base Stable Diffusion model, or it will error. No pre-processor is needed, though you can use the invert pre-processor for a different variation of results. 768 is the preferred resolution for generation since it allows for more detail.
+Make sure to look up additional info on how to use ControlNet if you get stuck; once you have the webui up and running, it's easy to install the ControlNet extension as well.
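The Performance section above recommends generating source codes with error correction mode 'H'. As a minimal sketch of doing that with the `qrcode` package (the same calls handler.py below uses; the encoded URL is a placeholder):

```python
import qrcode

# Mode 'H' tolerates ~30% damage, giving the diffusion model
# room to stylize modules while the code stays scannable.
qr = qrcode.QRCode(
    version=1,
    error_correction=qrcode.constants.ERROR_CORRECT_H,
    box_size=10,
    border=4,
)
qr.add_data("https://example.com/")  # placeholder payload
qr.make(fit=True)
qr.make_image(fill_color="black", back_color="white").save("qrcode.png")
```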
config.json ADDED
@@ -0,0 +1,47 @@
+{
+  "_class_name": "ControlNetModel",
+  "_diffusers_version": "0.18.0.dev0",
+  "act_fn": "silu",
+  "attention_head_dim": [
+    5,
+    10,
+    20,
+    20
+  ],
+  "block_out_channels": [
+    320,
+    640,
+    1280,
+    1280
+  ],
+  "class_embed_type": null,
+  "conditioning_embedding_out_channels": [
+    16,
+    32,
+    96,
+    256
+  ],
+  "controlnet_conditioning_channel_order": "rgb",
+  "cross_attention_dim": 1024,
+  "down_block_types": [
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "DownBlock2D"
+  ],
+  "downsample_padding": 1,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "global_pool_conditions": false,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_scale_factor": 1,
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "num_class_embeds": null,
+  "only_cross_attention": false,
+  "projection_class_embeddings_input_dim": null,
+  "resnet_time_scale_shift": "default",
+  "upcast_attention": null,
+  "use_linear_projection": true
+}
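A few of these fields pin down the target base model: a `cross_attention_dim` of 1024 matches SD 2.x's OpenCLIP ViT-H text encoder (SD 1.5 checkpoints use 768). A minimal sketch for inspecting the config and, optionally, instantiating the architecture with random weights via diffusers' `from_config` (assumes a local copy of config.json):

```python
import json
from diffusers import ControlNetModel

# Read the architecture description directly; no weights are involved.
with open("config.json") as f:
    config = json.load(f)

print(config["cross_attention_dim"])  # 1024 -> SD 2.x text-encoder width
print(config["down_block_types"])

# Optional: build a randomly initialized ControlNet with this architecture.
model = ControlNetModel.from_config(config)
```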
control_v11p_sd21_qrcode.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:340ff6922e3f96d6311eb1e467dd63886463068cafedffc56a2d8bc5ff9f5563
+size 1456951266
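The three lines above are a Git LFS pointer, not the checkpoint itself: the ~1.4 GB blob is stored out of band and addressed by the SHA-256 in the `oid` line. A standard-library sketch for verifying a downloaded copy against that hash:

```python
import hashlib

def sha256_of(path: str) -> str:
    # Stream in 1 MiB chunks so the full checkpoint never sits in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "340ff6922e3f96d6311eb1e467dd63886463068cafedffc56a2d8bc5ff9f5563"
assert sha256_of("control_v11p_sd21_qrcode.safetensors") == expected
```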
control_v11p_sd21_qrcode.yaml ADDED
@@ -0,0 +1,85 @@
+model:
+  target: cldm.cldm.ControlLDM
+  params:
+    linear_start: 0.00085
+    linear_end: 0.0120
+    num_timesteps_cond: 1
+    log_every_t: 200
+    timesteps: 1000
+    first_stage_key: "jpg"
+    cond_stage_key: "txt"
+    control_key: "hint"
+    image_size: 64
+    channels: 4
+    cond_stage_trainable: false
+    conditioning_key: crossattn
+    monitor: val/loss_simple_ema
+    scale_factor: 0.18215
+    use_ema: False
+    only_mid_control: False
+
+    control_stage_config:
+      target: cldm.cldm.ControlNet
+      params:
+        use_checkpoint: True
+        image_size: 32 # unused
+        in_channels: 4
+        hint_channels: 3
+        model_channels: 320
+        attention_resolutions: [ 4, 2, 1 ]
+        num_res_blocks: 2
+        channel_mult: [ 1, 2, 4, 4 ]
+        num_head_channels: 64 # need to fix for flash-attn
+        use_spatial_transformer: True
+        use_linear_in_transformer: True
+        transformer_depth: 1
+        context_dim: 1024
+        legacy: False
+
+    unet_config:
+      target: cldm.cldm.ControlledUnetModel
+      params:
+        use_checkpoint: True
+        image_size: 32 # unused
+        in_channels: 4
+        out_channels: 4
+        model_channels: 320
+        attention_resolutions: [ 4, 2, 1 ]
+        num_res_blocks: 2
+        channel_mult: [ 1, 2, 4, 4 ]
+        num_head_channels: 64 # need to fix for flash-attn
+        use_spatial_transformer: True
+        use_linear_in_transformer: True
+        transformer_depth: 1
+        context_dim: 1024
+        legacy: False
+
+    first_stage_config:
+      target: ldm.models.autoencoder.AutoencoderKL
+      params:
+        embed_dim: 4
+        monitor: val/rec_loss
+        ddconfig:
+          #attn_type: "vanilla-xformers"
+          double_z: true
+          z_channels: 4
+          resolution: 256
+          in_channels: 3
+          out_ch: 3
+          ch: 128
+          ch_mult:
+          - 1
+          - 2
+          - 4
+          - 4
+          num_res_blocks: 2
+          attn_resolutions: []
+          dropout: 0.0
+        lossconfig:
+          target: torch.nn.Identity
+
+    cond_stage_config:
+      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
+      params:
+        freeze: True
+        layer: "penultimate"
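This YAML describes the network the safetensors checkpoint should contain (e.g. `hint_channels: 3`, `context_dim: 1024`). As a hedged cross-check, a sketch that lists a few tensor shapes without loading the whole file (assumes the checkpoint sits in the working directory):

```python
from safetensors import safe_open

# safetensors headers store names and shapes, so this stays cheap.
with safe_open("control_v11p_sd21_qrcode.safetensors", framework="pt") as f:
    for name in sorted(f.keys())[:5]:
        print(name, f.get_slice(name).get_shape())
```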
diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b0e671b5b6c993b99afa265ea88794b5ee40f56ee3b26fc06f75d21b4d8cdfcb
+size 1457051321
diffusion_pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8b3daef6325b0c8e453a30b98cebc146bece31658ac15c178ce122071f7301c3
+size 728596455
diffusion_pytorch_model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1a68a2ee6172524be3f6d6758cf83e5e6c8b17af31ab617fdca933139fe0f6ca
+size 728496840
diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0bf31f4c634ae1118b517a4adfe0fd67ae00eb13e1bb97f77e7192f3a047a82b
+size 1456953560
handler.py ADDED
@@ -0,0 +1,128 @@
+import torch
+from PIL import Image
+from diffusers import (
+    StableDiffusionControlNetImg2ImgPipeline,
+    ControlNetModel,
+    DDIMScheduler,
+)
+from diffusers.utils import load_image
+import openai
+from io import BytesIO
+import base64
+import qrcode
+import random
+
+qrcode_data = "https://www.vertxdesigns.com/"
+prompt = "masterpiece, best quality, mecha, no humans, black armor, blue eyes, science fiction, fire, laser cannon beam, war, conflict, destroyed city background"
+negative_prompt = "UnrealisticDream, FastNegativeEmbedding"
+
+
+qr = qrcode.QRCode(
+    version=1,
+    error_correction=qrcode.constants.ERROR_CORRECT_H,
+    box_size=10,
+    border=4,
+)
+qr.add_data(qrcode_data)
+qr.make(fit=True)
+img = qr.make_image(fill_color="black", back_color="white")
+
+# Resize image
+basewidth = 768
+wpercent = basewidth / float(img.size[0])
+hsize = int((float(img.size[1]) * float(wpercent)))
+qrcode_image = img.resize((basewidth, hsize), Image.LANCZOS)
+
+# Display the image (only renders in a notebook)
+qrcode_image
+# img.save('qrcode.png')
+
+
+# Initialize the control net model and pipeline.
+controlnet = ControlNetModel.from_pretrained(
+    "DionTimmer/controlnet_qrcode-control_v11p_sd21", torch_dtype=torch.float16
+)
+
+pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-2-1",
+    controlnet=controlnet,
+    safety_checker=None,
+    torch_dtype=torch.float16,
+)
+
+# Enable memory efficient attention.
+pipe.enable_xformers_memory_efficient_attention()
+
+# Set the scheduler for the pipeline.
+pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
+
+# Enable CPU offload for the model.
+pipe.enable_model_cpu_offload()
+
+
+# Resizes input_image to a specified resolution while maintaining the aspect ratio.
+def resize_for_condition_image(input_image: Image.Image, resolution: int):
+    input_image = input_image.convert("RGB")
+    W, H = input_image.size
+    k = float(resolution) / min(H, W)
+    H *= k
+    W *= k
+    H = int(round(H / 64.0)) * 64
+    W = int(round(W / 64.0)) * 64
+    img = input_image.resize((W, H), resample=Image.LANCZOS)
+    return img
+
+
+def get_random_seed():
+    return random.randint(1, 10**8)  # random integer between 1 and 100,000,000
+
+
+# Generate and store your seed.
+seed = get_random_seed()
+
+# Set the seed for the random number generator.
+generator = torch.manual_seed(seed)
+
+# Print the seed.
+print(seed)
+
+
+openai.api_key = "YOUR_OPENAI_API_KEY"  # key redacted; never commit a real key
+response = openai.Image.create(prompt=prompt, n=1, size="1024x1024")
+image_url = response.data[0].url
+print(image_url)
+
+
+init_image = load_image(image_url)
+
+# Set the control image to the qrcode image.
+control_image = qrcode_image
+
+# Resize the initial image
+init_image = resize_for_condition_image(init_image, 768)
+
+# Run the image generation process using the pipeline.
+output = pipe(
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    image=init_image,  # the initial image (the DALL-E output fetched above)
+    control_image=control_image,  # QR code image
+    width=768,
+    height=768,
+    guidance_scale=7.5,  # the influence of the prompt, 0-50
+    controlnet_conditioning_scale=1.6,  # the influence of the QR code, 1-5
+    generator=generator,  # random seed for the generation process
+    strength=0.99,  # how much noise is added to the init image, 0-1
+    num_inference_steps=150,  # the number of steps in the image generation process
+)
+
+
+output.images[0]  # displays in a notebook
+
+
+pil_image = output.images[0]
+buffered = BytesIO()
+pil_image.save(buffered, format="PNG")
+image_base64 = base64.b64encode(buffered.getvalue()).decode()
+print(f"First 10 characters: {image_base64[:10]}")
+print(f"Length of string: {len(image_base64):,}")