takuma104/controlnet_dev


takuma104 committed on Feb 25, 2023

Commit 3de264f • 1 Parent(s): 1eb73c8

add gen_compare


This view is limited to 50 files because it contains too many changes.

Files changed (50)

gen_compare/.gitattributes +8 -0
gen_compare/README.md +55 -0
gen_compare/control_images/bird_512x512.png +0 -0
gen_compare/control_images/converted/control_bird_canny.png +0 -0
gen_compare/control_images/converted/control_bird_depth.png +0 -0
gen_compare/control_images/converted/control_bird_hed.png +0 -0
gen_compare/control_images/converted/control_bird_mlsd.png +0 -0
gen_compare/control_images/converted/control_bird_normal.png +0 -0
gen_compare/control_images/converted/control_bird_openpose.png +0 -0
gen_compare/control_images/converted/control_bird_scribble.png +0 -0
gen_compare/control_images/converted/control_bird_seg.png +0 -0
gen_compare/control_images/converted/control_human_canny.png +0 -0
gen_compare/control_images/converted/control_human_depth.png +0 -0
gen_compare/control_images/converted/control_human_hed.png +0 -0
gen_compare/control_images/converted/control_human_mlsd.png +0 -0
gen_compare/control_images/converted/control_human_normal.png +0 -0
gen_compare/control_images/converted/control_human_openpose.png +0 -0
gen_compare/control_images/converted/control_human_scribble.png +0 -0
gen_compare/control_images/converted/control_human_seg.png +0 -0
gen_compare/control_images/converted/control_room_canny.png +0 -0
gen_compare/control_images/converted/control_room_depth.png +0 -0
gen_compare/control_images/converted/control_room_hed.png +0 -0
gen_compare/control_images/converted/control_room_mlsd.png +0 -0
gen_compare/control_images/converted/control_room_normal.png +0 -0
gen_compare/control_images/converted/control_room_openpose.png +0 -0
gen_compare/control_images/converted/control_room_scribble.png +0 -0
gen_compare/control_images/converted/control_room_seg.png +0 -0
gen_compare/control_images/converted/control_vermeer_canny.png +0 -0
gen_compare/control_images/converted/control_vermeer_depth.png +0 -0
gen_compare/control_images/converted/control_vermeer_hed.png +0 -0
gen_compare/control_images/converted/control_vermeer_mlsd.png +0 -0
gen_compare/control_images/converted/control_vermeer_normal.png +0 -0
gen_compare/control_images/converted/control_vermeer_openpose.png +0 -0
gen_compare/control_images/converted/control_vermeer_scribble.png +0 -0
gen_compare/control_images/converted/control_vermeer_seg.png +0 -0
gen_compare/control_images/human_512x512.png +0 -0
gen_compare/control_images/room_512x512.png +0 -0
gen_compare/control_images/vermeer_512x512.png +0 -0
gen_compare/create_control_images.py +29 -0
gen_compare/create_plots.py +50 -0
gen_compare/gen_diffusers_image.py +49 -0
gen_compare/gen_diffusers_image.sh +9 -0
gen_compare/gen_reference_image.py +60 -0
gen_compare/gen_reference_image.sh +9 -0
gen_compare/plots/figure_canny.png +3 -0
gen_compare/plots/figure_depth.png +3 -0
gen_compare/plots/figure_hed.png +3 -0
gen_compare/plots/figure_mlsd.png +3 -0
gen_compare/plots/figure_normal.png +3 -0
gen_compare/plots/figure_openpose.png +3 -0

gen_compare/.gitattributes ADDED

	@@ -0,0 +1,8 @@

+plots/figure_seg.png filter=lfs diff=lfs merge=lfs -text
+plots/figure_canny.png filter=lfs diff=lfs merge=lfs -text
+plots/figure_depth.png filter=lfs diff=lfs merge=lfs -text
+plots/figure_hed.png filter=lfs diff=lfs merge=lfs -text
+plots/figure_mlsd.png filter=lfs diff=lfs merge=lfs -text
+plots/figure_normal.png filter=lfs diff=lfs merge=lfs -text
+plots/figure_openpose.png filter=lfs diff=lfs merge=lfs -text
+plots/figure_scribble.png filter=lfs diff=lfs merge=lfs -text

gen_compare/README.md ADDED

	@@ -0,0 +1,55 @@

+# Diffusers ControlNet Impl. & Reference Impl. generated image comparison
+## Implementation Source Code & versions
+- Diffusers (in development version): https://github.com/takuma104/diffusers/tree/e758682c00a7d23e271fd8c9fb7a48912838045c
+- Reference Impl.: https://github.com/lllyasviel/ControlNet/tree/2f77609bf6a8a2243d9faa198365fc6222c5435f
+## Environment
+- OS: Ubuntu 22.04
+- GPU: Nvidia RTX3060 (12GB)
+## Scripts to generate plots:
+- Create control image: [create_control_images.py](create_control_images.py)
+- Diffusers generated image: [gen_diffusers_image.py](gen_diffusers_image.py)
+- Reference generated image: [gen_reference_image.py](gen_reference_image.py)
+- Create Plots: [create_plots.py](create_plots.py)
+## Original images for control images:
+- All images except the Vermeer image are from [test_imgs](https://github.com/lllyasviel/ControlNet/test_imgs). Each was cropped and resized to 512x512 px.
+<img width="128" src="control_images/bird_512x512.png">
+<img width="128" src="control_images/human_512x512.png">
+<img width="128" src="control_images/room_512x512.png">
+<img width="128" src="control_images/vermeer_512x512.png">
+## Generation Settings:
+#### Control Images:
+The original images above were converted with [controlnet_hinter](https://github.com/takuma104/controlnet_hinter).
+#### Prompts:
+All images were generated with the same prompt.
+- Prompt: `best quality, extremely detailed, illustration, looking at viewer`
+- Negative Prompt: `monochrome, lowres, bad anatomy, worst quality, low quality`
+#### Other settings (common to both):
+- sampler: DDIM
+- guidance_scale: 9.0
+- num_inference_steps: 20
+- initial random latents: created on CPU using seed
+## Results:
+[![canny](plots/figure_canny.png)](plots/figure_canny.png)
+[![depth](plots/figure_depth.png)](plots/figure_depth.png)
+[![hed](plots/figure_hed.png)](plots/figure_hed.png)
+[![mlsd](plots/figure_mlsd.png)](plots/figure_mlsd.png)
+[![normal](plots/figure_normal.png)](plots/figure_normal.png)
+[![openpose](plots/figure_openpose.png)](plots/figure_openpose.png)
+[![scribble](plots/figure_scribble.png)](plots/figure_scribble.png)
+[![seg](plots/figure_seg.png)](plots/figure_seg.png)
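The "initial random latents: created on CPU using seed" setting is what lets both implementations start from identical noise. A minimal sketch of the idea follows, using only the Python standard library as a stand-in for `torch.randn` with a CPU `torch.Generator`. The helper names `latent_shape` and `seeded_noise` are hypothetical, and the generated values will not match PyTorch's.

```python
import random


def latent_shape(height=512, width=512, channels=4, downscale=8):
    # The scripts use a 4-channel latent at 1/8 of the image resolution,
    # i.e. (1, 4, 64, 64) for a 512x512 image.
    return (1, channels, height // downscale, width // downscale)


def seeded_noise(seed, shape):
    # Stand-in (assumption) for
    # torch.randn(shape, generator=torch.Generator("cpu").manual_seed(seed)):
    # a private generator seeded per call, so the result depends only on the seed.
    rng = random.Random(seed)
    count = 1
    for dim in shape:
        count *= dim
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


shape = latent_shape()
first = seeded_noise(0, shape)
second = seeded_noise(0, shape)
print(shape, first == second)
```

Because the noise is drawn from a generator that is independent of the GPU and of global random state, the same seed should give the same starting latent in both pipelines.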

gen_compare/create_control_images.py ADDED

	@@ -0,0 +1,29 @@

+from PIL import Image
+import controlnet_hinter
+#model_suffixes = {"canny","normal","depth","openpose","hed","scribble","mlsd","seg"}
+def write_converted_files(original_image, prefix):
+    controlnet_hinter.hint_canny(original_image).save(prefix + '_canny.png')
+    controlnet_hinter.hint_depth(original_image).save(prefix + '_depth.png')
+    controlnet_hinter.hint_fake_scribble(original_image).save(
+        prefix + '_scribble.png')
+    controlnet_hinter.hint_hed(original_image).save(prefix + '_hed.png')
+    controlnet_hinter.hint_hough(original_image).save(prefix + '_mlsd.png')
+    controlnet_hinter.hint_normal(original_image).save(prefix + '_normal.png')
+    controlnet_hinter.hint_openpose(
+        original_image).save(prefix + '_openpose.png')
+    # controlnet_hinter.hint_scribble(
+    #     original_image).save(prefix + '_scribble.png')
+    controlnet_hinter.hint_segmentation(
+        original_image).save(prefix + '_seg.png')
+if __name__ == '__main__':
+    image_types = {'bird', 'human', 'room', 'vermeer'}
+    for itype in image_types:
+        image = Image.open(f"control_images/{itype}_512x512.png")
+        write_converted_files(
+            image, prefix=f'control_images/converted/control_{itype}')
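The suffixes written by `write_converted_files` do not always match the `controlnet_hinter` function names: `hint_hough` writes `_mlsd.png`, `hint_fake_scribble` writes `_scribble.png`, and `hint_segmentation` writes `_seg.png`. A table-driven sketch of that mapping is shown below. It only builds file names and does not call `controlnet_hinter`; `HINT_FUNCTIONS` and `converted_paths` are hypothetical names introduced for illustration.

```python
# Output suffix -> name of the controlnet_hinter function used in the script.
HINT_FUNCTIONS = {
    "canny": "hint_canny",
    "depth": "hint_depth",
    "scribble": "hint_fake_scribble",
    "hed": "hint_hed",
    "mlsd": "hint_hough",
    "normal": "hint_normal",
    "openpose": "hint_openpose",
    "seg": "hint_segmentation",
}


def converted_paths(image_type, root="control_images/converted"):
    # Mirrors the prefix used in the script: control_images/converted/control_{itype}
    prefix = f"{root}/control_{image_type}"
    return {suffix: f"{prefix}_{suffix}.png" for suffix in HINT_FUNCTIONS}


for suffix, path in converted_paths("bird").items():
    print(HINT_FUNCTIONS[suffix], "->", path)
```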

gen_compare/create_plots.py ADDED

	@@ -0,0 +1,50 @@

+import matplotlib.pyplot as plt
+from PIL import Image
+plt.rcParams["figure.figsize"] = (10, 5)
+plt.rcParams['figure.facecolor'] = 'white'
+def render_figure(model_name, fn):
+    image_types = ['bird', 'human', 'room', 'vermeer']
+    def plot_row(axs, control_fn_prefix, output_fn_prefix, name, show_control=False):
+        for i, ax in enumerate(axs):
+            if i == 0:
+                if show_control:
+                    ax.set_title('Control')
+                    ax.imshow(Image.open(f'{control_fn_prefix}.png'))
+            else:
+                ax.set_title(f'Seed={i-1} ({name})')
+                ax.imshow(Image.open(f'{output_fn_prefix}_{i-1}.png'))
+    fig, axs = plt.subplots(
+        2 * len(image_types), 5, layout="constrained", figsize=(10, 5 * len(image_types)))
+    for ax in axs.flatten():
+        ax.set_aspect('equal', 'box')
+        ax.axis('off')
+    pair_axs = [list(pair) for pair in zip(axs[::2], axs[1::2])]
+    for image_type, pair_ax in zip(image_types, pair_axs):
+        plot_row(pair_ax[0],
+                 f'./control_images/converted/control_{image_type}_{model_name}',
+                 f'./output_images/diffusers/output_{image_type}_{model_name}',
+                 'Diffusers', show_control=True)
+        plot_row(pair_ax[1],
+                 f'./control_images/converted/control_{image_type}_{model_name}',
+                 f'./output_images/ref/output_{image_type}_{model_name}',
+                 'ref impl.')
+    fig.suptitle(f'Model: {model_name}', fontsize=16)
+    # fig.tight_layout()
+    fig.savefig(fn, dpi=144)
+if __name__ == '__main__':
+    model_names = ["canny", "normal", "depth",
+                   "openpose", "hed", "scribble", "mlsd", "seg"]
+    for model in model_names:
+        fn = f"plots/figure_{model}.png"
+        render_figure(model, fn)
+        print(fn)
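`render_figure` lays out two rows per source image, each with five columns (one control slot plus four seeds). The first row of a pair shows the control image and the Diffusers outputs, the second shows the reference outputs. The slicing `zip(axs[::2], axs[1::2])` that produces those pairs can be checked on plain row indices. This is a sketch, and `pair_rows` is a hypothetical helper.

```python
def pair_rows(rows):
    # Each even-indexed row is paired with the odd-indexed row directly below it.
    return [list(pair) for pair in zip(rows[::2], rows[1::2])]


image_types = ["bird", "human", "room", "vermeer"]
rows = list(range(2 * len(image_types)))  # row indices of the 8x5 subplot grid

for image_type, (diffusers_row, ref_row) in zip(image_types, pair_rows(rows)):
    print(image_type, diffusers_row, ref_row)
```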

gen_compare/gen_diffusers_image.py ADDED

	@@ -0,0 +1,49 @@

+# Diffusers' ControlNet Implementation Subjective Evaluation
+# https://github.com/takuma104/diffusers/tree/controlnet
+import einops
+import numpy as np
+import torch
+import sys
+from diffusers import StableDiffusionControlNetPipeline
+from PIL import Image
+test_prompt = "best quality, extremely detailed, illustration, looking at viewer"
+test_negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"
+def generate_image(seed, control):
+    latent = torch.randn((1,4,64,64), device="cpu", generator=torch.Generator(device="cpu").manual_seed(seed)).cuda()
+    image = pipe(
+        prompt=test_prompt,
+        negative_prompt=test_negative_prompt,
+        guidance_scale=9.0,
+        num_inference_steps=20,
+        latents=latent,
+        #generator=torch.Generator(device="cuda").manual_seed(seed),
+        image=control,
+    ).images[0]
+    return image
+if __name__ == '__main__':
+    model_name = sys.argv[1]
+    control_image_folder = '../huggingface/controlnet_dev/gen_compare/control_images/converted/'
+    output_image_folder = '../huggingface/controlnet_dev/gen_compare/output_images/diffusers/'
+    model_id = f'../huggingface/control_sd15_{model_name}'
+    pipe = StableDiffusionControlNetPipeline.from_pretrained(model_id).to("cuda")
+    pipe.enable_attention_slicing(1)
+    image_types = {'bird', 'human', 'room', 'vermeer'}
+    for image_type in image_types:
+        control_image = Image.open(f'{control_image_folder}control_{image_type}_{model_name}.png')
+        control = np.array(control_image)[:,:,::-1].copy()
+        control = torch.from_numpy(control).float().cuda() / 255.0
+        control = torch.stack([control for _ in range(1)], dim=0)
+        control = einops.rearrange(control, 'b h w c -> b c h w').clone()
+        for seed in range(4):
+            image = generate_image(seed=seed, control=control)
+            image.save(f'{output_image_folder}output_{image_type}_{model_name}_{seed}.png')

gen_compare/gen_diffusers_image.sh ADDED

	@@ -0,0 +1,9 @@

+#!/bin/bash
+models=("canny" "normal" "depth" "openpose" "hed" "scribble" "mlsd" "seg")
+for model in "${models[@]}"
+do
+    echo "$model"
+    python gen_diffusers_image.py "$model"
+done
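The loop above calls the Python script once per model, and the script then covers four image types and four seeds, so one implementation yields 8 x 4 x 4 = 128 images. The sketch below enumerates the expected output files, following the `output_{image_type}_{model_name}_{seed}.png` pattern from `gen_diffusers_image.py`. `expected_outputs` and its default folder argument are hypothetical.

```python
from itertools import product

MODELS = ["canny", "normal", "depth", "openpose", "hed", "scribble", "mlsd", "seg"]
IMAGE_TYPES = ["bird", "human", "room", "vermeer"]
SEEDS = range(4)


def expected_outputs(folder="output_images/diffusers"):
    # One file per (model, image type, seed) combination.
    return [
        f"{folder}/output_{image_type}_{model}_{seed}.png"
        for model, image_type, seed in product(MODELS, IMAGE_TYPES, SEEDS)
    ]


files = expected_outputs()
print(len(files), files[0])
```

A list like this could be compared against the contents of the output folder to spot a run that stopped partway.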

gen_compare/gen_reference_image.py ADDED

	@@ -0,0 +1,60 @@

+# from https://github.com/lllyasviel/ControlNet/blob/main/gradio_canny2image.py
+from share import *
+import einops
+import numpy as np
+import torch
+from PIL import Image
+import sys
+from pytorch_lightning import seed_everything
+from cldm.model import create_model, load_state_dict
+from ldm.models.diffusion.ddim import DDIMSampler
+from diffusers.utils import load_image
+test_prompt = "best quality, extremely detailed, illustration, looking at viewer"
+test_negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"
+@torch.no_grad()
+def generate(prompt, n_prompt, seed, control, ddim_steps=20, eta=0.0, scale=9.0, H=512, W=512):
+    seed_everything(seed)
+    cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt] * num_samples)]}
+    un_cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
+    shape = (4, H // 8, W // 8)
+    latent = torch.randn((1,) + shape, device="cpu", generator=torch.Generator(device="cpu").manual_seed(seed)).cuda()
+    samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
+                                                    shape, cond, x_T=latent,
+                                                    verbose=False, eta=eta,
+                                                    unconditional_guidance_scale=scale,
+                                                    unconditional_conditioning=un_cond)
+    x_samples = model.decode_first_stage(samples)
+    x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)
+    return Image.fromarray(x_samples[0])
+if __name__ == '__main__':
+    model_name = sys.argv[1]
+    control_image_folder = '../huggingface/controlnet_dev/gen_compare/control_images/converted/'
+    output_image_folder = '../huggingface/controlnet_dev/gen_compare/output_images/ref/'
+    num_samples = 1
+    model = create_model('./models/cldm_v15.yaml').cpu()
+    model.load_state_dict(load_state_dict(f'../huggingface/ControlNet/models/control_sd15_{model_name}.pth', location='cpu'))
+    model = model.cuda()
+    ddim_sampler = DDIMSampler(model)
+    image_types = {'bird', 'human', 'room', 'vermeer'}
+    for image_type in image_types:
+        control_image = Image.open(f'{control_image_folder}control_{image_type}_{model_name}.png')
+        control = np.array(control_image)[:,:,::-1].copy()
+        control = torch.from_numpy(control).float().cuda() / 255.0
+        control = torch.stack([control for _ in range(num_samples)], dim=0)
+        control = einops.rearrange(control, 'b h w c -> b c h w').clone()
+        for seed in range(4):
+            image = generate(test_prompt, test_negative_prompt, seed=seed, control=control)
+            image.save(f'{output_image_folder}output_{image_type}_{model_name}_{seed}.png')
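The last step of `generate` maps the decoder output from roughly [-1, 1] to 8-bit pixel values with `x * 127.5 + 127.5`, clips to [0, 255], and casts to `uint8`. The same arithmetic for a single value in plain Python is sketched here; `to_uint8` is a hypothetical helper and not part of either code base.

```python
def to_uint8(x):
    scaled = x * 127.5 + 127.5
    clipped = min(max(scaled, 0.0), 255.0)
    # int() truncates, like astype(np.uint8) does for values already in [0, 255].
    return int(clipped)


for value in (-1.0, 0.0, 1.0, 1.5):
    print(value, to_uint8(value))
```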

gen_compare/gen_reference_image.sh ADDED

	@@ -0,0 +1,9 @@

+#!/bin/bash
+models=("canny" "normal" "depth" "openpose" "hed" "scribble" "mlsd" "seg")
+for model in "${models[@]}"
+do
+    echo "$model"
+    python gen_reference_image.py "$model"
+done