Outpainting III - Inpaint Model

Published April 23, 2024

This is the third guide about outpainting. If you want to read about the other methods, here they are:

In this guide we will explore how to outpaint while keeping the original subject intact. We can achieve this with an inpainting model; even though it was trained for a different task, it still works if we help the model understand what we want to generate in the new areas of the image.

1.- Original image with a transparent background

For starters, we need a good image, and for this I'll use this one from Wikimedia.

This car has a lot of text and a recognizable insignia, so we can easily tell whether the image gets distorted.

Let's start by removing the background. For this, I'll use RMBG v1.4; you can find the model and the instructions on how to use it here: https://huggingface.co/briaai/RMBG-1.4, or you can just use the Hugging Face Space to do it: https://huggingface.co/spaces/briaai/BRIA-RMBG-1.4.
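If you prefer to do this step in code, the model card describes usage through the transformers image-segmentation pipeline with remote code. Here is a minimal sketch based on that description (the local file name is a placeholder, and the exact API may change, so check the model card):

# background removal with RMBG v1.4, based on the usage described on its model card
from transformers import pipeline

rmbg = pipeline("image-segmentation", model="briaai/RMBG-1.4", trust_remote_code=True)

# the custom pipeline returns a PIL image with a transparent background
no_bg_image = rmbg("car.png")
no_bg_image.save("car_no_bg.png")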

The idea is to get only the subject with a transparent background (alpha).

If you want the best possible result with this method, it’s better if you manually remove the background with a professional tool like Photoshop. As you can see in this example, the car is not perfect but it will suffice for this guide.

Now that we have the subject, let's fit it into a square canvas. I prefer to work with square images because SDXL performs better at 1024x1024, but technically this can be done with any image size if your VRAM allows it.

With Pillow, it's as easy as scaling the image and pasting it onto a square canvas; we also need the background to be white:

from PIL import Image


def scale_and_paste(original_image):
    """Scale the subject to fit inside 1024x1024 and paste it over a white square background."""
    aspect_ratio = original_image.width / original_image.height

    # resize so the longest side is 1024 while keeping the aspect ratio
    if original_image.width > original_image.height:
        new_width = 1024
        new_height = round(new_width / aspect_ratio)
    else:
        new_height = 1024
        new_width = round(new_height * aspect_ratio)

    resized_original = original_image.resize((new_width, new_height), Image.LANCZOS)

    # center the subject on a white 1024x1024 canvas, using its alpha channel as the paste mask
    white_background = Image.new("RGBA", (1024, 1024), "white")
    x = (1024 - new_width) // 2
    y = (1024 - new_height) // 2
    white_background.paste(resized_original, (x, y), resized_original)

    return resized_original, white_background
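With that in place, we load the original image (which already has the alpha channel) and run it through the function:

import requests
from PIL import Image

# load the original image with alpha
original_image = Image.open(
    requests.get(
        "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/outpainting/BMW_i8_Safety_Car_Front.png?download=true",
        stream=True,
    ).raw
).convert("RGBA")
resized_img, white_bg_image = scale_and_paste(original_image)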

2.- Generate a temporary background

For the next step, we need to fill the white area with something similar to what we want in the final image; in this case, I want the car to be on a highway.

We will use the inpaint ControlNet to generate a temporary background, since it gives the best results. I used it in the first guide if you want to read how it's done.

import torch
from diffusers import ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
)

The model likes to add details, so it usually adds a spoiler or makes the roof or bumper bigger.


To mitigate this, we're going to add a Zoe depth ControlNet and also make the car a little smaller than the original so we don't have any problems pasting the original back over the image.

from controlnet_aux import ZoeDetector

def scale_and_paste(original_image):
    ...
    # make the subject a little smaller
    new_width = new_width - 20
    new_height = new_height - 20
    ...

# load preprocessor and generate depth map
zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")
image_zoe = zoe(white_bg_image, detect_resolution=512, image_resolution=1024)

controlnets = [
    ControlNetModel.from_pretrained(
        "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
    ),
    ControlNetModel.from_pretrained("diffusers/controlnet-zoe-depth-sdxl-1.0", torch_dtype=torch.float16),
]
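With both ControlNets loaded, we can build the pipeline that generates the temporary background. I also load the fp16-fixed VAE in case the model doesn't come with one:

from diffusers import AutoencoderKL, StableDiffusionXLControlNetPipeline

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")

pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnets, vae=vae
).to("cuda")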

def generate_image(prompt, negative_prompt, inpaint_image, zoe_image, seed: int = None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)

    generator = torch.Generator(device="cpu").manual_seed(seed)

    image = pipeline(
        prompt,
        negative_prompt=negative_prompt,
        image=[inpaint_image, zoe_image],
        guidance_scale=6.5,
        num_inference_steps=25,
        generator=generator,
        controlnet_conditioning_scale=[0.5, 0.8],
        control_guidance_end=[0.9, 0.6],
    ).images[0]

    return image

Now we can generate some backgrounds and choose the one we like.
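Generating is just a call to the helper function; to reproduce the background I ended up choosing, pass the same seed:

prompt = "a car on the highway"
negative_prompt = ""

temp_image = generate_image(prompt, negative_prompt, white_bg_image, image_zoe, 4138619029)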

(images: generated background candidates)

In my case I like the last one, so that's the image we will continue to use in the next steps.

Now that we have our background, we just need to paste the original car over it; we also need to create a mask for the outpainting.
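Both steps are simple with Pillow; the mask is the inverted alpha channel of the resized subject, binarized so it's strictly black and white:

from PIL import Image, ImageOps

# paste the original subject over the temporary background
x = (1024 - resized_img.width) // 2
y = (1024 - resized_img.height) // 2
temp_image.paste(resized_img, (x, y), resized_img)

# create a mask for the final outpainting (white = area to regenerate)
mask = Image.new("L", temp_image.size)
mask.paste(resized_img.split()[3], (x, y))
mask = ImageOps.invert(mask)
final_mask = mask.point(lambda p: p > 128 and 255)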

(images: original pasted over the background, outpainting mask)

3.- Outpaint

The background removal took part of the headlight as transparent for some reason; it's important to make sure that the original image has the subject with the alpha channel you want. In this case it's not that important because the headlight matches the generation.

So now we can finally generate the outpainting with an inpainting model. I'll use an inpainting model merged with RealVisXL (after clearing the previous pipeline to free some VRAM, as in the full code below):

pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
    "OzzyGT/RealVisXL_V4.0_inpainting",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=vae,
).to("cuda")

image = pipeline(
    prompt,
    negative_prompt=negative_prompt,
    image=image,
    mask_image=mask_blurred,
    guidance_scale=10.0,
    strength=0.8,
    num_inference_steps=30,
    generator=generator,
).images[0]
(images: outpainting results)

I like the last one, but since we're using the whole image for outpainting, the original car was changed a little. To fix this, we just need to paste the original one over it once again.
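This is the same paste as before, now over the outpainted image:

# paste the original subject over the final background
x = (1024 - resized_img.width) // 2
y = (1024 - resized_img.height) // 2
image.paste(resized_img, (x, y), resized_img)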


4.- Final touch-ups

This image seems decent enough, but if you want to really make something good, you'll need to put some effort into it. Up to this step, everything could be done programmatically, but to get a really good final result, now it's time to inpaint some details and use some other software to apply filters and enhance the colors.

For example, I don't like that there aren't any shadows below the car, so I'll paint the shadows to simulate them and then do an image-to-image pass over it. As always, I just paste the original image over the generated one afterwards.
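If you want to stay in code for this step, a rough sketch would be to load the same model as an image-to-image pipeline, run it at a low strength over the hand-painted image, and paste the original subject back. The file name and strength below are just placeholders:

import torch
from PIL import Image
from diffusers import StableDiffusionXLImg2ImgPipeline

# hypothetical input: the outpainted image with shadows painted by hand
painted_image = Image.open("painted_shadows.png").convert("RGB")

img2img = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", vae=vae
).to("cuda")

# a low strength keeps the composition and only refines the painted shadows
refined = img2img(
    prompt="high quality photo of a car on the highway, shadows, highly detailed",
    image=painted_image,
    strength=0.3,
).images[0]

# paste the original subject back over the refined image
refined.paste(resized_img, (x, y), resized_img)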

(images: paint, img2img pass, final)

This can be tiresome to do with code, so I recommend using a good UI for the final touches. I like to use InvokeAI for this, and I also recommend watching video tutorials where you can learn how to add details without needing to do complex drawings, for example: https://www.youtube.com/watch?v=GAlaOlihZ20

I'm not going to fix all the details for this demo, but I'll do a little color correction and make it a bit more professional:


I hope this helps you better understand how to outpaint with Diffusers. If you have any questions, please don't hesitate to ask them in the discussions.

This is the full code:

import random

import requests
import torch
from controlnet_aux import ZoeDetector
from PIL import Image, ImageOps

from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    StableDiffusionXLControlNetPipeline,
    StableDiffusionXLInpaintPipeline,
)


def scale_and_paste(original_image):
    aspect_ratio = original_image.width / original_image.height

    if original_image.width > original_image.height:
        new_width = 1024
        new_height = round(new_width / aspect_ratio)
    else:
        new_height = 1024
        new_width = round(new_height * aspect_ratio)

    # make the subject a little smaller
    new_width = new_width - 20
    new_height = new_height - 20

    resized_original = original_image.resize((new_width, new_height), Image.LANCZOS)
    white_background = Image.new("RGBA", (1024, 1024), "white")
    x = (1024 - new_width) // 2
    y = (1024 - new_height) // 2
    white_background.paste(resized_original, (x, y), resized_original)

    return resized_original, white_background


# load the original image with alpha
original_image = Image.open(
    requests.get(
        "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/outpainting/BMW_i8_Safety_Car_Front.png?download=true",
        stream=True,
    ).raw
).convert("RGBA")
resized_img, white_bg_image = scale_and_paste(original_image)

# load preprocessor and generate depth map
zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")
image_zoe = zoe(white_bg_image, detect_resolution=512, image_resolution=1024)

# load controlnets
controlnets = [
    ControlNetModel.from_pretrained(
        "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
    ),
    ControlNetModel.from_pretrained("diffusers/controlnet-zoe-depth-sdxl-1.0", torch_dtype=torch.float16),
]

# load the VAE separately in case the model doesn't come with one
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")

# initial pipeline for temp background
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnets, vae=vae
).to("cuda")


# function to generate
def generate_image(prompt, negative_prompt, inpaint_image, zoe_image, seed: int = None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)

    generator = torch.Generator(device="cpu").manual_seed(seed)

    image = pipeline(
        prompt,
        negative_prompt=negative_prompt,
        image=[inpaint_image, zoe_image],
        guidance_scale=6.5,
        num_inference_steps=25,
        generator=generator,
        controlnet_conditioning_scale=[0.5, 0.8],
        control_guidance_end=[0.9, 0.6],
    ).images[0]

    return image


# initial prompt
prompt = "a car on the highway"
negative_prompt = ""

temp_image = generate_image(prompt, negative_prompt, white_bg_image, image_zoe, 4138619029)

# paste original subject over the temporary background
x = (1024 - resized_img.width) // 2
y = (1024 - resized_img.height) // 2
temp_image.paste(resized_img, (x, y), resized_img)

# create a mask for the final outpainting
mask = Image.new("L", temp_image.size)
mask.paste(resized_img.split()[3], (x, y))
mask = ImageOps.invert(mask)
final_mask = mask.point(lambda p: p > 128 and 255)

# clear old pipeline for VRAM savings
pipeline = None
torch.cuda.empty_cache()

# new pipeline with the inpainting model
pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
    "OzzyGT/RealVisXL_V4.0_inpainting",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=vae,
).to("cuda")

# Use a blurred mask for better blend
mask_blurred = pipeline.mask_processor.blur(final_mask, blur_factor=20)


# function for final outpainting
def generate_outpaint(prompt, negative_prompt, image, mask, seed: int = None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)

    generator = torch.Generator(device="cpu").manual_seed(seed)

    image = pipeline(
        prompt,
        negative_prompt=negative_prompt,
        image=image,
        mask_image=mask,
        guidance_scale=10.0,
        strength=0.8,
        num_inference_steps=30,
        generator=generator,
    ).images[0]

    return image


# better prompt for final outpainting
prompt = "high quality photo of a car on the highway, shadows, highly detailed"
negative_prompt = ""

# generate the image
final_image = generate_outpaint(prompt, negative_prompt, temp_image, mask_blurred, 3352253467)

# paste original subject over final background
x = (1024 - resized_img.width) // 2
y = (1024 - resized_img.height) // 2
final_image.paste(resized_img, (x, y), resized_img)
final_image.save("result.png")