Areas masked as NOT for inpaint are nonetheless altered, why is this?

#14
by spejamas - opened

Take this image I send for inpainting (or in this case, outpainting):
afremov_puppy_to_outpaint.png
Using this mask:
watercolor_mask_fixed.png
This inpainting pipeline, instantiated as demonstrated in the model card with pipe = AutoPipelineForInpainting.from_pretrained("diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16, variant="fp16").to("cuda"), produces the following:
afremov_puppy_outpainted.jpg
which is a great outpainting. However, I notice that the colors in the output that correspond to the masked area (the area that, as I understand it, is not meant to be altered) are different from the original. The colors aren't as deep, and in some of my tests, small artefacts appear where none existed in the original. You can see the difference by comparing the images above. You can also see a difference in the RGB color levels of the region corresponding to the masked area, as viewed in a photo editor (original first, outpaint output second):
rgb_before_outpaint.png
rgb_outpainted.png
I've been using guidance scale 7, inference steps 40, and strength of 1. I thought strength=1 might have caused the problem, but I tried it with lower strength as well and I notice the same degradation.

So......why? Why does this happen? Is there some kind of preprocessing of the image that degrades it? Is it possible to avoid degradation of the masked area with this pipeline?

Oh

From the model card:
"When the strength parameter is set to 1 (i.e. starting in-painting from a fully masked image), the quality of the image is degraded. The model retains the non-masked contents of the image, but images look less sharp. We're investing this and working on the next version."
I'll check again what I was seeing with strength < 1. My mistake. Thank you for the disclaimer.

Next version anytime soon? :)

NEVERMIND I TAKE IT BACK

After some more tests, the degradation from running the pipeline at strength 0.99 is indistinguishable from the degradation at strength 1. Lower strengths also noticeably degrade the original picture, though to a lesser extent than higher strengths. I'm not sure how to reverse the degradation, so for now this fine-tune is unusable, at least for me.

What I do is finalize the edit in software like GIMP, using two layers (one for the input image, one for the output image).
I use the eraser tool on the top layer and, thanks to transparency, get what I want. This way, I keep the sharpness of the input image and benefit from the inpainting in the output image.
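The same two-layer idea can be scripted instead of done by hand. The following is a minimal sketch with Pillow, not something taken from the model card. It assumes the mask is white where new content was generated and black where the original should be kept, and the function name `paste_back_original` is hypothetical.

```python
from PIL import Image

def paste_back_original(original, generated, mask):
    """Keep original pixels where the mask is black, generated pixels where it is white.

    Assumption: white (255) in the mask marks the area that was repainted.
    """
    original = original.convert("RGB")
    # The pipeline output may not have the input's size, so match sizes first.
    generated = generated.convert("RGB").resize(original.size)
    mask = mask.convert("L").resize(original.size)
    # Image.composite takes pixels from the first image where the mask is 255
    # and from the second image where the mask is 0.
    return Image.composite(generated, original, mask)
```

With images loaded through `Image.open(...)`, the call would be `paste_back_original(original, generated, mask).save("restored.png")`. This restores the kept region exactly, but it does nothing about a color shift at the boundary between the two regions.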

Updates with more observations:
The spikes in the second RGB visualization I referenced earlier arose from a problem in my workflow (the image had been converted from RGB to P, i.e. palette, mode). The true distribution is smoother (but still very different). Here's a better visualization, where red is the original image and blue is the outpainted one:
plt_1.png

And here's a visualization of the map of pixel-by-pixel differences:
afremov_difference.jpg

It's worth noting that no pixel in the outpainted image has a lower R, G, or B value than its corresponding pixel in the original. The values strictly increase if they don't stay the same. This makes for a more washed-out image with no deep colors.

The map I posted above seems inverted. Here's a map of the OUTPAINTED RGB values minus the ORIGINAL RGB values:
afremov_difference_invert.jpg

With this, it's easier to see what the pipeline is modifying about the image. These pixels + the original pixels = the outpainted pixels.
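For anyone who wants to reproduce this kind of comparison, here is a minimal NumPy sketch of an "outpainted minus original" difference restricted to the region that should have been left alone. It is an illustration under assumptions, not the exact code used for the maps above: the function names and the boolean `keep` array (True where the pixel is meant to stay unchanged) are hypothetical.

```python
import numpy as np

def signed_difference(original, outpainted):
    """Per-pixel, per-channel difference (outpainted minus original).

    Both inputs are (H, W, 3) arrays. They are cast to int16 so that
    negative differences are kept as negative numbers.
    """
    a = np.asarray(original).astype(np.int16)
    b = np.asarray(outpainted).astype(np.int16)
    return b - a

def summarize_kept_region(original, outpainted, keep):
    """Summarize the change inside the region that should be untouched.

    keep: boolean (H, W) array, True where the pixel is meant to be preserved.
    """
    diff = signed_difference(original, outpainted)[keep]  # shape (N, 3)
    return {
        "min": int(diff.min()),
        "max": int(diff.max()),
        "mean_per_channel": diff.mean(axis=0).tolist(),
        "fraction_changed": float((diff != 0).any(axis=1).mean()),
    }
```

A "min" of 0 or above would support the observation that values only increase; a negative "min" would mean some values in the kept region decreased.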

This is a fine workflow, thank you Wok. My use case is a little more difficult: since I am outpainting, if I simply layer the original image on top of the output, I get a sharp, noticeable border where the color palette changes. Maybe it can be smoothed, but even then I definitely prefer the original colors. They are deeper and richer. And the outpainted portion is large, not just one small piece of the image.
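If smoothing the border is worth trying, one option is to blur the mask before compositing so the two images fade into each other over a few pixels. This is a rough sketch with Pillow, assuming a mask that is white over the outpainted area; `feathered_paste_back` and the default `radius` are hypothetical choices, not part of the pipeline.

```python
from PIL import Image, ImageFilter

def feathered_paste_back(original, generated, mask, radius=8):
    """Blend original and generated images across a softened mask edge.

    Assumption: white (255) in the mask marks the generated area.
    radius is the Gaussian blur radius in pixels applied to the mask.
    """
    original = original.convert("RGB")
    generated = generated.convert("RGB").resize(original.size)
    mask = mask.convert("L").resize(original.size)
    # Blurring turns the hard 0/255 edge into a gradient, so pixels near the
    # border become a weighted mix of both images.
    soft_mask = mask.filter(ImageFilter.GaussianBlur(radius))
    return Image.composite(generated, original, soft_mask)
```

This only hides the seam. Near the border some original pixels are mixed with generated ones, and the outpainted area keeps its lighter palette, so it does not address the preference for the original colors.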

Histograms of pixel differences in R, G, and B channels individually (outpainted r, g, b minus original r, g, b):
difference_red.png
difference_green.png
difference_blue.png
It seems the pipeline adds either less than 100 (and in those cases typically less than 50) or more than 200 to each value, with nothing in between.
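One thing worth ruling out with histograms like these: the thread does not say how the differences were computed, and if the subtraction is done directly on 8-bit unsigned arrays, NumPy wraps the result modulo 256, so a small decrease shows up as a value above 200. The sketch below uses made-up pixel values to show the effect; `channel_histograms` is a hypothetical helper, not the code behind the plots above.

```python
import numpy as np

# Made-up single-pixel example: R goes up by 10, G goes down by 5, B is unchanged.
original = np.array([[[120, 120, 120]]], dtype=np.uint8)
outpainted = np.array([[[130, 115, 120]]], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256: the channels come out as 10, 251, 0.
wrapped = outpainted - original

# Casting to a signed type first keeps the sign: 10, -5, 0.
signed = outpainted.astype(np.int16) - original.astype(np.int16)

def channel_histograms(original, outpainted):
    """Count signed differences per channel over the range -255..255.

    Returns a dict mapping "R", "G", "B" to arrays of 511 counts, where
    index i corresponds to a difference of i - 255.
    """
    diff = outpainted.astype(np.int16) - original.astype(np.int16)
    edges = np.arange(-255, 257)  # one bin per integer difference
    return {
        name: np.histogram(diff[..., i], bins=edges)[0]
        for i, name in enumerate("RGB")
    }
```

If the signed histograms show mass just below zero rather than above 200, the gap in the plots would be an artifact of the subtraction and not something the pipeline does.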
