Is it possible to use the refiner separately from the base model?

#16
opened by Mousey

It is my understanding that the way to use the refiner is to first run the base model pipeline with output_type="latent" and then run the refiner. Before I run the refiner, I would like to make some modifications to the first image. More precisely, I need the image as a proper image, not as a tensor. After I make my computations, is there still a way to use the refiner?
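
For reference, the two-stage flow described above is roughly this (a minimal sketch following the usual SDXL base + refiner example; the "tiger" prompt just matches the one used further down):

import torch
from diffusers import DiffusionPipeline

# Base pipeline outputs latents instead of a decoded image
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")

# Refiner reuses the base's second text encoder and VAE
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2, vae=base.vae,
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")

prompt = "tiger"
latents = base(prompt=prompt, output_type="latent").images
image = refiner(prompt=prompt, image=latents).images[0]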

By the sounds of it you want to go base -> image -> modifications -> refiner, so you want to run the refiner on an image.
If so, you need to encode the image into latents and then run the refiner. The effect of the refiner is fairly subtle, but it can be done:

import torch
from diffusers import DiffusionPipeline, AutoencoderKL
from diffusers.image_processor import VaeImageProcessor
from PIL import Image

# Load the (modified) image and turn it into a normalized pixel tensor
image = Image.open('cat.png').convert('RGB')

image_processor = VaeImageProcessor()
image_tensor = image_processor.preprocess(image)
image_tensor = image_tensor.to(device="cuda")

# Encode with the base VAE in fp32 (see note 1 below)
vae = AutoencoderKL.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0",
                                    subfolder="vae", use_safetensors=True,
                                    ).to("cuda")

# Encode the pixel tensor into VAE latents, scaled the way the refiner expects
with torch.no_grad():
    latents = vae.encode(image_tensor).latent_dist.sample() * vae.config.scaling_factor

refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16, variant="fp16",
    use_safetensors=True,
    add_watermarker=False
).to('cuda')

prompt = "tiger"
image = refiner(prompt=prompt,
                image=latents).images[0]

image.save('e2c.png')
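
If the refiner changes the image less (or more) than you want, the strength argument of the refiner call is the usual img2img knob: higher values re-noise the input latents more before refining, lower values stay closer to the input.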

Notes.

  1. Despite running the refiner in fp16, I've run the VAE encode in fp32, since the stock SDXL VAE doesn't work in fp16.
     There is a fixed fp16 VAE around, but I've not tried it, and encoding doesn't use much memory anyway (see the sketch after these notes).

  2. I actually ran this on a Mac; I've just changed the device from 'mps' to 'cuda' and stripped out a few torch workarounds I needed to make fp16 work on MPS.
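
If you do want to try the fixed fp16 VAE, swapping it in for the base VAE would look roughly like this (a minimal sketch I haven't run; the madebyollin/sdxl-vae-fp16-fix repo ID is an assumption based on the commonly used community fix):

import torch
from diffusers import AutoencoderKL

# Community-patched SDXL VAE that stays stable in fp16 (assumed repo ID)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
).to("cuda")

# With this VAE the encode step above can also run in fp16, e.g.:
#   latents = vae.encode(image_tensor.half()).latent_dist.sample() * vae.config.scaling_factor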

Before: cat.png

After: e2c.png
