How to re-use latent state as a new seed?

#24
by xalex - opened

There are scripts for interpolating between different seeds to explore the latent space. Using a slerp function between two seeds is interesting, but I wonder whether one can re-seed with the old latent state.

I tried to return cond_latents (without the 1/0.18215 factor) from the diffuse method and then slerp between a normal distribution (like init2) and the returned cond_latents, to add only a bit of noise to the current latent state in the hope of finding solutions near the previous state. But when I try to slerp or linearly interpolate between a new random seed and the latent state, I don't get usable results.

I especially have problems with how to normalize the tensor: using it naively results in diverging values and images that get brighter and brighter, while subtracting the mean results in images that are too dim.
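For reference, the slerp helper those seed-interpolation scripts typically use looks roughly like this (a sketch of the commonly shared version, not necessarily the exact code from the script in question):

import numpy as np
import torch

def slerp(t, v0, v1, dot_threshold=0.9995):
    # Spherical interpolation between two noise tensors v0 and v1;
    # t is the interpolation fraction in [0, 1].
    v0_np = v0.detach().cpu().numpy()
    v1_np = v1.detach().cpu().numpy()
    dot = np.sum(v0_np * v1_np) / (np.linalg.norm(v0_np) * np.linalg.norm(v1_np))
    if np.abs(dot) > dot_threshold:
        # nearly parallel vectors: fall back to plain linear interpolation
        result = (1.0 - t) * v0_np + t * v1_np
    else:
        theta_0 = np.arccos(dot)        # angle between the two vectors
        sin_theta_0 = np.sin(theta_0)
        theta_t = theta_0 * t
        s0 = np.sin(theta_0 - theta_t) / sin_theta_0
        s1 = np.sin(theta_t) / sin_theta_0
        result = s0 * v0_np + s1 * v1_np
    return torch.from_numpy(result).to(v0.device, dtype=v0.dtype)

The point of slerp over a plain lerp is that it keeps the result at roughly the same norm as its Gaussian endpoints, which is exactly the normalization property a linear blend with the latent state lacks.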

Let's say you want to interpolate between two prompts, with each resulting image "rooted" in the output of the first prompt.

This is what I've found to work for me (a rough sketch follows the list):

  1. Your noise at each interpolation step will be the output of slerp.
  2. That noise is added at timestep 0 to the encoded "root" image (using the scheduler's add_noise function)
  3. Start diffusing
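For concreteness, here is how those three steps might look against the diffusers API. This is a sketch, not tested code: it assumes a loaded pipe, precomputed text_embeddings, a root_image_tensor normalized to [-1, 1] with shape (1, 3, 512, 512), and the slerp helper above; classifier-free guidance is omitted for brevity.

import torch

num_inference_steps = 50
t = 0.1  # interpolation fraction for this frame

# 1. noise for this interpolation step is the slerp of two seeded noise tensors
shape = (1, pipe.unet.in_channels, 64, 64)  # latent shape for a 512x512 output
noise_a = torch.randn(shape, generator=torch.Generator().manual_seed(1234)).to("cuda")
noise_b = torch.randn(shape, generator=torch.Generator().manual_seed(5678)).to("cuda")
noise = slerp(t, noise_a, noise_b)

# 2. encode the "root" image and add the noise at the first (most noisy) timestep
pipe.scheduler.set_timesteps(num_inference_steps)
root_latents = 0.18215 * pipe.vae.encode(root_image_tensor).latent_dist.sample()
latents = pipe.scheduler.add_noise(root_latents, noise, pipe.scheduler.timesteps[0])

# 3. start diffusing: the usual denoising loop
for step_t in pipe.scheduler.timesteps:
    noise_pred = pipe.unet(latents, step_t, encoder_hidden_states=text_embeddings).sample
    latents = pipe.scheduler.step(noise_pred, step_t, latents).prev_sample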

That's kind of what the script does. It just interpolates two seeds and then diffuses step 0 to N for the interpolated noise vector.

Suppose that for a second image I interpolate the noise with a small alpha: init2 = init1 * (1.0 - alpha) + new_noise * alpha.
Then it would probably be a waste of time to diffuse init2 for all N steps just to get an image almost identical to the one I already got with init1, when I could instead diffuse result_latent_vector + alpha * new_noise for only a few steps to find the second image.

Of course it is a good question how much noise you have to add to avoid getting the same result, but to explore how to work with the latent vector I would first need to understand how it can be reused at all. Since the code uses cond_latents as the name for the seed noise, I suppose the seed is just the initialization of the latent vector, and it should in principle be possible to use the vector as the seed for a new diffusion. In practice, though, it didn't work for me.
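A sketch of what that shortcut could look like, under the assumption that result_latents are the final latents of the first run (before the 1/0.18215 decode scaling) and that the scheduler's timesteps list has length num_inference_steps; the timestep bookkeeping here is an illustration, not tested code from the thread:

import torch

alpha = 0.2  # fraction of the schedule to re-run with fresh noise
num_inference_steps = 50
k = int(alpha * num_inference_steps)

pipe.scheduler.set_timesteps(num_inference_steps)
t_resume = pipe.scheduler.timesteps[num_inference_steps - k]

# mix fresh noise into the old result at the noise level of step N - k
new_noise = torch.randn_like(result_latents)
latents = pipe.scheduler.add_noise(result_latents, new_noise, t_resume)

# resume diffusion for the last k steps only
for step_t in pipe.scheduler.timesteps[num_inference_steps - k:]:
    noise_pred = pipe.unet(latents, step_t, encoder_hidden_states=text_embeddings).sample
    latents = pipe.scheduler.step(noise_pred, step_t, latents).prev_sample

Note that add_noise scales the clean latents by sqrt(alpha_bar_t) and the noise by sqrt(1 - alpha_bar_t); skipping that rescaling is the likely reason a naive latents + noise blend diverges and the images get brighter and brighter.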

The image-to-image script img2img.py essentially does what you're describing; you could adapt it to work with the interpolation script. It takes an image, corrupts it with more or less noise depending on the strength parameter, and then "resumes" the diffusion process for the corresponding number of steps (see the sketch after the list below).

For example, let num_inference_steps=50, strength=0.5:
img2img will:

  1. encode your input image (could be the output of previous interpolation step)
  2. corrupt it with the corresponding amount of noise you'd expect from step strength*num_inference_steps=25
  3. resume diffusion for the remaining 25 steps
strength=1 means that you're starting from pure noise as usual; strength=0 will give you back the input image.
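The core of that logic, heavily simplified (variable names are illustrative; the real pipeline also handles guidance, batching, and timestep offsets):

import torch

num_inference_steps, strength = 50, 0.5
pipe.scheduler.set_timesteps(num_inference_steps)

# 1. encode the input image (image_tensor: [-1, 1] normalized, shape (1, 3, 512, 512))
init_latents = 0.18215 * pipe.vae.encode(image_tensor).latent_dist.sample()

# 2. corrupt it to the noise level of step strength * num_inference_steps = 25
init_steps = int(strength * num_inference_steps)
t_start = pipe.scheduler.timesteps[num_inference_steps - init_steps]
latents = pipe.scheduler.add_noise(init_latents, torch.randn_like(init_latents), t_start)

# 3. resume diffusion for the remaining 25 steps only: the same unet/scheduler
#    loop as in the earlier sketches, iterating over
#    pipe.scheduler.timesteps[num_inference_steps - init_steps:]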

I had a look into the diffusers StableDiffusionImg2ImgPipeline yesterday but didn't get it to work yet.
For the img2img script in the stable-diffusion repo, I would first need to get the original code running, as I am currently using the diffusers implementation for everything. But maybe I can reuse some parts of that script in a diffusers-based script.

Check out my fork of diffusers - I got image-to-image to work: https://github.com/atarashansky/diffusers

Example usage (be warned, I am not using the safety checker as I found it a little too restrictive):

from diffusers import StableDiffusionPipeline

model_id = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, ...)  # pass your auth token, revision, etc. here
pipe = pipe.to("cuda")

prompt = "a fantasy landscape, trending on artstation"
init_image = ....  # a PIL.Image object

grid, images, seeds = pipe.make_grid(
    prompt,
    seed=1234,
    height=512,
    width=512,
    num_rows=3,
    num_columns=3,
    num_inference_steps=59,
    guidance_scale=7.5,
    init_image=init_image,  # set this to None to do normal text2image
    strength=0.4,  # 0 returns init_image unchanged, 1 starts from pure noise
)

One more disclaimer, this currently only works with the default scheduler.


Thank you, that was the problem that kept me from getting the image-to-image example class to work in my own script.
