Different images while using same latents

#1
by TheoC - opened

Hi!

When I ask the pipeline to generate two images using the same (duplicated) latents, I get two different images.

latents = torch.randn((1, 4, 128, 128), device='cuda').half().repeat(2, 1, 1, 1)

Digging into the code, it looks like the UNet predicts different noise even though it receives the same input twice.

Is this normal?

Where does this variation come from?

Thank you!!

ARC Lab, Tencent PCG org

Those are not two separate images; they are the text and non-text (unconditional) latents used for classifier-free guidance.
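
To illustrate: inside the denoising loop, diffusers-style pipelines double the batch for guidance. A simplified sketch, not the exact pipeline source (variable names only mimic the internals):

import torch

def guided_noise_prediction(unet, latents, t, prompt_embeds, guidance_scale):
    # Double the batch: the first half is the non-text (unconditional)
    # input, the second half is the text-conditioned input.
    # prompt_embeds is likewise the concatenation of negative and
    # positive prompt embeddings.
    latent_model_input = torch.cat([latents] * 2)
    noise_pred = unet(latent_model_input, t, encoder_hidden_states=prompt_embeds).sample
    # Split the prediction back into its two halves and combine them:
    # this is classifier-free guidance.
    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
    return noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

So with num_images_per_prompt = 2, the UNet sees a batch of 4: two unconditional copies followed by two text-conditioned copies.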

Thank you for your answer!! I am not sure I understand, though. Let me be more specific.

I am generating two images by setting

num_images_per_prompt = 2

in the StableDiffusionXLAdapterPipeline call.

I use a single prompt. I also pass the same latents argument to the pipeline for both images, as sketched below.
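
A rough sketch of my call (the checkpoint, adapter, prompt, and conditioning image below are placeholders, not my exact setup):

import torch
from PIL import Image
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter

adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")

# One noise sample, duplicated so both images start from identical latents.
latents = torch.randn((1, 4, 128, 128), device="cuda").half().repeat(2, 1, 1, 1)

control_image = Image.new("RGB", (1024, 1024))  # placeholder conditioning image

images = pipe(
    prompt="a photo of a cat",  # single prompt
    image=control_image,
    num_images_per_prompt=2,
    latents=latents,
).images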

Therefore the input of the UNet is basically the same, yet the predicted noise differs.
INPUT (latent_model_input):

torch.Size([4, 4, 128, 128])
tensor([[-0.2959, -0.2959, -0.2959, -0.2959, -0.2959],
        [-0.2959, -0.2959, -0.2959, -0.2959, -0.2959],
        [-0.2959, -0.2959, -0.2959, -0.2959, -0.2959],
        [-0.2959, -0.2959, -0.2959, -0.2959, -0.2959]], device='cuda:0',
       dtype=torch.float16)

OUTPUT (noise_pred):

torch.Size([4, 4, 128, 128])
tensor([[-0.2391, -0.1351, -0.1200, -0.1201, -0.1230],
        [-0.2391, -0.1351, -0.1200, -0.1201, -0.1230],
        [-0.2365, -0.1348, -0.1201, -0.1203, -0.1234],
        [-0.2365, -0.1348, -0.1201, -0.1203, -0.1234]], device='cuda:0',
       dtype=torch.float16)

Is there some source of randomness in the UNet pipeline?

Best,
Théo

ARC Lab, Tencent PCG org

Yeah, the first two entries in the batch are the non-text (unconditional) counterparts of the last two images; they are only consumed by the guidance step and are not returned as images.
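
So the batch of 4 is laid out as [non-text image 1, non-text image 2, text image 1, text image 2]: the differences you printed are between the two halves, not between your two images. You can confirm this where you print noise_pred (sketch):

# Split the doubled batch back into its non-text and text halves.
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
# Within each half, both images start from identical latents,
# so their predictions should match (up to fp16 numerics).
print(torch.allclose(noise_pred_uncond[0], noise_pred_uncond[1]))
print(torch.allclose(noise_pred_text[0], noise_pred_text[1]))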
