Error executing demo

#2
by Wennn - opened

with torch.autocast("cuda"):
    controlnet_image = pipeline(
        prompt=prompt,
        image=img,
        mask_image=mask,
        control_image=mask,
        num_images_per_prompt=1,
        generator=generator,
        num_inference_steps=20,
        guess_mode=False,
        controlnet_conditioning_scale=cond_scale,
    ).images[0]

RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[2, 4, 64, 64] to have 9 channels, but got 4 channels instead

Yahoo Inc. org

Hi @Wennn ,

Thanks for catching the issue. Our model uses a modified StableDiffusionControlNetInpaintPipeline, which replaces the text-to-image UNet (4 input channels: the encoded image) with the inpainting UNet (9 input channels: the encoded image, the mask, and the encoded masked image). This pipeline is now uploaded as pipeline.py, and the following pipeline initialization should resolve the issue:

pipeline = DiffusionPipeline.from_pretrained("yahoo-inc/photo-background-generation", custom_pipeline="yahoo-inc/photo-background-generation")
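The shapes in the error message line up with this explanation: the first conv weight [320, 9, 3, 3] expects 9 input channels, while a text-to-image latent carries only 4. A minimal sketch of the channel bookkeeping, using the channel counts standard in Stable Diffusion inpainting (the exact internal layout is an assumption for illustration):

```python
# Channel bookkeeping behind "expected input ... to have 9 channels, but got 4".
# Counts follow the standard Stable Diffusion inpainting layout; treat this as
# an illustrative sketch, not a dump of the actual pipeline code.
LATENT_CHANNELS = 4   # VAE-encoded image latent
MASK_CHANNELS = 1     # downsampled binary mask

# Text-to-image UNet input: just the noisy latent.
text2img_in_channels = LATENT_CHANNELS

# Inpainting UNet input: noisy latent + mask + encoded masked image.
inpaint_in_channels = LATENT_CHANNELS + MASK_CHANNELS + LATENT_CHANNELS

print(text2img_in_channels)  # 4 -> what the failing call was feeding in
print(inpaint_in_channels)   # 9 -> what the weight [320, 9, 3, 3] expects
```

Loading the repo's custom pipeline makes the inputs match the 9-channel inpainting UNet, which is why the initialization above fixes the error.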

This change is now reflected in the model card as well. Let us know if the problem persists.

@erfan-yahoo
Thank you for your reply, I am now up and running.
But I have a question.
My computer configuration: 14700KF + RTX 4070 + 32 GB RAM, CUDA 12.1.
It takes me 3 minutes to synthesize an 800×800 image. Why is it so slow?

Yahoo Inc. org

No worries! Our model takes about a second to run on an RTX 4090. I suspect you are not actually utilizing the GPU for some reason (e.g., an issue with your torch, CUDA, or driver installation).
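If the pipeline silently falls back to CPU, multi-minute generation times are expected. A quick sanity check using plain PyTorch calls (nothing model-specific; guarded so it also runs where torch is absent):

```python
# Check whether PyTorch can actually see a CUDA device.
# If this prints "CUDA available: False", generation runs on the CPU,
# which would explain multi-minute render times.
try:
    import torch
    print("torch version:", torch.__version__)
    cuda_ok = torch.cuda.is_available()
    print("CUDA available:", cuda_ok)
    if cuda_ok:
        print("device:", torch.cuda.get_device_name(0))
except ImportError:
    cuda_ok = False
    print("PyTorch is not installed in this environment")
```

If CUDA is available but generation is still slow, also confirm the pipeline itself was moved to the GPU with `pipeline.to("cuda")` before calling it.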

erfan-yahoo changed discussion status to closed
