Unable to use it with img2img on rectangular images

#2
by sunhaozhepy - opened

Hi,

I've recently been applying this ControlNet model to an img2img task and got this error:

Traceback (most recent call last):
  File "/data/jupyter/aging/batch_mlsd_img2img.py", line 32, in <module>
    image = pipeline(prompt, image=img, control_image=condition_image, strength=0.3, negative_prompt=negative_prompt, num_inference_steps=30, guidance_scale=5).images[0]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/jupyter/anaconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data/jupyter/anaconda3/lib/python3.11/site-packages/diffusers/pipelines/controlnet/pipeline_controlnet_img2img.py", line 1149, in __call__
    down_block_res_samples, mid_block_res_sample = self.controlnet(
                                                   ^^^^^^^^^^^^^^^^
  File "/data/jupyter/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/jupyter/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/jupyter/anaconda3/lib/python3.11/site-packages/diffusers/models/controlnet.py", line 794, in forward
    sample = sample + controlnet_cond
             ~~~~~~~^~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (92) must match the size of tensor b (96) at non-singleton dimension 2

I suspect this is because my input image is not square. After I resized the image to 512x512, the error disappeared, but the image itself was distorted. I also checked whether the condition image (the output of the LineartDetector) had the same size as the input image; it didn't.
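A small pre-flight check along these lines can catch the mismatch before the pipeline runs. This is a sketch under stated assumptions: it assumes the latent (and ControlNet's conditioning embedding) is 1/8 of the pixel resolution, as for standard Stable Diffusion checkpoints, and that each image side is rounded down to a multiple of 8 before encoding. Under that assumption, the 92 vs 96 at dimension 2 (the height in a `(batch, channels, height, width)` tensor) correspond to images roughly 736 and 768 pixels tall. The helper names `latent_hw` and `sizes_compatible` are hypothetical, not part of diffusers.

```python
# Assumption: latent resolution is 1/8 of the pixel resolution
# (standard Stable Diffusion VAE and ControlNet conditioning embedding).
SCALE = 8


def latent_hw(size: tuple[int, int], scale: int = SCALE) -> tuple[int, int]:
    """Return the latent (height, width) for a PIL-style (width, height) size.

    Assumes each side is rounded down to a multiple of `scale` before
    encoding, so each latent side is simply pixels // scale.
    """
    width, height = size
    return height // scale, width // scale


def sizes_compatible(init_size: tuple[int, int],
                     control_size: tuple[int, int],
                     scale: int = SCALE) -> bool:
    """True if the init image and control image map to the same latent shape."""
    return latent_hw(init_size, scale) == latent_hw(control_size, scale)


# Reading the error backwards: dimension 2 is the height.
print(92 * SCALE, 96 * SCALE)  # 736 768
```

With PIL images, `sizes_compatible(img.size, condition_image.size)` would flag the problem, and `condition_image = condition_image.resize(img.size)` is one way to fix it without squashing the init image to 512x512.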

Could we have a processor that outputs a condition image of the same size as the input image? That would avoid this issue and make it possible to use img2img + ControlNet on rectangular images.
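A minimal sketch of the kind of processor being asked for, written as a hypothetical wrapper (`SizeMatchingProcessor` is not part of controlnet_aux or diffusers). It assumes the detector is a callable that takes an image and returns an image, and that images follow PIL's conventions: `.size` is `(width, height)` and `.resize((width, height))` returns a new image.

```python
from typing import Any, Callable


class SizeMatchingProcessor:
    """Hypothetical wrapper: run a detector, then resize its output to the input size.

    `detector` is any callable taking an image (plus optional keyword
    arguments) and returning an image. Images are duck-typed on PIL's
    `.size` / `.resize()` interface.
    """

    def __init__(self, detector: Callable[..., Any]):
        self.detector = detector

    def __call__(self, image: Any, **detector_kwargs: Any) -> Any:
        condition = self.detector(image, **detector_kwargs)
        if condition.size != image.size:
            # Match the input's (width, height) so the pipeline ends up
            # encoding the init image and control image at the same resolution.
            condition = condition.resize(image.size)
        return condition
```

For example, one could wrap a `LineartDetector` instance and pass `processor(img)` as `control_image`. Resizing a line-art map with PIL's default filter may soften thin lines slightly, so running the detector closer to the input resolution, if the detector allows it, could be preferable.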
