Square crop?

#1 by ceyda - opened

Looking at the Colab interactive demo, the model doesn't seem to be limited to square inputs.
It would be nice to support arbitrary sizes in this demo as well (let me know if there is a part I can help with).
Example: (screenshots attached)

Hi @ceyda, great catch!

The model only accepts a fixed square input size, which is defined in the config file of each model version. That's why we have a post-processing method that rescales the predicted bounding boxes to a target image size.
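
For reference, here is a minimal sketch of how that post-processing step is typically called (assuming the standard transformers OWL-ViT API; in newer transformers versions the method is named `post_process_object_detection` instead of `post_process`):

```python
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("example.jpg")  # any aspect ratio
texts = [["a photo of a cat", "a photo of a dog"]]

inputs = processor(text=texts, images=image, return_tensors="pt")
outputs = model(**inputs)

# Rescale the normalized predicted boxes to the *original* image size,
# not to the square model input size.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process(outputs=outputs, target_sizes=target_sizes)
boxes, scores = results[0]["boxes"], results[0]["scores"]
```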

And you are right, the app is not post-processing the output predictions correctly because (1) a square target size is passed in to rescale the predicted bounding boxes and (2) there is a bug in the post_processing method.

Would you like to work on fixing the post_processing bug? Otherwise, I can fix it shortly.

Cheers,
Alara

@adirik I'm not sure the problem is with post_processing.
It seems to produce correct bbox sizes when we resize the images manually before passing them to the processor, as done here:
https://huggingface.co/spaces/adirik/OWL-ViT/discussions/2/files
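
If I understand that workaround correctly, it amounts to something like the sketch below (assuming the fixed square input size from the model config, e.g. 768):

```python
from PIL import Image

MODEL_INPUT_SIZE = 768  # fixed square size from the model config (assumption)

image = Image.open("example.jpg").convert("RGB")
orig_width, orig_height = image.size

# Resize to the square model input size *before* calling the processor,
# so the internal crop/resize cannot drop any image content.
resized = image.resize((MODEL_INPUT_SIZE, MODEL_INPUT_SIZE), resample=Image.BICUBIC)

# The predicted boxes are then rescaled back to (orig_height, orig_width)
# in the post-processing step.
```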

Although the processor is supposed to resize inputs internally 🤔, as seen here: https://github.com/huggingface/transformers/blob/ab2006e3d6db88654526a4169e65d4bfc52da2e3/src/transformers/models/owlvit/feature_extraction_owlvit.py#L197

So maybe there is still a bug somewhere.

Hi @ceyda, you are right! I confirmed that there is a bug in OwlViTFeatureExtractor and input images are not resized correctly. I will open a fix PR for this issue shortly.

The issue is now fixed! The bug was due to the target size in OwlViTFeatureExtractor being defined as a single value instead of a tuple (768 instead of (768, 768)), which caused the input image to be cropped later in the pipeline.
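
To illustrate the difference (a standalone sketch, not the library code): a scalar size is typically interpreted as "scale the shorter edge to this length" while preserving the aspect ratio, so a subsequent square center crop discards part of the longer side, whereas a (768, 768) tuple resizes directly to the square target.

```python
from PIL import Image

image = Image.new("RGB", (1024, 512))  # dummy wide image

# size=768 interpreted as "shorter edge -> 768": aspect ratio is kept,
# so a 768x768 center crop afterwards cuts off half of the width.
scale = 768 / min(image.size)
shorter_edge = image.resize((round(image.width * scale), round(image.height * scale)))
print(shorter_edge.size)  # (1536, 768)

# size=(768, 768): the image is warped straight to the square target,
# nothing is cropped away.
squared = image.resize((768, 768))
print(squared.size)  # (768, 768)
```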

adirik changed discussion status to closed

Ok, now that I can test on any image, OWL-ViT has been blowing my mind! 🤯🔥

