Square crop?

#1 by ceyda - opened

Looking at the Colab interactive demo, the model doesn't seem to be limited to square inputs.
It would be nice to support arbitrary sizes in this demo as well (let me know if there is a part I can help with).
Example: (screenshots attached)

Hi @ceyda, great catch!

The model only accepts a fixed square input size, which is defined in the config file of each model version. That's why we have a post-processing method that rescales the predicted bounding boxes to a target image size.
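
For reference, here is a minimal sketch of how that post-processing step is typically called (assuming the standard transformers OWL-ViT API; in newer transformers versions the method is named `post_process_object_detection` instead of `post_process`):

```python
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("example.jpg")  # any aspect ratio
texts = [["a photo of a cat", "a photo of a dog"]]

inputs = processor(text=texts, images=image, return_tensors="pt")
outputs = model(**inputs)

# Rescale the normalized predicted boxes to the *original* image size,
# not to the square model input size.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process(outputs=outputs, target_sizes=target_sizes)
boxes, scores = results[0]["boxes"], results[0]["scores"]
```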

And you are right, the app is not post-processing the output predictions correctly because (1) a square target size is passed in to rescale the predicted bounding boxes and (2) there is a bug in the post_processing method.

Would you like to work on fixing the post_processing bug? Otherwise, I can fix it shortly.

Cheers,
Alara

@adirik I'm not sure the problem is with post_processing.
It seems to produce correct bbox sizes when we resize the images manually before passing them to the processor, as done here:
https://huggingface.co/spaces/adirik/OWL-ViT/discussions/2/files
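
If I understand that workaround correctly, it amounts to something like the sketch below (assuming the fixed square input size from the model config, e.g. 768):

```python
from PIL import Image

MODEL_INPUT_SIZE = 768  # fixed square size from the model config (assumption)

image = Image.open("example.jpg").convert("RGB")
orig_width, orig_height = image.size

# Resize to the square model input size *before* calling the processor,
# so the internal crop/resize cannot drop any image content.
resized = image.resize((MODEL_INPUT_SIZE, MODEL_INPUT_SIZE), resample=Image.BICUBIC)

# The predicted boxes are then rescaled back to (orig_height, orig_width)
# in the post-processing step.
```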

Although the processor is supposed to resize inputs internally 🤔, as seen here: https://github.com/huggingface/transformers/blob/ab2006e3d6db88654526a4169e65d4bfc52da2e3/src/transformers/models/owlvit/feature_extraction_owlvit.py#L197

So maybe there is still a bug somewhere.

Hi @ceyda, you are right! I confirmed that there is a bug in OwlViTFeatureExtractor and input images are not resized correctly. I will open a fix PR for this issue shortly.

The issue is now fixed! The bug was due to the target size in OwlViTFeatureExtractor being defined as a single value instead of a tuple (768 instead of (768, 768)), which caused the input image to be cropped later in the pipeline.
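
To illustrate the difference (a standalone sketch, not the library code): a scalar size is typically interpreted as "scale the shorter edge to this length" while preserving the aspect ratio, so a subsequent square center crop discards part of the longer side, whereas a (768, 768) tuple resizes directly to the square target.

```python
from PIL import Image

image = Image.new("RGB", (1024, 512))  # dummy wide image

# size=768 interpreted as "shorter edge -> 768": aspect ratio is kept,
# so a 768x768 center crop afterwards cuts off half of the width.
scale = 768 / min(image.size)
shorter_edge = image.resize((round(image.width * scale), round(image.height * scale)))
print(shorter_edge.size)  # (1536, 768)

# size=(768, 768): the image is warped straight to the square target,
# nothing is cropped away.
squared = image.resize((768, 768))
print(squared.size)  # (768, 768)
```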

adirik changed discussion status to closed

Ok, now that I can test on any image, OWL-ViT has been blowing my mind! 🤯🔥

