Image Labels: One-shot image-conditioned object detection

by godaspeg - opened Jun 26

Jun 26

Is it possible to detect objects using images as labels instead of text? Since OWL-ViT is based on CLIP embeddings, I think this should be theoretically possible.

godaspeg changed discussion title from Image Labels to Image Labels: One-shot image-conditioned object detection Jun 26

nielsr

Jun 27

Yes, image-guided object detection is supported. See the demo notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/OWLv2/Zero_and_one_shot_object_detection_with_OWLv2.ipynb
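
For a quick idea of what this looks like in code, here is a minimal sketch that follows the image-guided detection example in the Transformers OWLv2 documentation. It is not taken from the notebook. The checkpoint name, the thresholds, and the helper name `detect_with_image_query` are assumptions chosen for illustration, so adjust them to your setup.

```python
def detect_with_image_query(
    target_image,
    query_image,
    checkpoint="google/owlv2-base-patch16-ensemble",  # assumed checkpoint, swap in your own
    threshold=0.9,       # assumed score cutoff, tune for your data
    nms_threshold=0.3,   # assumed IoU cutoff for non-maximum suppression
):
    """Return (boxes, scores) for regions of target_image that resemble query_image.

    Both images are expected to be PIL images. This helper is hypothetical:
    it only wraps the library calls for readability.
    """
    # Imported lazily so that defining the function does not need the heavy dependencies.
    import torch
    from transformers import AutoProcessor, Owlv2ForObjectDetection

    processor = AutoProcessor.from_pretrained(checkpoint)
    model = Owlv2ForObjectDetection.from_pretrained(checkpoint)

    # The query image plays the role that text labels play in zero-shot detection.
    inputs = processor(images=target_image, query_images=query_image, return_tensors="pt")

    with torch.no_grad():
        outputs = model.image_guided_detection(**inputs)

    # PIL reports (width, height); post-processing expects (height, width).
    target_sizes = torch.tensor([target_image.size[::-1]])
    results = processor.post_process_image_guided_detection(
        outputs=outputs,
        threshold=threshold,
        nms_threshold=nms_threshold,
        target_sizes=target_sizes,
    )
    # One result dict per target image; only one image was passed in.
    return results[0]["boxes"], results[0]["scores"]
```

Calling `detect_with_image_query(Image.open("scene.jpg"), Image.open("example_object.jpg"))` (both file names are placeholders) should give boxes in `(xmin, ymin, xmax, ymax)` format with their similarity scores. OWLv2 pads inputs to a square before resizing, so depending on your Transformers version the boxes may be relative to the padded image. Check them visually before relying on the coordinates.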
