Image Labels: One-shot image-conditioned object detection

#5
by godaspeg - opened

Is it possible to detect objects using images as labels instead of text? Since OWL-ViT is based on CLIP embeddings, I think this should be theoretically possible.
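
To make the idea concrete, here is a rough sketch of what I have in mind. It assumes the model can produce one embedding for the example (query) image and one embedding per predicted box in the target image, all in the same CLIP-style space. The function names and the toy vectors are hypothetical placeholders, not part of any real OWL-ViT API; the matching step is plain cosine similarity with an arbitrary threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_boxes(query_embedding, box_embeddings, threshold=0.8):
    """Return (box_index, score) for every box whose embedding is close to the query.

    query_embedding: embedding of the example image (hypothetical model output).
    box_embeddings:  one embedding per predicted box (hypothetical model output).
    threshold:       arbitrary cut-off chosen for illustration only.
    """
    matches = []
    for index, embedding in enumerate(box_embeddings):
        score = cosine_similarity(query_embedding, embedding)
        if score >= threshold:
            matches.append((index, score))
    return matches


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
boxes = [
    [0.9, 0.1, 1.1],    # points in nearly the same direction as the query
    [0.0, 1.0, 0.0],    # orthogonal to the query
    [-1.0, 0.0, -1.0],  # opposite to the query
]
matches = match_boxes(query, boxes)
print(matches)
```

In this toy case only the first box passes the threshold. With a real model, the text-label embeddings would simply be swapped for the embedding of the example image.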

godaspeg changed discussion title from Image Labels to Image Labels: One-shot image-conditioned object detection
