PaliGemma 2 ONNX doesn't support object detection?

#1
by NSTiwari - opened

Hi, thanks for sharing the ONNX weights for PaliGemma 2. They work well for image captioning, but I tried several object detection prompts using the detect keyword and got nothing back.
E.g., detect person was one of the prompts, and the response was null.

Are the converted model weights compatible only with captioning tasks?

ONNX Community org

Hmm, it should work. Could you share the code you are using?

ONNX Community org

Also, can you confirm that the original (PyTorch) version works correctly for your image/prompt?

@Xenova: Okay, after experimenting with different prompts, I was able to get the bounding box coordinates. Unlike the original PaliGemma 2 weights, where a simple <image>detect person works, I had to use the more specific prompt <image>detect bounding box of person to make it work.
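For anyone who gets as far as a detection response, here is a minimal sketch of turning the text into pixel boxes. It assumes the decoded output keeps PaliGemma's location tokens, that each detection is four <locNNNN> tokens in y_min, x_min, y_max, x_max order followed by the label, that values are binned on a 0-1023 scale, and that multiple detections are separated by " ; ". The function name parseDetections and the sample string are hypothetical, not part of Transformers.js.

```javascript
// Hypothetical helper: parse PaliGemma-style detection text into pixel boxes.
// Assumes the format "<locYMIN><locXMIN><locYMAX><locXMAX> label ; ...".
function parseDetections(text, imageWidth, imageHeight) {
  // Four consecutive <locNNNN> tokens, then a label that runs up to ';' or '<'.
  const pattern = /<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)/g;
  const boxes = [];
  for (const match of text.matchAll(pattern)) {
    const [yMin, xMin, yMax, xMax] = match.slice(1, 5).map(Number);
    boxes.push({
      label: match[5].trim(),
      // Location values are normalized bins; divide by 1024 and scale to pixels.
      xMin: (xMin / 1024) * imageWidth,
      yMin: (yMin / 1024) * imageHeight,
      xMax: (xMax / 1024) * imageWidth,
      yMax: (yMax / 1024) * imageHeight,
    });
  }
  return boxes;
}

// Made-up sample output for a 1024x512 image, for illustration only.
const sample = '<loc0256><loc0128><loc0768><loc0512> person ; <loc0000><loc0512><loc0512><loc1024> person';
const detections = parseDetections(sample, 1024, 512);
console.log(detections);
```

An empty array here means the text contained no location tokens, which is a quick way to tell a captioning-style answer apart from a detection answer.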

Hi @Xenova, is it possible to run this in vanilla JS by loading Transformers.js via a CDN?
I get an error (screenshot attached as image.png) when loading it like this:

import { AutoProcessor, PaliGemmaForConditionalGeneration } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.2.4';
