Can this grounding model output bounding boxes?

by xxheyu - opened Dec 15, 2023

Dec 15, 2023

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    # Drop the prompt tokens so only the newly generated text is decoded.
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))

It only outputs a description of the image. How can I get the bounding box?

Thanks in advance!

YunqiuX

Jul 7

I just ran into the same issue.

zRzRzRzRzRzRzR

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org Jul 10

The bounding box is in the output text, in the form [[x1,y1,x2,y2]].
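If the decoded text does contain boxes in that [[x1,y1,x2,y2]] form, they can be pulled out with a regular expression. The snippet below is only a rough sketch: `extract_boxes`, `to_pixels`, and the sample string are made up for illustration, and the `scale=1000` default assumes the coordinates are normalized to a 0-1000 range, which this thread does not confirm. Check it against your own outputs before relying on it.

```python
import re

# Matches one box written as [[x1,y1,x2,y2]] with integer coordinates.
BOX_PATTERN = re.compile(r"\[\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\]")


def extract_boxes(text):
    """Return every [[x1,y1,x2,y2]] box found in text as a tuple of ints."""
    return [tuple(int(v) for v in m.groups()) for m in BOX_PATTERN.finditer(text)]


def to_pixels(box, width, height, scale=1000):
    """Rescale a box to pixel coordinates.

    ASSUMPTION: coordinates are normalized to the range [0, scale].
    Pass scale=1 if the model already returns pixel values.
    """
    x1, y1, x2, y2 = box
    return (x1 * width / scale, y1 * height / scale,
            x2 * width / scale, y2 * height / scale)


# Hypothetical decoded output, not real model output.
sample = "A dog [[120,340,560,900]] sits next to a ball [[600,700,720,880]]."
boxes = extract_boxes(sample)
print(boxes)
```

In practice you would pass `tokenizer.decode(outputs[0])` instead of `sample`. If the list comes back empty, the model did not emit any boxes for that prompt.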

YunqiuX

Jul 10

I tried several prompts, but the HF version of the model would not output box coordinates.
I then switched to the SAT version of the model, and it does generate the box coordinates.
