How to locate all the objects in the image and the input text using Open Vocabulary Object Detection?

by cssqingfeng - opened

When I use the task prompt to process an image, I find that the model tends to output only one object even when there are many objects in the image. For example, I passed an image containing many people to the model, together with the text "people", but the model output only one person. How do I solve this? I read the Florence-2 paper, and E.4 in the paper shows results for Open Vocabulary Object Detection. How does it achieve "Locate giraffe in the image" when there are many giraffes?
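For anyone trying to reproduce this, here is a minimal sketch of the call being described. It is modeled on the usage pattern shown in the Florence-2 model card. The model ID, the generation settings (`num_beams=3`, `max_new_tokens=1024`) and the helper names `run_ovd` and `count_boxes` are my own assumptions, not a confirmed fix. `count_boxes` just makes it easy to see how many boxes came back per label. It assumes the post-processed OVD result uses a `bboxes_labels` key and that `<OD>`/grounding results use `labels`.

```python
from collections import Counter
from typing import Any, Dict

TASK = "<OPEN_VOCABULARY_DETECTION>"


def run_ovd(image, text_input: str, model_id: str = "microsoft/Florence-2-large") -> Dict[str, Any]:
    """Run Florence-2 open-vocabulary detection on a PIL image (downloads the model)."""
    # Imported lazily so count_boxes can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    # The text query is appended directly after the task token.
    inputs = processor(text=TASK + text_input, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,  # assumed setting
        num_beams=3,          # assumed setting
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Converts the location tokens into pixel-space boxes for this image size.
    return processor.post_process_generation(
        generated_text, task=TASK, image_size=(image.width, image.height)
    )


def count_boxes(parsed: Dict[str, Any], task: str = TASK) -> Counter:
    """Count returned boxes per label in a post-processed Florence-2 result."""
    result = parsed.get(task, {})
    # Assumed keys: 'bboxes_labels' for OVD, 'labels' for <OD> and phrase grounding.
    labels = result.get("bboxes_labels", result.get("labels", []))
    return Counter(labels)
```

Usage: `count_boxes(run_ovd(img, "people"))` shows how many "people" boxes the model returned. Running the same image with the `<OD>` task and passing `task="<OD>"` to `count_boxes` gives a count to compare against.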

cssqingfeng changed discussion title from How to locate all the objects in the image using Open Vocabulary Object Detection? to How to locate all the objects in the image and the input text using Open Vocabulary Object Detection?
cssqingfeng changed discussion status to closed
