the model can't stop generating for content extraction

#8
by tankstarwar - opened

I asked the model to extract tabular content from a 3x5 area of a spreadsheet image, but it never seemed to stop generating and eventually ran out of memory (OOM). Is this a known issue?

Microsoft org

@tankstarwar We have noticed this looping issue and our language team is actively solving it.
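
Until a fix lands, one possible client-side guard is to watch the generated token IDs for a repeating tail and abort early. The sketch below is only an illustration, not the team's fix: the function name `find_loop_period` and the thresholds `max_period` and `min_repeats` are hypothetical, and it assumes you can access the token IDs produced so far.

```python
from typing import Optional, Sequence


def find_loop_period(
    token_ids: Sequence[int],
    max_period: int = 64,   # hypothetical threshold: longest loop length to look for
    min_repeats: int = 3,   # hypothetical threshold: how many back-to-back copies count as a loop
) -> Optional[int]:
    """Return the smallest period p for which the last p * min_repeats
    tokens repeat with period p, or None if no such loop is found."""
    n = len(token_ids)
    for period in range(1, max_period + 1):
        span = period * min_repeats
        if span > n:
            break  # not enough tokens yet to confirm a loop of this length
        tail = token_ids[n - span:]
        # The tail is periodic if every token equals the one `period` steps earlier.
        if all(tail[i] == tail[i - period] for i in range(period, span)):
            return period
    return None


if __name__ == "__main__":
    looping = [5, 6, 7] + [1, 2, 3] * 3
    print(find_loop_period(looping))
    print(find_loop_period(list(range(20))))
```

A caller could run this check every few decoding steps and stop generation once it returns a period, instead of waiting for memory to run out.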

I see the same behavior, but when asking the model to describe a photo with tags.

Microsoft org

@tankstarwar @Alex01837178373 Can you share your image and prompt?

IMG_20240524_211705.jpg

IMG_20240524_211757.jpg

Microsoft org

Thanks for sharing! The Phi-3 Vision model is intended for use in English. In the cases above, it looks like the text inputs are in Russian.

In this case, I used Phi-3 Vision to write captions for the photo in the form of tags, and I used English.

Hi Alex,

I used this dummy tabular image created in a spreadsheet:
test1.png

And a single prompt like:
{"role": "user", "content": "<|image_1|>\nextract the sales data from the table above and output in json format."}
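
For anyone reproducing this locally, here is a minimal sketch of packaging the same message together with a hard cap on output length, which may at least bound memory growth if the model starts looping. The helper name `build_request` and the value 512 are assumptions for illustration; `max_new_tokens` and `do_sample` are standard Hugging Face `generate` arguments, but the actual model call is left out here.

```python
from typing import Any, Dict


def build_request(instruction: str, max_new_tokens: int = 512) -> Dict[str, Any]:
    """Build the single-turn chat message used in this thread plus
    bounded generation settings (hypothetical helper, not part of any library)."""
    messages = [
        # <|image_1|> is the image placeholder used in the prompt above.
        {"role": "user", "content": f"<|image_1|>\n{instruction}"},
    ]
    generation_args = {
        "max_new_tokens": max_new_tokens,  # hard upper bound on generated tokens
        "do_sample": False,                # greedy decoding for reproducible output
    }
    return {"messages": messages, "generation_args": generation_args}


if __name__ == "__main__":
    request = build_request(
        "extract the sales data from the table above and output in json format."
    )
    print(request["messages"][0]["content"])
    print(request["generation_args"])
```

The `generation_args` dictionary would then be unpacked into the model's `generate` call alongside the processed image and prompt inputs.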

Microsoft org

Thanks for sharing your example.
I tried the example on Azure AI: https://ai.azure.com/explore/models/Phi-3-vision-128k-instruct/version/2/registry/azureml
image.png
The output looks pretty reasonable.

OK, thanks for the feedback. I tried the AzureML version and it does work. Previously, I was testing with a Hugging Face Space and on my local machine (no CUDA, so flash attention was disabled) and hit the issue there, so I guess something could be wrong with my environment setup.