Accuracy question

by arnaucas - opened Feb 8, 2023

Hi,

I tried this model on a few sample images and was wondering whether giving it OCR inputs makes it perform better than the standard VQA model.
In other words, does providing OCR inputs improve the model's accuracy?

Thanks

yushihu

TIFA org Feb 8, 2023

Yes, we observed a great increase in TextVQA accuracy by adding OCR inputs. PromptCap knows how to deal with OCR inputs.
This repo is still under construction, and we haven't done any PR for it yet. We will soon release a vision-language model that is on par with BLIP2.
Thanks for your interest! We will keep you posted!

cjd314

Apr 28, 2023

Any update on this new vision-language model? Excited to see the results.
