Accuracy question

#1
by arnaucas - opened

Hi,

I tried this model on a few sample images and was wondering whether giving it OCR inputs makes it better than the standard VQA one.
In other words, does giving OCR inputs improve the accuracy of the model?

Thanks

TIFA org

Yes, we observed a great increase in TextVQA accuracy by adding OCR inputs. PromptCap knows how to deal with OCR inputs.
This repo is still under construction and we haven't publicized it yet. Soon we will release a vision-language model that is on par with BLIP2.
Thanks for your interest! We will keep you posted!

Any update on this new vision-language model? Excited to see the results.
