Please add VQA capability!

#4
by GabbyJay - opened

One of the most used tasks (at least in my opinion) is visual question answering (VQA). Unfortunately, the model does not support this task. Would it be possible to extend the demo to support it as well? The HF blog article [1] explains how to fine-tune Florence-2. However, I don't have the resources to fine-tune it :-)

[1] https://huggingface.co/blog/finetune-florence2

I think you can find a dataset and try it for free on Google Colab. It's very easy to fine-tune, and it takes ~1.5 hours on a T4 in Colab for 3,500 training samples.

The Hugging Face team fine-tuned it. Here is the model card: https://huggingface.co/HuggingFaceM4/Florence-2-DocVQA and the related Space: https://huggingface.co/spaces/andito/Florence-2-DocVQA

How can I interact with this Space using Python?
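
One possible route, assuming the Space is a standard Gradio app, is the `gradio_client` package. The sketch below only lists the endpoints the Space exposes. The helper name `list_space_endpoints` and its `client_factory` parameter are made up for illustration, and the `"named_endpoints"` key reflects how `view_api` usually structures its result, so check the output against the actual Space.

```python
def list_space_endpoints(space_id, client_factory=None):
    """Return the sorted names of the API endpoints a Gradio Space exposes.

    `client_factory` is a hypothetical hook for swapping in a stand-in
    client; by default the real gradio_client.Client is used.
    """
    if client_factory is None:
        # Imported lazily; requires `pip install gradio_client`.
        from gradio_client import Client
        client_factory = Client

    # Connecting to a real Space needs network access.
    client = client_factory(space_id)

    # Ask the Space to describe its API as a dict instead of printing it.
    api_info = client.view_api(print_info=False, return_format="dict")

    # Assumption: endpoint names live under "named_endpoints".
    return sorted(api_info.get("named_endpoints", {}))
```

Calling `list_space_endpoints("andito/Florence-2-DocVQA")` should show which endpoint names are available. Once you know the name and its parameters, a call of the form `client.predict(..., api_name="<endpoint name>")` on a `Client` for the same Space sends the request. The exact arguments depend on what the Space defines, so read them from the `view_api` output rather than guessing.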
