Fine-tuning for multiple tasks strategy
I would like to fine-tune this model on a specific set of images, combining 2 different tasks used in cascade.
The idea is that once the model receives the input image, it should first perform the image captioning task (MORE_DETAILED_CAPTION) to describe the image, and then use CAPTION_TO_PHRASE_GROUNDING to get a 'visual perspective' of what it has described (a sort of Grad-CAM for the text).
What should I do in this case? Fine-tune the model twice, starting with the image captioning task, and then use the resulting model as the starting point for training on the second task?
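For what it's worth, the cascade itself needs no special architecture at inference time, since both tasks are selected by a prompt token on the same weights. Below is a minimal sketch, assuming the Hugging Face `microsoft/Florence-2-large` checkpoint and the prompt tokens from its model card (the image path is a placeholder):

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large"  # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

def run_task(task, image, text=""):
    # Florence-2 selects the task via a prompt token; extra text
    # (e.g. the caption to be grounded) is appended to the prompt.
    inputs = processor(text=task + text, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        generated_text, task=task, image_size=(image.width, image.height)
    )

image = Image.open("example.png").convert("RGB")  # placeholder path

# Step 1: describe the image.
caption = run_task("<MORE_DETAILED_CAPTION>", image)["<MORE_DETAILED_CAPTION>"]

# Step 2: ground the phrases of that caption back onto the image.
grounding = run_task("<CAPTION_TO_PHRASE_GROUNDING>", image, text=caption)
print(grounding["<CAPTION_TO_PHRASE_GROUNDING>"])  # boxes + phrase labels
```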
Same here, I am working on Chart Question Answering and would like to fine-tune this model on multiple tasks (Visual Question Answering and Object Detection). Of course, I don't want to fine-tune the model twice.
Have you found a way to do that?
I am also looking for a multi-task learning strategy. My understanding from the paper is that we need to design a dataset that mixes examples from the different tasks, since Florence-2 is trained with a single loss over all of them.
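Since every Florence-2 task is text-to-text under the same token-level cross-entropy loss, one way to fine-tune once is to interleave examples from both tasks in a single dataset, with each example carrying its own task prompt. A minimal sketch of that idea follows; the `<VQA>` prompt token, the field names, and the sample layout are assumptions for illustration, not Florence-2's official training code:

```python
import random
from torch.utils.data import Dataset

class MultiTaskDataset(Dataset):
    def __init__(self, vqa_samples, det_samples):
        # vqa_samples: list of (image, question, answer) tuples
        # det_samples: list of (image, target_string) tuples, where
        # target_string encodes boxes in Florence-2's location-token format
        self.samples = (
            [("<VQA>", img, q, a) for img, q, a in vqa_samples]       # hypothetical task token
            + [("<OD>", img, "", tgt) for img, tgt in det_samples]
        )
        random.shuffle(self.samples)  # interleave the two tasks

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        task, image, extra_text, target = self.samples[idx]
        # The prompt selects the task; the target is always plain text,
        # so the same language-modeling loss covers answers and boxes alike.
        return {"prompt": task + extra_text, "image": image, "target": target}
```

In the training loop you would then tokenize `prompt` (with the image) as the encoder input and `target` as the labels, and run a standard seq2seq fine-tune; no second training pass should be needed.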