Fine-tuning for multiple tasks strategy

#32 opened by gennarino80

I would like to fine-tune this model on a specific set of images, combining two different tasks used in cascade.

The idea is that, given an input image, the model first performs image captioning (MORE_DETAILED_CAPTION) to describe the image, and then runs CAPTION_TO_PHRASE_GROUNDING on that caption to get a 'visual perspective' of what it has described (a sort of Grad-CAM of the text).
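Concretely, the cascade at inference time would look something like the sketch below, following the usage pattern from the Florence-2 model card. The checkpoint name, the input file `example.jpg`, and the generation settings (`max_new_tokens`, `num_beams`) are assumptions you would adapt; the second stage just feeds the first stage's caption back in as the grounding prompt:

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumed checkpoint; swap in your own fine-tuned model path.
model_id = "microsoft/Florence-2-large"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to(device)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

def run_task(task_prompt, image, text_input=""):
    """Run a single Florence-2 task and return its parsed output."""
    prompt = task_prompt + text_input
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,  # assumed generation settings
        num_beams=3,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        generated_text, task=task_prompt, image_size=(image.width, image.height)
    )

image = Image.open("example.jpg").convert("RGB")  # assumed input image

# Stage 1: detailed caption of the image.
caption = run_task("<MORE_DETAILED_CAPTION>", image)["<MORE_DETAILED_CAPTION>"]

# Stage 2: ground the caption's phrases back onto the image as boxes.
grounding = run_task("<CAPTION_TO_PHRASE_GROUNDING>", image, text_input=caption)
print(grounding)  # {'<CAPTION_TO_PHRASE_GROUNDING>': {'bboxes': [...], 'labels': [...]}}
```

Since the second stage only consumes the first stage's text output, a single checkpoint can serve both stages as long as both tasks are covered during fine-tuning.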

What should I do in this case? Fine-tune the model twice, starting with the image captioning task and then using the resulting checkpoint to train on the second task?

Same here. I am working on Chart Question Answering and would like to fine-tune this model on multiple tasks (Visual Question Answering and Object Detection). Of course, I don't want to fine-tune the model twice.

Have you found a way to do that?

I am also looking for a multi-task learning strategy. My understanding from the paper is that Florence-2 treats every task as a prompt-to-text problem trained with a single language-modeling loss, so multi-task fine-tuning comes down to designing one dataset that mixes examples from all tasks (each carrying its own task prompt) and training once on the combined data.
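A minimal sketch of such a mixed dataset is below. Note the assumptions: `<VQA>` is a custom task token you would introduce for your QA pairs (it is not one of Florence-2's built-in prompts), and the detection targets follow Florence-2's `<loc_*>` convention of boxes quantized to 0-999 bins; both would need adapting to your data and tokenizer setup:

```python
import random
from torch.utils.data import Dataset

class MultiTaskFlorenceDataset(Dataset):
    """Interleaves VQA and detection examples so that one fine-tuning
    run covers both tasks under Florence-2's single seq2seq loss."""

    def __init__(self, vqa_samples, det_samples):
        # vqa_samples: list of (image, question, answer)
        # det_samples: list of (image, label, bboxes), with each box
        # already quantized to Florence-2's 0-999 location bins
        self.samples = [("vqa", s) for s in vqa_samples] + \
                       [("det", s) for s in det_samples]
        random.shuffle(self.samples)  # mix tasks within each epoch

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        task, sample = self.samples[idx]
        if task == "vqa":
            image, question, answer = sample
            prompt = "<VQA>" + question  # assumed custom task prompt
            target = answer
        else:
            image, label, bboxes = sample
            prompt = "<OD>"
            # Encode each box as label<loc_x1><loc_y1><loc_x2><loc_y2>
            target = "".join(
                label + "".join(f"<loc_{v}>" for v in box) for box in bboxes
            )
        return image, prompt, target
```

In the training loop, each (image, prompt, target) triple would then be tokenized by the processor and fed through the model exactly as in single-task fine-tuning; the task mixing happens entirely in the data.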
