Fine-tuning strategy for multiple tasks

#32
by gennarino80 - opened

I would like to fine-tune this model on a specific set of images, combining two different tasks used in cascade.

The idea is that, once the model receives the input image, it should first perform the image captioning task (MORE_DETAILED_CAPTION) to describe the image, and then run CAPTION_TO_PHRASE_GROUNDING on that caption to give a 'visual perspective' of what the model has described (a sort of Grad-CAM for the text).
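A minimal sketch of the cascade described above, just to make the data flow concrete. It is not tied to any real library API: `run_task` is a hypothetical callable that wraps whatever inference code is used for this model, and the angle-bracket spelling of the two task prompts is an assumption.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical signature: (task_prompt, image, text_input) -> result for that task.
# In practice this would wrap the model's processor + generate + post-processing.
TaskRunner = Callable[[str, Any, Optional[str]], Any]

# Task names come from the post; the angle-bracket form is an assumption.
CAPTION_TASK = "<MORE_DETAILED_CAPTION>"
GROUNDING_TASK = "<CAPTION_TO_PHRASE_GROUNDING>"


def caption_then_ground(image: Any, run_task: TaskRunner) -> Dict[str, Any]:
    """Run captioning first, then ground the generated caption on the same image."""
    # Step 1: captioning takes only the image, no extra text input.
    caption = run_task(CAPTION_TASK, image, None)
    # Step 2: grounding takes the same image plus the caption produced in step 1.
    grounding = run_task(GROUNDING_TASK, image, caption)
    return {"caption": caption, "grounding": grounding}
```

The point of the sketch is that the second task consumes the output of the first, so the quality of the grounding depends on the quality of the fine-tuned captions.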

What should I do in this case? Should I fine-tune the model twice: first on the image captioning task, and then continue training the resulting model on the second task?
