Multi Modal Capabilities?

by Vvkishere - opened Jan 14, 2023

Jan 14, 2023

I think this approach also has potential for image-based data if we can train models like CLIP to adapt to different domains. It could enable more fine-grained analysis for object detection and image retrieval.

multi-train

NLP Group of The University of Hong Kong org Jan 14, 2023

Thanks for your comment!

I agree that the INSTRUCTOR model could potentially be applied to other domains, such as computer vision. The key part would probably be designing the instruction and deciding how to merge it with the input. For training/fine-tuning, I think the procedure would be similar to that of the INSTRUCTOR model, using a contrastive loss.
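
To make the contrastive-loss idea concrete, here is a minimal sketch of an in-batch contrastive (InfoNCE-style) loss in plain Python. It is only an illustration, not the INSTRUCTOR training code: the function names `cosine` and `info_nce_loss`, the temperature value, and the toy 2-D embeddings are all assumptions made for this example. In a vision setting, the "queries" could be embeddings of instruction-conditioned inputs and the "positives" embeddings of their matching images, but how the instruction is merged with the input is exactly the open design question mentioned above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(queries, positives, temperature=0.05):
    """In-batch contrastive loss (hypothetical helper for illustration).

    positives[i] is the match for queries[i]; every other row in
    `positives` acts as a negative for that query.
    The temperature is an arbitrary illustrative value.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true positive.
        total += log_denom - logits[i]
    return total / len(queries)

# Toy embeddings: each query is identical to its own positive.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = info_nce_loss(embeddings, embeddings)
# Reversing the positives pairs every query with the wrong item.
loss_mismatched = info_nce_loss(embeddings, embeddings[::-1])
print(loss_matched < loss_mismatched)
```

The loss is small when each query is closest to its own positive and large when the pairs are mismatched, which is the signal that fine-tuning would use to pull matching pairs together and push the rest apart.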
