Interesting work! I'd like to use the image-text alignment learned by this model's encoder for downstream tasks. How should I use it?