Add zero-shot classification task for BLIP-2

by youssefadarrab - opened

Is it possible to add support for the zero-shot classification task using BLIP-2, by computing text-image similarities from the normalized embeddings returned by the BLIP-2 feature extractor?
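To illustrate what is being asked, here is a minimal, framework-free sketch of zero-shot classification from embeddings. It assumes you already have one image embedding and one text embedding per candidate label (for example, prompts like "a photo of a cat"). Each vector is L2-normalized, cosine similarities are scaled by a temperature, and a softmax turns them into label probabilities. The function names and the default temperature of 0.01 are hypothetical choices for illustration, not values taken from BLIP-2 or transformers.

```python
import math


def l2_normalize(vec):
    # Scale a vector to unit length so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_classify(image_emb, label_embs, temperature=0.01):
    # Hypothetical helper: returns one probability per candidate label.
    img = l2_normalize(image_emb)
    logits = []
    for emb in label_embs:
        txt = l2_normalize(emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(cosine / temperature)  # assumed temperature scaling
    # Numerically stable softmax over the label logits.
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 2-D embeddings standing in for real BLIP-2 features.
probs = zero_shot_classify([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
print(probs)
```

The predicted label is simply the index with the highest probability; with real features only the embedding source changes, not this scoring logic.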


For that, one could add get_image_features and get_text_features methods to Blip2ForConditionalGeneration. These could be implemented based on the original implementation:
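One detail worth keeping in mind for such methods: in the original BLIP-2 code (LAVIS), the Q-Former produces several query embeddings per image rather than a single vector. As far as I understand that code, the image-text score is the maximum cosine similarity over those query embeddings. The sketch below shows only that scoring rule in plain Python. It assumes the image features arrive as a list of per-query vectors and the text features as a single vector; the function name and inputs are hypothetical, and the actual outputs of the proposed get_image_features / get_text_features methods are not settled here.

```python
import math


def l2_normalize(vec):
    # Unit-length vector so that dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def max_query_similarity(query_embs, text_emb):
    # Hypothetical helper: compare the text embedding against every
    # query embedding of one image and keep the best match (assumed
    # to mirror the max-over-queries scoring used by the Q-Former).
    txt = l2_normalize(text_emb)
    return max(
        sum(a * b for a, b in zip(l2_normalize(q), txt))
        for q in query_embs
    )


# Two toy "query" embeddings for one image, scored against two texts.
image_queries = [[1.0, 0.0], [0.0, 1.0]]
print(max_query_similarity(image_queries, [0.0, 2.0]))
print(max_query_similarity(image_queries, [1.0, 1.0]))
```

These per-label scores could then be fed into a temperature-scaled softmax to obtain zero-shot class probabilities.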

Feel free to open an issue on GitHub so this can be contributed.


I will open an issue on GitHub. I would also love to contribute with a PR!
