"Use in Transformers" feature is wrong

by smartinezbragado - opened

The right way to load this model is:

import torch  # needed for torch.float16 below
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl-coco")
# Load the weights in half precision to reduce memory use
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xl-coco", torch_dtype=torch.float16
)

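For anyone who wants to go from loading to an actual caption, here is a minimal sketch built on the same two classes. The helper names `pick_dtype_name` and `caption_image`, the image path, and the `max_new_tokens=30` value are illustrative assumptions, not something from this thread. It also assumes Pillow is installed and falls back to float32 on CPU, since half precision is generally meant for GPU inference.

```python
def pick_dtype_name(cuda_available: bool) -> str:
    """Hypothetical helper: half precision on GPU, full precision on CPU."""
    return "float16" if cuda_available else "float32"


def caption_image(image_path: str, model_id: str = "Salesforce/blip2-flan-t5-xl-coco") -> str:
    """Hypothetical helper: load BLIP-2 and return one generated caption."""
    # Heavy imports live inside the function so importing this file stays cheap.
    import torch
    from PIL import Image
    from transformers import Blip2Processor, Blip2ForConditionalGeneration

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = getattr(torch, pick_dtype_name(device == "cuda"))

    processor = Blip2Processor.from_pretrained(model_id)
    model = Blip2ForConditionalGeneration.from_pretrained(model_id, torch_dtype=dtype)
    model = model.to(device)

    image = Image.open(image_path).convert("RGB")
    # Move the inputs to the model's device and cast pixel values to its dtype.
    inputs = processor(images=image, return_tensors="pt").to(device, dtype)

    generated_ids = model.generate(**inputs, max_new_tokens=30)  # assumed length cap
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

Calling `caption_image("photo.jpg")` (any local image path) downloads the checkpoint on first use, so expect a large download and a GPU with enough memory for the XL model.
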
Thanks for flagging. cc @osanseviero, do you know why AutoModelForVisualQuestionAnswering is displayed in the "use with Transformers" UI? That class doesn't even exist in the Transformers library.
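
Whether a given class is exported depends on the installed Transformers version, so it can be worth checking locally. Below is a small sketch of such a check. The helper name `class_is_exported` is made up for illustration, and the result is only valid for whatever version is installed in your environment.

```python
import importlib


def class_is_exported(module_name: str, class_name: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `class_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed (or failed to import) in this environment.
        return False
    return hasattr(module, class_name)


if __name__ == "__main__":
    for name in ("AutoModelForVisualQuestionAnswering", "Blip2ForConditionalGeneration"):
        print(name, class_is_exported("transformers", name))
```

Printing `transformers.__version__` alongside the result makes it easier to compare answers between environments.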

Ok thanks. Actually, I think we should deprecate BLIP and BLIP-2 for that class and move them to the new "image-text-to-text" pipeline, which is added in https://github.com/huggingface/transformers/pull/29572.
