
Sharded BLIP-2 Model Card - flan-t5-xl


This is a sharded version of blip2-flan-t5-xl, which uses Flan-T5-XL as its language model for image-to-text tasks such as image captioning and visual question answering.

  • This model repo is sharded so that it can be loaded easily on low-RAM Colab runtimes :)
  • Refer to the original model card for more details about the model description, intended uses, and limitations, as well as instructions for how to use the model on CPU and GPU in different precisions.
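
As background on what "sharded" means here: the checkpoint is split into several smaller weight files, so loading can proceed one file at a time and does not need the whole checkpoint in memory on top of the model. The sketch below illustrates the idea with a hypothetical `plan_shards` helper that greedily groups tensors by size. It is a simplified illustration under made-up tensor names and sizes, not the code used to produce this repo and not the transformers implementation (in transformers, the shard size is controlled by the `max_shard_size` argument of `save_pretrained`).

```python
def plan_shards(tensor_sizes, max_shard_bytes):
    """Greedily group tensors into shards of at most max_shard_bytes.

    tensor_sizes maps a tensor name to its size in bytes. A tensor larger
    than the limit still gets a shard of its own.
    """
    shards, current, current_size = [], [], 0
    for name, size in tensor_sizes.items():
        # Start a new shard when adding this tensor would exceed the limit.
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    return shards


# Hypothetical tensor names and sizes, for illustration only.
sizes = {
    "vision.weight": 600,
    "qformer.weight": 300,
    "decoder.weight": 900,
    "lm_head.weight": 200,
}
shards = plan_shards(sizes, max_shard_bytes=1000)
print(shards)
```

Smaller shards lower the peak memory needed during loading, at the cost of more files to download.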


Refer to the original model card for details or see this blog post. Here is how you can use it on CPU:


Requires the current main of transformers (at time of writing):

pip install accelerate git+https://github.com/huggingface/transformers.git -U -q

Usage (this example runs on CPU; see the original model card or blog post for fp16 and int8 usage):

import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Load the processor and the sharded model weights (CPU, full precision)
model_name = "ethzanalytics/blip2-flan-t5-xl-sharded"
processor = Blip2Processor.from_pretrained(model_name)
model = Blip2ForConditionalGeneration.from_pretrained(model_name)

# Download the demo image and convert it to RGB
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Visual question answering: pass the image together with a text prompt
question = "how many dogs are in the picture?"
inputs = processor(raw_image, question, return_tensors="pt")

# Generate the answer and decode it to text
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
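
For GPU use, the CPU example can be adapted to half precision. The sketch below is an assumption-laden variant, not taken from this card: the function name `answer_on_gpu` is hypothetical, and it assumes a CUDA GPU, an installed `accelerate` (needed for `device_map="auto"`), and a transformers version that provides `Blip2Processor`. The original model card remains the reference for fp16 and int8 usage.

```python
def answer_on_gpu(image, question,
                  model_name="ethzanalytics/blip2-flan-t5-xl-sharded"):
    """Answer a question about a PIL image using fp16 weights on a GPU.

    Hypothetical helper for illustration. Imports are kept inside the
    function so that defining it does not require torch to be installed.
    """
    import torch
    from transformers import Blip2Processor, Blip2ForConditionalGeneration

    processor = Blip2Processor.from_pretrained(model_name)
    # Load weights in half precision and let accelerate place them on the GPU.
    model = Blip2ForConditionalGeneration.from_pretrained(
        model_name, torch_dtype=torch.float16, device_map="auto"
    )

    # Move inputs to the GPU; floating-point tensors are cast to fp16.
    inputs = processor(image, question, return_tensors="pt").to(
        "cuda", torch.float16
    )
    out = model.generate(**inputs)
    return processor.decode(out[0], skip_special_tokens=True)
```

With the variables from the CPU example, this would be called as `answer_on_gpu(raw_image, question)`.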