---
library_name: transformers
license: mit
pipeline_tag: image-to-text
---

# Blip Image Captioning Base BF16

This model is a quantized version of [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base), an image-to-text model. Casting the weights from float32 to bfloat16 reduces the memory footprint from 989 MB to 494 MB, cutting the model's size by 50 percent. A sketch of the conversion is included at the end of this card.

## Example

| Generated caption |
|---|
| a cat sitting on top of a purple and red striped carpet |

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import requests
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

model = BlipForConditionalGeneration.from_pretrained("gospacedev/blip-image-captioning-base-bf16")
processor = BlipProcessor.from_pretrained("gospacedev/blip-image-captioning-base-bf16")

# Load a sample image (any image URL works)
img_url = "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg"
image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

# Generate a caption
inputs = processor(image, return_tensors="pt")
output = model.generate(**inputs)
result = processor.decode(output[0], skip_special_tokens=True)
print(result)
```

## Model Details

- **Developed by:** Grantley Cullar
- **Model type:** Image-to-Text
- **Language(s) (NLP):** English
- **License:** MIT License
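
## Reproducing the BF16 Conversion

The bfloat16 checkpoint can be derived from the original float32 model. The snippet below is a minimal sketch, assuming the conversion was done by loading the Salesforce checkpoint with `torch_dtype=torch.bfloat16` and re-saving the weights; the exact steps used to publish this repository may differ.

```python
import torch
from transformers import BlipForConditionalGeneration, BlipProcessor

# Load the original float32 checkpoint, casting the weights to bfloat16
# (assumed conversion path; not necessarily how this repo was produced)
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base", torch_dtype=torch.bfloat16
)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")

# Save the bfloat16 weights locally; model.push_to_hub(...) would publish them instead
model.save_pretrained("blip-image-captioning-base-bf16")
processor.save_pretrained("blip-image-captioning-base-bf16")
```

Saving after the cast stores the parameters in bfloat16, which is where the roughly 50 percent reduction in checkpoint size comes from.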