---
datasets:
- MMInstruction/M3IT
pipeline_tag: image-to-text
---

This model is fine-tuned on an instruction dataset starting from the `Salesforce/blip-image-captioning-base` model.

## Usage

```python
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

# Load the processor from the base model and the fine-tuned weights.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
if processor.tokenizer.eos_token is None:
    processor.tokenizer.eos_token = '<|eos|>'
model = BlipForConditionalGeneration.from_pretrained("prasanna2003/Instruct-blip-v2")

image = Image.open('file_name.jpg').convert('RGB')

prompt = """Instruction: Answer the following input according to the image.
Input: Describe this image.
output: """

inputs = processor(image, prompt, return_tensors="pt")

output = model.generate(**inputs, max_length=100)
print(processor.decode(output[0], skip_special_tokens=True))
```
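The decoded sequence from BLIP's `generate` typically includes the prompt tokens followed by the model's continuation. If you only want the answer, a minimal sketch (assuming the decoded string starts with the prompt text) is to strip that prefix:

```python
# Assumption: the decoded output begins with the prompt text; if your
# transformers version returns only newly generated tokens, this falls
# back to the full decoded string.
decoded = processor.decode(output[0], skip_special_tokens=True)
answer = decoded[len(prompt):].strip() if decoded.startswith(prompt) else decoded.strip()
print(answer)
```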