---
language: ar
license: other
tags:
- vision
- image-captioning
pipeline_tag: image-to-text
---

# 🦚 Peacock 🦚

Peacock is an InstructBLIP-based model that uses AraLLaMA as its language model. It was introduced in the paper [Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks](https://arxiv.org/abs/2403.01031).

# How to use

Usage is as follows:

```python
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration
import torch
from PIL import Image
import requests

# Load the model and processor
model = InstructBlipForConditionalGeneration.from_pretrained("UBC-NLP/Peacock")
processor = InstructBlipProcessor.from_pretrained("UBC-NLP/Peacock")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Fetch an example image
url = "https://upload.wikimedia.org/wikipedia/commons/8/83/Socotra_dragon_tree.JPG"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Arabic prompt: "Describe the image"
prompt = "اوصف الصوره"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs,
    do_sample=False,
    num_beams=5,
    max_length=256,
    min_length=1,
    top_p=0.9,
    repetition_penalty=1.5,
    length_penalty=1.0,
    temperature=1,
)
generated_text = processor.batch_decode(outputs, skip_special_tokens=True)[0].strip()
print(generated_text)
```

# Citation

If you use this model, please cite the following paper:

```bibtex
@inproceedings{alwajih2024peacock,
  title     = {Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks},
  author    = {Alwajih, Fakhraddin and Nagoudi, El Moatez Billah and Bhatia, Gagan and Mohamed, Abdelrahman and Abdul-Mageed, Muhammad},
  booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages     = {12753--12776},
  year      = {2024},
  address   = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2024.acl-long.689}
}
```
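
# Half-precision inference (optional)

If GPU memory is tight, it is often possible to run the model in half precision. The snippet below is a minimal sketch, not part of the original card: it assumes a CUDA device is available and that the checkpoint behaves well in `torch.float16`, neither of which is stated by the model authors.

```python
import torch
import requests
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

# Illustrative sketch only: load the weights in float16 to roughly halve GPU
# memory use. Assumes a CUDA GPU; float16 compatibility is an assumption,
# not something stated in the card above.
processor = InstructBlipProcessor.from_pretrained("UBC-NLP/Peacock")
model = InstructBlipForConditionalGeneration.from_pretrained(
    "UBC-NLP/Peacock", torch_dtype=torch.float16
).to("cuda")

url = "https://upload.wikimedia.org/wikipedia/commons/8/83/Socotra_dragon_tree.JPG"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Cast the floating-point inputs (pixel values) to float16 as well so they
# match the model weights.
inputs = processor(images=image, text="اوصف الصوره", return_tensors="pt").to("cuda", torch.float16)

outputs = model.generate(**inputs, num_beams=5, max_length=256)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0].strip())
```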