---
tags:
- image-to-text
- image-captioning
- endpoints-template
license: bsd-3-clause
library_name: generic
---

# Fork of [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large) for an `image-captioning` task on 🤗 Inference Endpoints

This repository implements a `custom` task for `image-captioning` on 🤗 Inference Endpoints. The code for the customized pipeline is in [pipeline.py](https://huggingface.co/florentgbelidji/blip_captioning/blob/main/pipeline.py).

To deploy this model as an Inference Endpoint, select `Custom` as the task so that the `handler.py` file is used. -> _double check if it is selected_

### Expected request payload

`image` is the base64-encoded image and `text` is the caption prompt.

```json
{
  "image": "/9j/4AAQSkZJRgA.....",
  "text": "a photography of a"
}
```

Below is an example of how to run a request using Python and `requests`.

## Run Request

1. Use any online image.

```bash
wget https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg
```

2. Run the request.

```python
import base64

import requests

# Read the image and encode it as a base64 string.
# Adjust the path to wherever demo.jpg was saved.
with open("/content/demo.jpg", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode()

ENDPOINT_URL = ""  # URL of your Inference Endpoint
HF_TOKEN = ""  # your Hugging Face access token

headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}


def query(payload):
    response = requests.post(ENDPOINT_URL, headers=headers, json=payload)
    return response.json()


output = query({
    "inputs": {
        "images": [encoded_string],  # using the base64 encoded string
        "texts": ["a photography of"]  # Optional, based on your current class logic
    }
})
print(output)
```

Example parameters depending on the decoding strategy:

1. Beam search

```
"parameters": {
    "num_beams": 5,
    "max_length": 20
}
```

2. Nucleus sampling

```
"parameters": {
    "num_beams": 1,
    "max_length": 20,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95
}
```

3. Contrastive search

```
"parameters": {
    "penalty_alpha": 0.6,
    "top_k": 4,
    "max_length": 512
}
```

See the [generate()](https://huggingface.co/docs/transformers/v4.25.1/en/main_classes/text_generation#transformers.GenerationMixin.generate) documentation for additional details.

Expected output:

```python
{'captions': ['a photography of a woman and her dog on the beach']}
```
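
The decoding parameters listed earlier are shown only as fragments, so here is a minimal sketch of how one of them might be combined with the request body. It assumes the handler reads a top-level `"parameters"` key next to `"inputs"`, which is the usual Inference Endpoints convention but is not confirmed by this README. `build_payload` and `DECODING_PRESETS` are hypothetical names introduced for illustration and are not part of the repository.

```python
import base64
import json

# Presets copied from the decoding-strategy examples in this README.
DECODING_PRESETS = {
    "beam_search": {"num_beams": 5, "max_length": 20},
    "nucleus_sampling": {
        "num_beams": 1,
        "max_length": 20,
        "do_sample": True,
        "top_k": 50,
        "top_p": 0.95,
    },
    "contrastive_search": {"penalty_alpha": 0.6, "top_k": 4, "max_length": 512},
}


def build_payload(image_bytes, prompt=None, strategy=None):
    """Assemble a request body (hypothetical helper, not part of the repo)."""
    # Base64-encode the raw image bytes, as in the request example.
    inputs = {"images": [base64.b64encode(image_bytes).decode()]}
    if prompt is not None:
        inputs["texts"] = [prompt]  # optional caption prefix
    payload = {"inputs": inputs}
    if strategy is not None:
        # Assumption: the handler accepts a top-level "parameters" key.
        payload["parameters"] = dict(DECODING_PRESETS[strategy])
    return payload


# Placeholder bytes stand in for the contents of a real image file.
payload = build_payload(b"fake image bytes", "a photography of", "beam_search")
print(json.dumps(payload, indent=2))
```

The resulting dictionary could then be passed to the `query` function from the request example. Keeping the presets in one place makes it easy to switch decoding strategies without editing the request code.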