---
tags:
  - image-to-text
  - image-captioning
  - endpoints-template
license: bsd-3-clause
library_name: generic
---

Fork of [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large) for an image-captioning task on 🤗 Inference Endpoints.

This repository implements a custom image-captioning task for 🤗 Inference Endpoints. The code for the customized pipeline is in `pipeline.py`. To deploy this model as an Inference Endpoint, you have to select *Custom* as the task so that the endpoint uses the `pipeline.py` file — double-check that it is selected.
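For orientation, a custom handler for the `generic` library exposes a `PreTrainedPipeline` class whose `__init__` loads the model and whose `__call__` handles a request. The sketch below is illustrative only, assuming BLIP is loaded via `transformers` and the payload documented in the next section; it is not a verbatim copy of this repository's `pipeline.py`.

```python
# Illustrative sketch of a generic custom handler; not the exact pipeline.py in this repo.
import base64
from io import BytesIO

from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor


class PreTrainedPipeline:
    def __init__(self, path=""):
        # Load the BLIP processor and model from the repository files.
        self.processor = BlipProcessor.from_pretrained(path)
        self.model = BlipForConditionalGeneration.from_pretrained(path)

    def __call__(self, data):
        # "image" is the base64-encoded image, "text" an optional caption prefix.
        image = Image.open(BytesIO(base64.b64decode(data["image"]))).convert("RGB")
        text = data.get("text")

        # Conditional captioning if a text prefix is given, unconditional otherwise.
        if text:
            inputs = self.processor(image, text, return_tensors="pt")
        else:
            inputs = self.processor(image, return_tensors="pt")

        # Forward optional decoding parameters (num_beams, top_p, ...) to generate().
        generated = self.model.generate(**inputs, **data.get("parameters", {}))
        caption = self.processor.decode(generated[0], skip_special_tokens=True)
        return {"captions": [caption]}
```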

### expected Request payload

```json
{
  "image": "/9j/4AAQSkZJRgA.....",
  "text": "a photography of a"
}
```

`image` holds the base64-encoded image; `text` is an optional prompt prefix the caption is conditioned on.
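For a quick check from the command line, the same payload can be sent with `curl`; the endpoint URL and token below are placeholders you need to fill in:

```bash
# Placeholders: replace with your endpoint URL and Hugging Face token.
curl https://<your-endpoint>.endpoints.huggingface.cloud \
  -X POST \
  -H "Authorization: Bearer <HF_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"image": "<base64-encoded image>", "text": "a photography of a"}'
```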

Below is an example of how to run a request using Python and `requests`.

### Run Request

1. Use any online image, e.g.:

```bash
wget https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg
```

2. Run the request:

```python
import base64

import requests

ENDPOINT_URL = ""  # url of your endpoint
HF_TOKEN = ""      # your Hugging Face access token

# Base64-encode the downloaded image.
with open("demo.jpg", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode()


def query(payload):
    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": "application/json",
    }
    response = requests.post(ENDPOINT_URL, headers=headers, json=payload)
    return response.json()


output = query({
    "image": encoded_string,       # base64-encoded image
    "text": "a photography of a",  # optional caption prefix
})
print(output)
```
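The response mirrors the expected output shown at the end of this README, so the generated caption can be pulled out directly:

```python
# Assuming the endpoint returns {"captions": [...]} as documented below.
caption = output["captions"][0]
print(caption)
```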

Example parameters depending on the decoding strategy (a sketch of how to pass them with the request follows the list):

1. Beam search

```json
"parameters": {
  "num_beams": 5,
  "max_length": 20
}
```

2. Nucleus sampling

```json
"parameters": {
  "num_beams": 1,
  "max_length": 20,
  "do_sample": true,
  "top_k": 50,
  "top_p": 0.95
}
```

3. Contrastive search

```json
"parameters": {
  "penalty_alpha": 0.6,
  "top_k": 4,
  "max_length": 512
}
```
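Assuming the handler forwards a top-level `parameters` object to `generate()` (as in the sketch near the top of this README), these options ride along with the payload:

```python
# Hypothetical example: beam-search decoding parameters added to the same payload.
output = query({
    "image": encoded_string,
    "text": "a photography of a",
    "parameters": {"num_beams": 5, "max_length": 20},
})
```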

See the [`generate()` documentation](https://huggingface.co/docs/transformers/main_classes/text_generation) for additional details.

### expected output

```python
{'captions': ['a photography of a woman and her dog on the beach']}
```