|
--- |
|
tags: |
|
- image-to-text |
|
- image-captioning |
|
- endpoints-template |
|
license: bsd-3-clause |
|
library_name: generic |
|
--- |
|
|
|
# Fork of [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large) for an `image-captioning` task on 🤗 Inference Endpoints.
|
|
|
This repository implements a `custom` task for `image-captioning` for 🤗 Inference Endpoints. The code for the customized pipeline is in [pipeline.py](https://huggingface.co/florentgbelidji/blip_captioning/blob/main/pipeline.py).

To deploy this model as an Inference Endpoint, select `Custom` as the task so that the `pipeline.py` file is used. -> _double check that it is selected_
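
For reference, a custom pipeline in this generic template is a Python class that loads the model once and then handles each request. The snippet below is only a rough sketch of that structure; the class name, request parsing, and output key are assumptions, and the authoritative implementation is [pipeline.py](https://huggingface.co/florentgbelidji/blip_captioning/blob/main/pipeline.py) itself.

```python
import base64
from io import BytesIO

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration


class PreTrainedPipeline:
    def __init__(self, path=""):
        # Load the BLIP captioning model and processor from the repository files
        self.processor = BlipProcessor.from_pretrained(path)
        self.model = BlipForConditionalGeneration.from_pretrained(path)

    def __call__(self, data):
        # Decode the base64-encoded image; "text" and "parameters" are optional
        image = Image.open(BytesIO(base64.b64decode(data["image"]))).convert("RGB")
        text = data.get("text")
        parameters = data.get("parameters", {})

        inputs = self.processor(image, text, return_tensors="pt")
        output_ids = self.model.generate(**inputs, **parameters)
        caption = self.processor.decode(output_ids[0], skip_special_tokens=True)
        return {"captions": [caption]}
```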
|
### Expected request payload
|
```json |
|
{
  "image": "/9j/4AAQSkZJRgA.....",
  "text": "a photography of a"
}
|
``` |
|
`image` is the base64-encoded image and `text` is an optional prompt used to condition the caption. Below is an example of how to run a request using Python and `requests`.
|
## Run Request |
|
1. Download an image (any online image works):
|
```bash |
|
wget https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg
|
``` |
|
2. Run the request:
|
|
|
```python |
|
import base64

import requests

# Encode the downloaded image as a base64 string
with open("demo.jpg", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode()

ENDPOINT_URL = ""  # URL of your Inference Endpoint
HF_TOKEN = ""      # your Hugging Face access token

headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}


def query(payload):
    response = requests.post(ENDPOINT_URL, headers=headers, json=payload)
    return response.json()


output = query({
    "image": encoded_string,       # base64-encoded image
    "text": "a photography of a",  # optional prompt to condition the caption
})
print(output)
|
``` |
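
If the endpoint is still starting up or the payload shape does not match what `pipeline.py` expects, `response.json()` will not contain captions. A small defensive variant of the helper above (purely illustrative, reusing `ENDPOINT_URL` and `headers`) surfaces HTTP errors directly:

```python
def query_safe(payload):
    # Reuses requests, ENDPOINT_URL and headers from the snippet above
    response = requests.post(ENDPOINT_URL, headers=headers, json=payload)
    # Raise on 4xx/5xx, e.g. a malformed payload or an endpoint that is still scaling up
    response.raise_for_status()
    return response.json()
```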
|
|
|
Example parameters depending on the decoding strategy: |
|
|
|
1. Beam search |
|
|
|
``` |
|
"parameters": { |
|
"num_beams":5, |
|
"max_length":20 |
|
} |
|
``` |
|
|
|
2. Nucleus sampling |
|
|
|
``` |
|
"parameters": { |
|
"num_beams":1, |
|
"max_length":20, |
|
"do_sample": True, |
|
"top_k":50, |
|
"top_p":0.95 |
|
} |
|
``` |
|
|
|
3. Contrastive search |
|
|
|
``` |
|
"parameters": { |
|
"penalty_alpha":0.6, |
|
"top_k":4 |
|
"max_length":512 |
|
} |
|
``` |
|
|
|
See the [generate()](https://huggingface.co/docs/transformers/v4.25.1/en/main_classes/text_generation#transformers.GenerationMixin.generate) documentation for additional details.
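
Whichever decoding strategy you choose, the `parameters` object is sent in the same request body as the image. Below is a minimal sketch reusing the `query` helper and `encoded_string` from above, assuming the pipeline reads generation options from a top-level `parameters` key (check `pipeline.py` if the nesting differs):

```python
# Beam-search decoding; "image"/"text" follow the expected request payload above
output = query({
    "image": encoded_string,
    "text": "a photography of a",
    "parameters": {
        "num_beams": 5,
        "max_length": 20,
    },
})
print(output)
```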
|
|
|
|
|
Expected output:
|
```python |
|
{'captions': ['a photography of a woman and her dog on the beach']} |
|
``` |
|
|