Commit 5318d99
Committed by SlowPacer (florentgbelidji)
0 Parent(s)

Duplicate from florentgbelidji/blip_captioning

Co-authored-by: Florent Gbelidji <florentgbelidji@users.noreply.huggingface.co>

Files changed (4):
  1. .gitattributes +31 -0
  2. README.md +98 -0
  3. handler.py +49 -0
  4. requirements.txt +1 -0
.gitattributes ADDED
@@ -0,0 +1,31 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zstandard filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,98 @@
+ ---
+ tags:
+ - image-to-text
+ - image-captioning
+ - endpoints-template
+ license: bsd-3-clause
+ library_name: generic
+ duplicated_from: florentgbelidji/blip_captioning
+ ---
+
+ # Fork of [salesforce/BLIP](https://github.com/salesforce/BLIP) for an `image-captioning` task on 🤗 Inference Endpoints
+
+ This repository implements a `custom` task for `image-captioning` for 🤗 Inference Endpoints. The code for the customized pipeline is in [pipeline.py](https://huggingface.co/florentgbelidji/blip_captioning/blob/main/pipeline.py).
+ To deploy this model as an Inference Endpoint, you have to select `Custom` as the task so that the `pipeline.py` file is used. -> _double check that it is selected_
+ ### Expected request payload
+ ```json
+ {
+   "inputs": ["/9j/4AAQSkZJRgABAQEBLAEsAAD/2wBDAAMCAgICAgMC...."] // list of base64-encoded images
+ }
+ ```
+ Below is an example of how to run a request using Python and `requests`.
+ ## Run Request
+ 1. Prepare an image.
+ ```bash
+ !wget https://huggingface.co/datasets/mishig/sample_images/resolve/main/palace.jpg
+ ```
+ 2. Run the request.
+
+ ```python
+ import base64
+ import requests as r
+
+ ENDPOINT_URL = ""  # URL of your Inference Endpoint
+ HF_TOKEN = ""      # token with access to the endpoint
+
+ def predict(path_to_image: str = None):
+     # read the image and base64-encode it so it can travel inside a JSON body
+     with open(path_to_image, "rb") as i:
+         image = base64.b64encode(i.read()).decode("utf-8")
+     payload = {
+         "inputs": [image],
+         "parameters": {
+             "do_sample": True,
+             "top_p": 0.9,
+             "min_length": 5,
+             "max_length": 20
+         }
+     }
+     response = r.post(
+         ENDPOINT_URL, headers={"Authorization": f"Bearer {HF_TOKEN}"}, json=payload
+     )
+     return response.json()
+
+ prediction = predict(path_to_image="palace.jpg")
+ ```
+ Example parameters depending on the decoding strategy:
+
+ 1. Beam search
+
+ ```
+ "parameters": {
+     "num_beams": 5,
+     "max_length": 20
+ }
+ ```
+
+ 2. Nucleus sampling
+
+ ```
+ "parameters": {
+     "num_beams": 1,
+     "max_length": 20,
+     "do_sample": True,
+     "top_k": 50,
+     "top_p": 0.95
+ }
+ ```
+
+ 3. Contrastive search
+
+ ```
+ "parameters": {
+     "penalty_alpha": 0.6,
+     "top_k": 4,
+     "max_length": 512
+ }
+ ```
+
+ See the [generate()](https://huggingface.co/docs/transformers/v4.25.1/en/main_classes/text_generation#transformers.GenerationMixin.generate) docs for additional details.
+
+ Expected output:
+ ```python
+ ['buckingham palace with flower beds and red flowers']
+ ```
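
The decoding-strategy parameter blocks above drop straight into the `parameters` field of the request payload. As a sketch, here is one way to call the endpoint with each strategy; the `caption` helper is a hypothetical generalization of the README's `predict` function and assumes the same `ENDPOINT_URL` / `HF_TOKEN` setup:

```python
import base64
import requests as r

ENDPOINT_URL = ""  # placeholder: your Inference Endpoint URL, as in the README example
HF_TOKEN = ""      # placeholder: a token with access to the endpoint

def caption(path_to_image: str, parameters: dict) -> dict:
    # hypothetical helper: same request as the README's `predict`,
    # but with the generation parameters passed in instead of hardcoded
    with open(path_to_image, "rb") as f:
        image = base64.b64encode(f.read()).decode("utf-8")
    payload = {"inputs": [image], "parameters": parameters}
    response = r.post(
        ENDPOINT_URL, headers={"Authorization": f"Bearer {HF_TOKEN}"}, json=payload
    )
    return response.json()

# beam search
print(caption("palace.jpg", {"num_beams": 5, "max_length": 20}))
# nucleus sampling
print(caption("palace.jpg", {"do_sample": True, "top_k": 50, "top_p": 0.95, "max_length": 20}))
```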
handler.py ADDED
@@ -0,0 +1,49 @@
+ from typing import Dict, Any
+ from io import BytesIO
+ import base64
+
+ import torch
+ from PIL import Image
+ from transformers import BlipForConditionalGeneration, BlipProcessor
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+
+ class EndpointHandler():
+     def __init__(self, path=""):
+         # load the BLIP processor and captioning model, and move the model to the GPU if one is available
+         self.processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
+         self.model = BlipForConditionalGeneration.from_pretrained(
+             "Salesforce/blip-image-captioning-base"
+         ).to(device)
+         self.model.eval()
+
+     def __call__(self, data: Any) -> Dict[str, Any]:
+         """
+         Args:
+             data (:obj:`dict`):
+                 includes the input data ("inputs": a list of images, each given as raw bytes or a
+                 base64-encoded string) and the generation parameters for the inference ("parameters").
+         Return:
+             A :obj:`dict` with a single key, "captions": a list with one generated caption (str) per input image.
+         """
+         inputs = data.pop("inputs", data)
+         parameters = data.pop("parameters", {})
+
+         # accept either raw image bytes or base64-encoded strings (as in the README payload example)
+         raw_images = [
+             Image.open(BytesIO(base64.b64decode(_img) if isinstance(_img, str) else _img))
+             for _img in inputs
+         ]
+
+         # preprocess the images and move the pixel values to the same device as the model
+         processed_image = self.processor(images=raw_images, return_tensors="pt")
+         processed_image["pixel_values"] = processed_image["pixel_values"].to(device)
+         # forward any generation parameters (num_beams, top_p, max_length, ...) to generate()
+         processed_image = {**processed_image, **parameters}
+
+         with torch.no_grad():
+             out = self.model.generate(**processed_image)
+
+         # postprocess the prediction into plain strings
+         captions = self.processor.batch_decode(out, skip_special_tokens=True)
+         return {"captions": captions}
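
Before deploying, the handler above can be exercised locally. A minimal sketch, assuming the repository root is the working directory and that `palace.jpg` (the sample image downloaded in the README step) sits next to it:

```python
# Local smoke test for the custom handler (a sketch; the sample image path is an assumption).
from handler import EndpointHandler

handler = EndpointHandler(path=".")

with open("palace.jpg", "rb") as f:
    image_bytes = f.read()

# The handler receives the already-deserialized request body:
# a dict with "inputs" (a list of images) and optional "parameters".
result = handler({"inputs": [image_bytes], "parameters": {"num_beams": 5, "max_length": 20}})
print(result)  # e.g. {"captions": ["..."]}
```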
requirements.txt ADDED
@@ -0,0 +1 @@
+ git+https://github.com/huggingface/transformers.git@main