---
license: apache-2.0
---

This repo deploys a pix2struct model to [Inference Endpoints](https://ui.endpoints.huggingface.co/) on a single GPU. It is not meant for multi-GPU instances or CPU; single-GPU replicas are fine, but avoid the 4xT4 option.

Options:
 - modify the [model_name](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L19) to whichever model you want to use (see the sketch of `handler.py` below)
   - if it is a private model, just add `handler.py` to that model repo and change `model_name` to `"./"`
 - modify the [dtype](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L27) to whichever one you want
   - [see the notes here](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L21-L26) for the dtype tradeoffs

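For orientation, here is a rough sketch of what a custom `handler.py` for Inference Endpoints can look like. The `EndpointHandler` class with `__init__` and `__call__` is the interface Inference Endpoints expects; the model name, dtype comments, and pre/post-processing details below are illustrative assumptions, not necessarily what this repo's `handler.py` contains.

```python
# handler.py -- illustrative sketch of a custom Inference Endpoints handler.
# model_name and dtype are placeholders; swap in whatever you need.
import base64
from io import BytesIO
from typing import Any, Dict

import torch
from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

model_name = "google/pix2struct-infographics-vqa-large"  # or "./" for a private repo

# dtype tradeoffs, roughly: float32 is the safest but uses the most memory;
# float16/bfloat16 halve memory and speed up inference at some numerical risk.
dtype = torch.float16


class EndpointHandler:
    def __init__(self, path: str = ""):
        self.processor = Pix2StructProcessor.from_pretrained(model_name)
        self.model = Pix2StructForConditionalGeneration.from_pretrained(
            model_name, torch_dtype=dtype
        ).to("cuda")
        self.model.eval()

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data["inputs"]
        # images arrive as base64 strings, questions as plain text
        images = [Image.open(BytesIO(base64.b64decode(img))) for img in inputs["image"]]
        encoding = self.processor(
            images=images, text=inputs["question"], return_tensors="pt"
        )
        encoding = {k: v.to("cuda") for k, v in encoding.items()}
        # the processor emits float "flattened_patches"; match the model dtype
        encoding["flattened_patches"] = encoding["flattened_patches"].to(dtype)
        with torch.inference_mode():
            generated = self.model.generate(**encoding, **data.get("parameters", {}))
        return {
            "output": self.processor.batch_decode(generated, skip_special_tokens=True)
        }
```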
After deploying the model, inference can be done as follows:


```python
import base64

import requests

API_URL = "url_to_endpoint"
headers = {
    "Authorization": "Bearer HF_TOKEN",
    "Content-Type": "application/json",
}

# images are sent as base64-encoded strings
with open("path/to/image", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

question = "question to model"

payload = {
    "inputs": {
        "image": [b64],  # for batched inference, send lists of images/questions
        "question": [question],
    },
    "parameters": {
        "max_new_tokens": 10,  # any generation parameters can be used
    },
}


def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()


output = query(payload)
# {'output': ['55%']}
```
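Since the handler takes parallel lists of images and questions, batching is just a matter of sending more than one of each. A short sketch reusing `query` from above; the file names, questions, and outputs are hypothetical:

```python
# hypothetical batched request: one question per image
images = []
for path in ["chart1.png", "chart2.png"]:  # placeholder file names
    with open(path, "rb") as f:
        images.append(base64.b64encode(f.read()).decode("utf-8"))

batched_payload = {
    "inputs": {
        "image": images,
        "question": ["What share is shown?", "Which year is highlighted?"],
    },
    "parameters": {"max_new_tokens": 10},
}

output = query(batched_payload)
# e.g. {'output': ['55%', '2019']}
```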