---
license: apache-2.0
---
This repo deploys a pix2struct model to Inference Endpoints on a single GPU. It is not meant for multi-GPU instances (single-GPU replicas are fine; just don't pick the 4xT4 option) or CPU.
Options:
- modify `model_name` to whichever model you want to use
- if it is a private model, just add `handler.py` to that model repo and change `model_name` to `"./"` (a sketch of what such a handler looks like follows this list)
- modify the `dtype` to whichever one you want; see the notes on dtype tradeoffs
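
For orientation, here is a minimal sketch of what such a custom `handler.py` can look like. The checkpoint name, the dtype default, and the pre/post-processing details are illustrative assumptions, not necessarily what this repo ships:

```python
# handler.py -- a minimal sketch, not this repo's exact handler
import base64
from io import BytesIO

import torch
from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

model_name = "google/pix2struct-docvqa-base"  # assumption: swap in your model, or "./" for a private repo
dtype = torch.float16  # float16/bfloat16 halve memory; float32 is slower but most precise


class EndpointHandler:
    def __init__(self, path: str = ""):
        self.processor = Pix2StructProcessor.from_pretrained(model_name)
        self.model = Pix2StructForConditionalGeneration.from_pretrained(
            model_name, torch_dtype=dtype
        ).to("cuda")

    def __call__(self, data: dict) -> dict:
        inputs = data["inputs"]
        # images arrive as base64-encoded strings (see the payload format below)
        images = [Image.open(BytesIO(base64.b64decode(img))) for img in inputs["image"]]
        processed = self.processor(
            images=images, text=inputs["question"], return_tensors="pt"
        ).to("cuda")
        # cast the float patch features to the model's dtype
        processed["flattened_patches"] = processed["flattened_patches"].to(dtype)
        generated = self.model.generate(**processed, **data.get("parameters", {}))
        return {"output": self.processor.batch_decode(generated, skip_special_tokens=True)}
```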
After deploying the model, inference can be done as follows:
```python
import base64

import requests

API_URL = "url_to_endpoint"
headers = {
    "Authorization": "Bearer HF_TOKEN",
    "Content-Type": "application/json",
}

# base64-encode the image so it can be sent as JSON
with open("path/to/image", "rb") as f:
    b64 = base64.b64encode(f.read())

question = "question to model"

payload = {
    "inputs": {
        "image": [b64.decode("utf-8")],  # for batched inference, send a list of images/questions
        "question": [question],
    },
    "parameters": {
        "max_new_tokens": 10,  # can use any generation parameters
    },
}


def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()


output = query(payload)
# {'output': ['55%']}
```
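
Because the payload takes parallel lists, batched inference only requires sending more than one image/question pair. A short sketch, with a placeholder path for the second image:

```python
# batched request: parallel lists of images and questions
with open("path/to/another_image", "rb") as f:  # placeholder path
    b64_2 = base64.b64encode(f.read())

batched_payload = {
    "inputs": {
        "image": [b64.decode("utf-8"), b64_2.decode("utf-8")],
        "question": [question, "second question"],
    },
    "parameters": {"max_new_tokens": 10},
}
output = query(batched_payload)
# {'output': ['<answer 1>', '<answer 2>']}
```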