---
license: apache-2.0
---

This repo deploys a pix2struct model to [Inference Endpoints](https://ui.endpoints.huggingface.co/) on a single GPU. It is not meant for multiple GPUs (single-GPU replicas are fine; just don't use the 4xT4 option) or CPU.

Options:
- modify the [model_name](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L19) to whichever model you want to use
  - if it is a private model, just add `handler.py` to that model repo and change `model_name` to `"./"`
- modify the [dtype](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L27) to whichever one you want
  - [see notes here](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L21-L26) for dtype tradeoffs

After deploying the model, inference can be done as follows:

```python
import base64

import requests

API_URL = "url_to_endpoint"
headers = {
    "Authorization": "Bearer HF_TOKEN",
    "Content-Type": "application/json",
}

# Encode the image as base64 so it can be sent in a JSON payload
with open("path/to/image", "rb") as f:
    b64 = base64.b64encode(f.read())

question = "question to model"

payload = {
    "inputs": {
        "image": [b64.decode("utf-8")],  # for batched inference, send lists of images/questions
        "question": [question],
    },
    "parameters": {
        "max_new_tokens": 10,  # can use any generation parameters
    },
}


def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()


output = query(payload)
# {'output': ['55%']}
```
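
As the comment in the payload notes, batched inference works by sending parallel lists of images and questions. A small sketch of a helper that builds such a payload from raw image bytes (the function name and default are illustrative, not part of the handler):

```python
import base64


def build_payload(images, questions, max_new_tokens=10):
    """Pair raw image bytes with questions; index i of each list forms one example.

    `images` is a list of bytes objects, `questions` a list of strings
    of the same length.
    """
    return {
        "inputs": {
            # base64-encode each image and decode to a JSON-safe utf-8 string
            "image": [base64.b64encode(b).decode("utf-8") for b in images],
            "question": questions,
        },
        "parameters": {"max_new_tokens": max_new_tokens},
    }
```

The resulting dict can be passed to `query` exactly as in the single-example snippet above.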