---
license: apache-2.0
---

This repo deploys a pix2struct model to [Inference Endpoints](https://ui.endpoints.huggingface.co/) on a single GPU. It is not meant for multiple GPUs (single-GPU replicas are fine; just don't use the 4xT4 option) or for CPU.

Options:

- modify the [model_name](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L19) to whichever model you want to use
  - if it is a private model, just add handler.py to that model repo and change `model_name` to `"./"`
- modify the [dtype](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L27) to whichever one you want
  - [see notes here](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L21-L26) for the dtype tradeoffs

(A sketch of what these handler edits look like follows the inference example below.)

After deploying the model, inference can be done as follows:

```python
import base64

import requests

# Read the image and base64-encode it so it can travel in a JSON payload
with open("path/to/image", "rb") as f:
    b64 = base64.b64encode(f.read())

question = "question to model"

payload = {
    "inputs": {
        # for batched inference, send lists of images/questions
        "image": [b64.decode("utf-8")],
        "question": [question],
    },
    "parameters": {
        # any generation parameters can be used here
        "max_new_tokens": 10,
    },
}

API_URL = "url_to_endpoint"
headers = {
    "Authorization": "Bearer HF_TOKEN",
    "Content-Type": "application/json",
}


def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()


output = query(payload)
# {'output': ['55%']}
```
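
For reference, here is a minimal sketch of a `handler.py` in the shape Inference Endpoints expects for custom handlers (a class named `EndpointHandler` with `__init__` and `__call__`). The checkpoint name and the fp16 default below are illustrative placeholders, not necessarily what this repo's handler uses:

```python
import base64
from io import BytesIO
from typing import Any, Dict

import torch
from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor


class EndpointHandler:
    def __init__(self, path: str = ""):
        # Swap in any pix2struct checkpoint; use "./" when this handler.py
        # lives inside a private model repo (placeholder name shown)
        model_name = "google/pix2struct-infographics-vqa-large"
        # fp16 is faster and uses half the memory but can overflow;
        # bf16 is a safer half-precision choice on Ampere+ GPUs;
        # fp32 is the slow-but-safe default
        dtype = torch.float16
        self.dtype = dtype
        self.processor = Pix2StructProcessor.from_pretrained(model_name)
        self.model = Pix2StructForConditionalGeneration.from_pretrained(
            model_name, torch_dtype=dtype
        ).to("cuda")

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data["inputs"]
        # Decode the base64 images sent by the client
        images = [Image.open(BytesIO(base64.b64decode(i))) for i in inputs["image"]]
        encoding = self.processor(
            images=images, text=inputs["question"], return_tensors="pt"
        )
        # Move tensors to the GPU, casting floating-point inputs to the
        # model's dtype so they match the weights
        encoding = {
            k: v.to("cuda", self.dtype) if v.is_floating_point() else v.to("cuda")
            for k, v in encoding.items()
        }
        with torch.inference_mode():
            generated = self.model.generate(**encoding, **data.get("parameters", {}))
        return {
            "output": self.processor.batch_decode(generated, skip_special_tokens=True)
        }
```

Returning `{"output": [...]}` matches the example response above, and batching works because the processor accepts lists of images and questions.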