---
license: apache-2.0
---

This repo deploys a pix2struct model to [Inference Endpoints](https://ui.endpoints.huggingface.co/) on a single GPU. It is not meant for multi-GPU or CPU deployments (single-GPU replicas are fine; just don't use the 4xT4 option).

Options:

- set [model_name](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L19) to whichever model you want to use
- if it is a private model, just add handler.py to that model repo and change `model_name` to `"./"`
- modify the [dtype](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L27) to whichever one you want
- [see notes here](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L21-L26) for dtype tradeoffs
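
As a purely illustrative sketch of the dtype tradeoff (the helper name and threshold below are assumptions, not code from handler.py, where the dtype is simply edited in place): bfloat16 needs an Ampere-or-newer GPU (compute capability 8.0+) and has float32's range at half the memory, while float16 runs on older cards but can overflow.

```python
# Hypothetical helper illustrating the dtype tradeoff; in this repo you
# instead edit the dtype directly in handler.py.
def pick_dtype(compute_capability: tuple) -> str:
    """Return a dtype name for a CUDA compute capability like (8, 0)."""
    if compute_capability >= (8, 0):
        return "bfloat16"  # Ampere or newer: fp32 range, half the memory
    return "float16"       # older GPUs: smaller range, watch for overflow

# e.g. dtype = getattr(torch, pick_dtype(torch.cuda.get_device_capability()))
```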

After deploying the model, inference can be done as follows:

```python
import base64

import requests

API_URL = "url_to_endpoint"
headers = {
    "Authorization": "Bearer HF_TOKEN",
    "Content-Type": "application/json",
}

with open("path/to/image", "rb") as f:
    b64 = base64.b64encode(f.read())

question = "question to model"

payload = {
    "inputs": {
        "image": [b64.decode("utf-8")],  # for batched inference, send lists of images/questions
        "question": [question],
    },
    "parameters": {
        "max_new_tokens": 10,  # can use any generation parameters
    },
}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query(payload)
# {'output': ['55%']}
```
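
Since `image` and `question` are parallel lists, batched inference just means sending one entry per example. A sketch, with placeholder bytes standing in for real image files (the questions and filenames here are made up):

```python
import base64

def to_b64(image_bytes: bytes) -> str:
    """Base64-encode raw image bytes for the endpoint payload."""
    return base64.b64encode(image_bytes).decode("utf-8")

# Placeholder bytes; in practice read each image with open(path, "rb").
images = [b"image-1-bytes", b"image-2-bytes"]
questions = ["first question", "second question"]

batched_payload = {
    "inputs": {
        "image": [to_b64(b) for b in images],  # one entry per example
        "question": questions,                 # same length as "image"
    },
    "parameters": {"max_new_tokens": 10},
}
# POST batched_payload to the endpoint exactly as in the query above.
```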