---
license: apache-2.0
---
This repo deploys a pix2struct model to [Inference Endpoints](https://ui.endpoints.huggingface.co/) on a single GPU. It is not meant for multi-GPU instances (single-GPU replicas are fine; just don't use the 4xT4 option) or CPU.
Options:
- modify the [model_name](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L19) to whichever model you want to use (see the sketch after this list)
- if it is a private model, just add handler.py to that model repo and change `model_name` to `"./"`
- modify the [dtype](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L27) to whichever one you want
  - [see notes here](https://huggingface.co/nbroad/p2s-infographic-lg-endpt/blob/main/handler.py#L21-L26) for dtype tradeoffs
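For orientation, here is a minimal sketch of where those two options live in handler.py, assuming the standard Inference Endpoints custom-handler interface. The model name, dtype, and `__call__` stub are illustrative placeholders, not the actual contents of the file:

```python
# Minimal sketch of the relevant part of handler.py (assumed structure;
# see the linked file for the actual values and line numbers).
import torch
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

class EndpointHandler:
    def __init__(self, path=""):
        # model_name: any pix2struct checkpoint, or "./" for a private repo
        # that has handler.py committed next to the weights
        model_name = "google/pix2struct-infographics-vqa-large"  # placeholder
        # dtype: float16/bfloat16 use less memory and run faster at some
        # precision cost; float32 is the safe default (see the notes linked above)
        dtype = torch.float16
        self.processor = Pix2StructProcessor.from_pretrained(model_name)
        self.model = Pix2StructForConditionalGeneration.from_pretrained(
            model_name, torch_dtype=dtype
        ).to("cuda")

    def __call__(self, data):
        # decodes the base64 images, runs generate, and returns {"output": [...]}
        ...
```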
After deploying the model, inference can be done as follows:
```python
import base64

import requests

with open("path/to/image", "rb") as f:
    b64 = base64.b64encode(f.read())

question = "question to model"

payload = {
    "inputs": {
        "image": [b64.decode("utf-8")],  # for batched inference, send list of images/questions
        "question": [question],
    },
    "parameters": {
        "max_new_tokens": 10,  # can use any generation parameters
    },
}

API_URL = "url_to_endpoint"
headers = {
    "Authorization": "Bearer HF_TOKEN",
    "Content-Type": "application/json",
}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query(payload)
# {'output': ['55%']}
```
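As noted in the comment above, the handler accepts lists, so batched inference just means sending several images with one question per image. Continuing from the snippet above, a hypothetical batched request (paths, questions, and answers are placeholders):

```python
# Hypothetical batched request, reusing query() from above:
# one question per image, in matching order.
images = []
for path in ["chart1.png", "chart2.png"]:  # placeholder paths
    with open(path, "rb") as f:
        images.append(base64.b64encode(f.read()).decode("utf-8"))

batched_payload = {
    "inputs": {
        "image": images,
        "question": ["What percentage is shown?", "What year is labeled?"],
    },
    "parameters": {"max_new_tokens": 10},
}

output = query(batched_payload)
# one answer per image, e.g. {'output': ['55%', '2019']}
```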