	---
	license: gpl-3.0
	tags:
	- ui-automation
	- automation
	- agents
	- llm-agents
	- vision
	---

	# Model card for PTA-Text: A Text-Only Click Model


	# Table of Contents

	0. [TL;DR](#tldr)
	1. [Installation](#installation)
	2. [Running the model](#running-the-model)
	3. [Contribution](#contribution)
	4. [Citation](#citation)

	# TL;DR

	## Details for PTA-Text
	-> __Input__: An image with a header containing the desired UI click command.

	-> __Output__: An [x, y] click position in relative coordinates, each in the 0-1 range.

	__PTA-Text__ is an image encoder based on MatCha, which is an extension of Pix2Struct.
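
Because the output is relative, it has to be scaled by the image size before it can drive a click. The following is a minimal sketch of that conversion. The helper name `relative_to_pixels` is hypothetical, and it assumes x is measured from the left edge and y from the top edge, which this card does not state explicitly.

```python
def relative_to_pixels(coords, width, height):
    """Map a relative [x, y] pair (each in 0-1) to integer pixel coordinates.

    Assumption: the origin is the top-left corner of the image.
    """
    x_rel, y_rel = coords
    if not (0.0 <= x_rel <= 1.0 and 0.0 <= y_rel <= 1.0):
        raise ValueError(f"expected relative coordinates in [0, 1], got {coords}")
    # Clamp to the last valid pixel index so that 1.0 stays inside the image.
    x_px = min(int(x_rel * width), width - 1)
    y_px = min(int(y_rel * height), height - 1)
    return x_px, y_px


# Example: a relative position on a 1920x1080 screenshot.
print(relative_to_pixels([0.25, 0.5], 1920, 1080))  # (480, 540)
```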

	# Installation

	```bash
	pip install askui-ml-helper
	```

	Download the `.pt` checkpoint from the files in this model card, or fetch it from your terminal:
	```bash
	curl -L "https://huggingface.co/AskUI/pta-text-0.1/resolve/main/pta-text-v0.1.pt?download=true" -o pta-text-v0.1.pt
	```
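
If you would rather fetch the checkpoint from Python, the same URL can be downloaded with the standard library. This is only a sketch: `checkpoint_url` and `download_checkpoint` are hypothetical helper names rather than part of `askui-ml-helper`, and the URL is the one from the `curl` command with the `?download=true` query dropped.

```python
import os
import urllib.request


def checkpoint_url(repo_id="AskUI/pta-text-0.1",
                   filename="pta-text-v0.1.pt",
                   revision="main"):
    """Build the Hugging Face 'resolve' URL for a file in a model repository."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_checkpoint(dest="pta-text-v0.1.pt"):
    """Download the checkpoint to `dest` unless a file is already there."""
    if os.path.exists(dest):
        return dest  # Reuse the existing file instead of downloading again.
    urllib.request.urlretrieve(checkpoint_url(), dest)
    return dest


# Uncomment to fetch the file into the current directory:
# download_checkpoint()
```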

	# Running the model

	### Get the annotated image

	You can run the model in full precision on CPU:
	```python
	import requests
	from PIL import Image
	from askui_ml_helper.utils.pta_text import PtaTextInference

	pta_text_inference = PtaTextInference("pta-text-v0.1.pt")
	url = "https://docs.askui.com/assets/images/how_askui_works_architecture-363bc8be35bd228e884c83d15acd19f7.png"
	image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
	prompt = 'click on the text "Operating System"'

	# Draw a circle of radius 15 at the predicted click position and display the result.
	render_image = pta_text_inference.process_image_and_draw_circle(image, prompt, radius=15)
	render_image.show()
	# Shows the image with a red dot where the click is predicted.
	```

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/5f993a63777efc07d7f1e2ce/ZNwjdENJqn-1VpXDcm_Wg.png)

	### Get the coordinates

	```python
	import requests
	from PIL import Image
	from askui_ml_helper.utils.pta_text import PtaTextInference

	pta_text_inference = PtaTextInference("pta-text-v0.1.pt")
	url = "https://docs.askui.com/assets/images/how_askui_works_architecture-363bc8be35bd228e884c83d15acd19f7.png"
	image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
	prompt = 'click on the text "Operating System"'

	coordinates = pta_text_inference.process_image(image, prompt)
	print(coordinates)
	# Example output: [0.3981265723705292, 0.13768285512924194]
	```
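
One way to sanity-check a prediction is to test whether it lands inside the bounding box of the element you expected. The sketch below is illustrative only: `click_hits_box` is a hypothetical helper, and the image size and bounding box are made-up values, not measurements of the example image.

```python
def click_hits_box(coords, box, image_size):
    """Return True if a relative [x, y] click lands inside a pixel-space box.

    box is (left, top, right, bottom) in pixels; image_size is (width, height).
    Assumption: relative coordinates are measured from the top-left corner.
    """
    width, height = image_size
    x_px = coords[0] * width
    y_px = coords[1] * height
    left, top, right, bottom = box
    return left <= x_px <= right and top <= y_px <= bottom


# The coordinates from the example above, checked against a hypothetical
# 1000x500 image and a made-up box around the "Operating System" label.
prediction = [0.3981265723705292, 0.13768285512924194]
print(click_hits_box(prediction, (350, 50, 450, 90), (1000, 500)))  # True
```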

	# Contribution

	An open-source initiative by AskUI. This model was contributed and added to the Hugging Face ecosystem by [Murali Manohar @ AskUI](https://huggingface.co/gitlost-murali).

	# Citation

	TODO