AskUI/PTA-1

maxiw committed on Nov 15, 2024

Commit 4b44d78 · verified · 1 Parent(s): 45a615f

Update README.md

Files changed (1)

README.md +6 -14

@@ -5,11 +5,15 @@ tags:
   - vision
 ---
-# PTA-1
-<!-- Provide a quick summary of what the model is/does. -->
+# PTA-1: Controlling Computers with Small Models
+PTA (Prompt-to-Action) is a vision language model for computer use applications based on Florence-2.
+With less than 300M parameters it beats larger models in GUI text and element localization.
+This allows low latency computer automations with local execution.
+**Model Input:** Screenshot + description_of_target_element
+**Model Output:** BoundingBox for Target Element
 ## Model Details
@@ -201,15 +205,3 @@ print(parsed_answer)
 <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
 [More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]
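
The added README text describes the model's contract as a screenshot plus a target description going in and a bounding box coming out. As a rough illustration of how an automation script might consume such a box, the sketch below converts it to a click position. The `BoundingBox` type, its `(x1, y1, x2, y2)` pixel layout, and the `click_point` helper are assumptions made for this example; none of them come from the PTA-1 README or its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical box in screenshot pixels: top-left (x1, y1), bottom-right (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float


def click_point(box: BoundingBox, width: int, height: int) -> tuple[int, int]:
    """Return the center of the box, clamped so it stays inside the screenshot."""
    # Truncate the midpoint to whole pixels.
    cx = int((box.x1 + box.x2) / 2)
    cy = int((box.y1 + box.y2) / 2)
    # Clamp to valid pixel indices in case the box extends past the image edge.
    cx = min(max(cx, 0), width - 1)
    cy = min(max(cy, 0), height - 1)
    return cx, cy


if __name__ == "__main__":
    # Assumed example values: a box on a 1920x1080 screenshot.
    target = BoundingBox(100, 200, 300, 260)
    print(click_point(target, width=1920, height=1080))
```

Clicking the center is only one possible choice; a real automation might prefer a different point inside the box or validate the box size first.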