Netrava committed on
Commit aa9ab92 (verified)
1 Parent(s): 0b851ec

Delete README.md

  1. README.md +0 -101
README.md DELETED
@@ -1,101 +0,0 @@
- ---
- title: OmniParser v2.0 API
- emoji: 🖼️
- colorFrom: blue
- colorTo: indigo
- sdk: gradio
- sdk_version: 4.0.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-
- # OmniParser v2.0 API
-
- This is a public API endpoint for Microsoft's OmniParser v2.0, which can parse UI screenshots and return structured data.
-
- ## Features
-
- - Parses UI screenshots into structured JSON data
- - Identifies interactive elements (buttons, menus, icons, etc.)
- - Provides captions describing the functionality of each element
- - Returns visualization of detected elements
- - Accessible via a simple REST API
-
- ## API Usage
-
- You can use this API by sending a POST request with a file upload:
-
- ```python
- import requests
-
- # Replace with your actual API URL after deployment
- OMNIPARSER_API_URL = "https://your-username-omniparser-api.hf.space/api/parse"
-
- # Upload a file
- files = {'image': open('screenshot.png', 'rb')}
-
- # Send request
- response = requests.post(OMNIPARSER_API_URL, files=files)
-
- # Get JSON result
- result = response.json()
-
- # Access parsed elements
- elements = result["elements"]
- for element in elements:
-     print(f"Element {element['id']}: {element['text']} - {element['caption']}")
-     print(f"Coordinates: {element['coordinates']}")
-     print(f"Interactable: {element['is_interactable']}")
-     print(f"Confidence: {element['confidence']}")
-     print("---")
-
- # Access visualization (base64 encoded image)
- visualization_base64 = result["visualization"]
- ```
-
- ## Response Format
-
- The API returns a JSON object with the following structure:
-
- ```json
- {
-   "status": "success",
-   "elements": [
-     {
-       "id": 0,
-       "text": "Button 1",
-       "caption": "Click to submit form",
-       "coordinates": [0.1, 0.1, 0.3, 0.2],
-       "is_interactable": true,
-       "confidence": 0.95
-     },
-     {
-       "id": 1,
-       "text": "Menu",
-       "caption": "Navigation menu",
-       "coordinates": [0.4, 0.5, 0.6, 0.6],
-       "is_interactable": true,
-       "confidence": 0.87
-     }
-   ],
-   "visualization": "base64_encoded_image_string"
- }
- ```
-
- ## Deployment
-
- This API is deployed on Hugging Face Spaces using Gradio. The deployment is free and provides a public URL that you can use in your applications.
-
- ## Credits
-
- This API uses Microsoft's OmniParser v2.0, which is a screen parsing tool for pure vision-based GUI agents. For more information, visit the [OmniParser GitHub repository](https://github.com/microsoft/OmniParser).
-
- ## License
-
- Please note that the OmniParser models have specific licenses:
- - icon_detect model is under AGPL license
- - icon_caption is under MIT license
-
- Please refer to the LICENSE file in the folder of each model in the original repository.
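
For anyone who relied on the response format documented in the deleted README, the following is a minimal sketch of consuming such a response. It assumes, without confirmation from the README, that `coordinates` holds `[x_min, y_min, x_max, y_max]` as fractions of the image width and height, and that any `status` other than `"success"` signals a failure. `to_pixel_boxes` and `decode_visualization` are hypothetical helper names, not part of OmniParser or this Space.

```python
import base64
import binascii


def to_pixel_boxes(result, width, height):
    """Convert each element's coordinates to an integer pixel box.

    Assumption: coordinates are [x_min, y_min, x_max, y_max] fractions of
    the image size; the README does not state this explicitly.
    """
    # Assumption: only "success" is documented, so treat anything else as an error.
    if result.get("status") != "success":
        raise ValueError(f"unexpected status: {result.get('status')!r}")
    boxes = []
    for element in result.get("elements", []):
        x_min, y_min, x_max, y_max = element["coordinates"]
        boxes.append({
            "id": element["id"],
            "box": (round(x_min * width), round(y_min * height),
                    round(x_max * width), round(y_max * height)),
            "is_interactable": element["is_interactable"],
        })
    return boxes


def decode_visualization(result):
    """Return the visualization as raw bytes, or None if it is missing or not valid base64."""
    try:
        return base64.b64decode(result["visualization"], validate=True)
    except (KeyError, binascii.Error):
        return None
```

Passing the parsed JSON from the README's example together with the screenshot's pixel dimensions yields one box per detected element. The README's placeholder `"base64_encoded_image_string"` is not valid base64, so `decode_visualization` returns `None` for it rather than raising.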