Update README.md
README.md
CHANGED
@@ -22,14 +22,14 @@ base_model:
 * **Version:** 1.0
 * **Model Type:** GUI Grounding / UI Element Localization
 * **Developers:** Jikai Chen, Long Chen, Dong Wang, Zhixuan Chu, Qinglin Su, Leilei Gan, Chenyi Zhuang, Jinjie Gu
-
-
+
+[](https://arxiv.org/abs/2508.13634) [](https://github.com/inclusionAI/AgenticLearning/tree/main/V2P)
 
 ### Model Description
 
 **V2P (Valley-to-Peak)** is an advanced model designed for robust and precise Graphical User Interface (GUI) element localization (grounding). In the field of GUI automation agents, accurately identifying interactive elements on a screen is critical. Traditional methods like bounding box regression or center-point prediction often overlook the spatial uncertainty of interaction and the hierarchical visual-semantic relationships, leading to insufficient localization accuracy.
 
-The V2P model was developed to address two major pain points in existing methods:
+The V2P model was developed to address two major pain points in existing visual methods:
 1. **Attention Drift due to Background Interference:** The model's attention mistakenly disperses to irrelevant background areas.
 2. **Imprecise Click Locations:** The model fails to distinguish between the center and the edges of a target element, leading to interaction failures.
 
@@ -103,4 +103,4 @@ output_ids = generated_ids[0][input_token_len:]
 output_text = processor.decode(output_ids, skip_special_tokens=True)
 
 print(output_text)
-# For more visualization code, please refer to the code in the V2P GitHub repository
+# For more visualization code, please refer to the code in the V2P GitHub repository...