inclusionAI/V2P-7B

computer-vision

graphical-user-interface


Minstrel54524 committed 23 days ago

Commit bbb453a (verified) · 1 Parent(s): 6ad8bbe

Update README.md

Files changed (1)

README.md +4 -4


@@ -22,14 +22,14 @@ base_model:
 * **Version:** 1.0
 * **Model Type:** GUI Grounding / UI Element Localization
 * **Developers:** Jikai Chen, Long Chen, Dong Wang, Zhixuan Chu, Qinglin Su, Leilei Gan, Chenyi Zhuang, Jinjie Gu
-* **Paper:** [V2P: From Background Suppression to Center Peaking for Robust GUI Grounding Task](https://arxiv.org/abs/2508.13634)
-* **Repository:** [Github](https://github.com/inclusionAI/AgenticLearning/tree/main/V2P)
 ### Model Description
 **V2P (Valley-to-Peak)** is an advanced model designed for robust and precise Graphical User Interface (GUI) element localization (grounding). In the field of GUI automation agents, accurately identifying interactive elements on a screen is critical. Traditional methods like bounding box regression or center-point prediction often overlook the spatial uncertainty of interaction and the hierarchical visual-semantic relationships, leading to insufficient localization accuracy.
-The V2P model was developed to address two major pain points in existing methods:
 1.  **Attention Drift due to Background Interference:** The model's attention mistakenly disperses to irrelevant background areas.
 2.  **Imprecise Click Locations:** The model fails to distinguish between the center and the edges of a target element, leading to interaction failures.
@@ -103,4 +103,4 @@ output_ids = generated_ids[0][input_token_len:]
 output_text = processor.decode(output_ids, skip_special_tokens=True)
 print(output_text)
-# For more visualization code, please refer to the code in the V2P GitHub repository.

 * **Version:** 1.0
 * **Model Type:** GUI Grounding / UI Element Localization
 * **Developers:** Jikai Chen, Long Chen, Dong Wang, Zhixuan Chu, Qinglin Su, Leilei Gan, Chenyi Zhuang, Jinjie Gu
+[![Paper](https://img.shields.io/badge/arXiv-2508.13634-b31b1b.svg)](https://arxiv.org/abs/2508.13634) [![Code](https://img.shields.io/badge/GitHub-Repository-blue.svg?logo=github)](https://github.com/inclusionAI/AgenticLearning/tree/main/V2P)
 ### Model Description
 **V2P (Valley-to-Peak)** is an advanced model designed for robust and precise Graphical User Interface (GUI) element localization (grounding). In the field of GUI automation agents, accurately identifying interactive elements on a screen is critical. Traditional methods like bounding box regression or center-point prediction often overlook the spatial uncertainty of interaction and the hierarchical visual-semantic relationships, leading to insufficient localization accuracy.
+The V2P model was developed to address two major pain points in existing visual methods:
 1.  **Attention Drift due to Background Interference:** The model's attention mistakenly disperses to irrelevant background areas.
 2.  **Imprecise Click Locations:** The model fails to distinguish between the center and the edges of a target element, leading to interaction failures.
 output_text = processor.decode(output_ids, skip_special_tokens=True)
 print(output_text)
+# For more visualization code, please refer to the code in the V2P GitHub repository...