arxiv:2312.11190

Navigating Interfaces with AI for Enhanced User Interaction

Published on Dec 18, 2023
Abstract

This study introduces a framework that automates tasks by interacting with UIs through a sequential, human-like problem-solving approach. The method first transforms UI screenshots into natural-language explanations via vision-based UI analysis, circumventing the limitations of traditional view hierarchies. It then engages with each interface step by step, guiding the LLM to pinpoint and act on the relevant UI elements, which improves both precision and functionality. Built on the ERNIE Bot LLM, the approach is shown to surpass existing methods: it delivers superior UI interpretation across various datasets, automates diverse tasks on an Android smartphone with notable efficiency, outperforms humans on intricate tasks, and significantly enhances the programming-by-demonstration (PBD) process.
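
The loop described above, turning a screenshot into a natural-language UI description and then letting the LLM choose the next action, can be sketched as follows. This is a minimal illustration only, assuming hypothetical caller-supplied helpers (capture_screenshot, describe_ui, llm_complete, perform_action); it is not the authors' implementation and does not use the actual ERNIE Bot API.

    from typing import Callable

    def automate_task(
        task: str,
        capture_screenshot: Callable[[], bytes],   # hypothetical: grab the current screen, e.g. via adb
        describe_ui: Callable[[bytes], str],       # hypothetical: vision-based UI-to-text analysis
        llm_complete: Callable[[str], str],        # hypothetical: call the backing LLM (ERNIE Bot in the paper)
        perform_action: Callable[[str], None],     # hypothetical: execute the chosen tap/type/scroll
        max_steps: int = 10,
    ) -> None:
        """Sequential, human-like loop sketched from the abstract: describe the
        current screen in natural language, ask the LLM which element to act on,
        execute the action, and repeat until the task is reported done."""
        for _ in range(max_steps):
            screen_text = describe_ui(capture_screenshot())
            decision = llm_complete(
                f"Task: {task}\n"
                f"Current screen, described in natural language:\n{screen_text}\n"
                "Name the single UI element to act on and the action to take "
                "(tap / type / scroll), or reply DONE if the task is finished."
            )
            if decision.strip().upper().startswith("DONE"):
                break
            perform_action(decision)

In practice the four callables would wrap a screen-capture utility, the paper's vision-based UI analyzer, the LLM endpoint, and an input-injection tool; only the loop structure is taken from the abstract.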
