MobA: A Two-Level Agent System for Efficient Mobile Task Automation
Abstract
Current mobile assistants are limited by their dependence on system APIs, or they struggle with complex user instructions and diverse interfaces due to restricted comprehension and decision-making abilities. To address these challenges, we propose MobA, a novel Mobile phone Agent powered by multimodal large language models (MLLMs) that enhances comprehension and planning capabilities through a two-level agent architecture. The high-level Global Agent (GA) is responsible for understanding user commands, maintaining memory of the task history, and planning tasks. The low-level Local Agent (LA) predicts detailed actions in the form of function calls, guided by sub-tasks and memory from the GA. An integrated Reflection Module enables efficient task completion and allows the system to handle previously unseen complex tasks. In real-life evaluations, MobA demonstrates significant improvements in task execution efficiency and completion rate, underscoring the potential of MLLM-empowered mobile assistants.
Community
🎮MobA manipulates mobile phones just like you would, with a two-level agent system that mimics brain functions. The "cerebrum" (Global Agent) comprehends, plans, and reflects🎯, while the "cerebellum" (Local Agent) predicts actions based on current information🕹️. It achieves a superior scoring rate of 66.2% across 50 real-world scenarios, with execution efficiency comparable to that of human experts.
🎉We have open-sourced MobA on GitHub.
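For intuition, here is a minimal sketch of how such a comprehend-plan-act-reflect loop could be wired up. All names (`GlobalAgent`, `LocalAgent`, `FakeDevice`, `run_task`) are hypothetical placeholders for illustration, not MobA's actual interfaces; see the GitHub release for the real implementation.

```python
# Minimal sketch of a two-level agent loop, as described in the abstract.
# Everything here is a hypothetical illustration, not MobA's actual API.

class GlobalAgent:
    """High-level 'cerebrum': understands the command, keeps memory, plans sub-tasks."""

    def __init__(self):
        self.memory = []  # history of (sub_task, outcome) pairs

    def plan(self, instruction, screen):
        # In MobA an MLLM decomposes the instruction; here we return a stub
        # sub-task along with recent memory as context for the Local Agent.
        return {"sub_task": f"next step toward: {instruction}",
                "context": self.memory[-3:]}

    def reflect(self, sub_task, outcome):
        # Reflection: record the outcome and report whether the step succeeded,
        # so the loop can re-plan on failure or stop on completion.
        self.memory.append((sub_task, outcome))
        return outcome.get("success", False)


class LocalAgent:
    """Low-level 'cerebellum': maps a sub-task plus the current screen to a function call."""

    def act(self, plan, screen):
        # An MLLM would ground the sub-task in concrete UI elements; stubbed as a tap.
        return {"name": "tap", "args": {"element_id": 0}}


class FakeDevice:
    """Hypothetical stand-in for a real phone controller (e.g., driven over ADB)."""

    def screenshot(self):
        return "<screen state>"

    def execute(self, action):
        print("executing:", action["name"], action["args"])
        return {"success": True}


def run_task(instruction, device, max_steps=10):
    ga, la = GlobalAgent(), LocalAgent()
    for _ in range(max_steps):
        screen = device.screenshot()
        plan = ga.plan(instruction, screen)
        action = la.act(plan, screen)
        outcome = device.execute(action)
        if ga.reflect(plan["sub_task"], outcome):
            break


run_task("open Settings and enable dark mode", FakeDevice())
```

The key design point this sketch illustrates is the separation of concerns: the Global Agent only ever reasons about sub-tasks and memory, while the Local Agent emits concrete function-call actions, so each model sees a smaller, more focused context.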
The following similar papers were recommended by the Semantic Scholar API:
- Agent S: An Open Agentic Framework that Uses Computers Like a Human (2024)
- WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration (2024)
- Dynamic Planning for LLM-based Graphical User Interface Automation (2024)
- Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents (2024)
- AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment (2024)