Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8 • 82
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published about 1 month ago • 76
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms Paper • 2410.18967 • Published Oct 24 • 1
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30 • 46
Agent S: An Open Agentic Framework that Uses Computers Like a Human Paper • 2410.08164 • Published Oct 10 • 24