NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models Paper • 2407.12366 • Published 6 days ago • 3 • 2
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding Paper • 2406.19263 • Published 25 days ago • 9 • 2