Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published 28 days ago • 28
RegionGPT: Towards Region Understanding Vision Language Model Paper • 2403.02330 • Published Mar 4 • 2
TextSquare: Scaling up Text-Centric Visual Instruction Tuning Paper • 2404.12803 • Published 20 days ago • 27