Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring • Paper • 2403.09333 • Published Mar 14
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models • Paper • 2404.07973 • Published Apr 11