VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks Paper • 2406.08394 • Published Jun 12 • 2
Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model Paper • 2409.13407 • Published Sep 20 • 2
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27 • 51 • 10