Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities Paper • 2308.12966 • Published Aug 24, 2023 • 6
SILC: Improving Vision Language Pretraining with Self-Distillation Paper • 2310.13355 • Published Oct 20, 2023 • 5
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions Paper • 2311.12793 • Published Nov 21, 2023 • 18
DeepSeek-VL: Towards Real-World Vision-Language Understanding Paper • 2403.05525 • Published Mar 8 • 38
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models Paper • 2404.12387 • Published Apr 18 • 35
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published Apr 25 • 49
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published Apr 25 • 31