HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper • 2406.19280 • Published Jun 27, 2024 • 61
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance Paper • 2411.02327 • Published Nov 4, 2024 • 11
MagicQuill: An Intelligent Interactive Image Editing System Paper • 2411.09703 • Published Nov 14, 2024 • 62
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 112