Collections
Collections including paper arxiv:2308.13418
- LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
  Paper • 2306.17107 • Published • 11
- On the Hidden Mystery of OCR in Large Multimodal Models
  Paper • 2305.07895 • Published
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 7
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 49

- Nougat: Neural Optical Understanding for Academic Documents
  Paper • 2308.13418 • Published • 35
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
  Paper • 2307.02499 • Published • 15
- Text Rendering Strategies for Pixel Language Models
  Paper • 2311.00522 • Published • 10