ColPali Paper Resources Collection Main resources for the paper: "ColPali: Efficient Document Retrieval with Vision Language Models" • 3 items • Updated Jul 2 • 5
Article Getty Images Brings High-Quality, Commercially Safe Dataset to Hugging Face By andreagagliano • 21 days ago • 15
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 9 days ago • 63
jina-embeddings-v3: Multilingual Embeddings With Task LoRA Paper • 2409.10173 • Published 11 days ago • 20
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published 24 days ago • 75
Qwen2 Collection Qwen2 language models, pretrained and instruction-tuned, in 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated 9 days ago • 339
Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5 • 101
Awesome Document AI Collection A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11 • 66
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published 30 days ago • 81
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated 9 days ago • 124
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24 • 168
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15 • 161
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding Paper • 2403.12895 • Published Mar 19 • 29
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 178