GLiNER Collection Knowledgator GLiNER models for information extraction • 8 items • Updated 29 days ago • 9
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 26 days ago • 136
Article Releasing the largest multilingual open pretraining dataset By Pclanglais • Nov 13, 2024 • 98
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding Paper • 2411.04952 • Published Nov 7, 2024 • 28
Article ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models By ahmed-masry • Oct 18, 2024 • 16
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 108
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 187
Flamingo: a Visual Language Model for Few-Shot Learning Paper • 2204.14198 • Published Apr 29, 2022 • 14
Article Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model • Aug 22, 2023 • 28
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published Jun 27, 2024 • 42
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 638
Article Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task By danaaubakirova • May 16, 2024 • 17
Evaluating Frontier Models for Dangerous Capabilities Paper • 2403.13793 • Published Mar 20, 2024 • 7