Contextual Position Encoding: Learning to Count What's Important Paper • 2405.18719 • Published May 29 • 5
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 110
Harvesting Textual and Structured Data from the HAL Publication Repository Paper • 2407.20595 • Published Jul 30 • 21
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens Paper • 2406.11271 • Published Jun 17 • 18
Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5 • 108
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3 • 92
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Paper • 2407.01370 • Published Jul 1 • 84
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published Jun 27 • 40
Article An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct By leonardlin • Jun 11 • 46
Article seemore: Implement a Vision Language Model from Scratch By AviSoori1x • Jun 23 • 59
Vision Language Models Papers 🖼️💬📝 Collection Papers about vision-language models; the most important ones are at the top of the list. • 27 items • Updated Apr 30 • 32
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 88
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15 • 161
Article 🥐CroissantLLM: A Truly Bilingual French-English Language Model By manu • Feb 5 • 9
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction Paper • 2305.02549 • Published May 4, 2023 • 6
World Model on Million-Length Video And Language With RingAttention Paper • 2402.08268 • Published Feb 13 • 36
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 47
Computer Vision Backbones 🧩 Collection Collection of useful computer vision backbones to fine-tune. It also includes large image classification models that can be used as backbones. • 22 items • Updated Sep 19, 2023 • 17
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 28
Extending Context Window of Large Language Models via Positional Interpolation Paper • 2306.15595 • Published Jun 27, 2023 • 53