- ColPali: Efficient Document Retrieval with Vision Language Models
  Paper • 2407.01449 • Published • 45
- vidore/colpali
  Visual Document Retrieval • Updated • 16.4k • 425
- vidore/colpali_train_set
  Viewer • Updated • 119k • 2.34k • 75
- Vidore Leaderboard
  🥇 Display Visual Document Retrieval leaderboard • 119

Collections including paper arxiv:2407.01449
- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 68
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 55
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 88

- Mixture-of-Agents Enhances Large Language Model Capabilities
  Paper • 2406.04692 • Published • 58
- CRAG -- Comprehensive RAG Benchmark
  Paper • 2406.04744 • Published • 47
- Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach
  Paper • 2406.04594 • Published • 8
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
  Paper • 2406.04271 • Published • 30

- Iterative Reasoning Preference Optimization
  Paper • 2404.19733 • Published • 48
- Better & Faster Large Language Models via Multi-token Prediction
  Paper • 2404.19737 • Published • 77
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 65
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 111

- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
  Paper • 2401.13313 • Published • 5
- BAAI/Bunny-v1_0-4B
  Text Generation • Updated • 162 • 9
- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 102
- Jina CLIP: Your CLIP Model Is Also Your Text Retriever
  Paper • 2405.20204 • Published • 35

- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 47
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 15
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 95
- FlashTex: Fast Relightable Mesh Texturing with LightControlNet
  Paper • 2402.13251 • Published • 15

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 147
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 30
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 23
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69

- Masked Autoencoders Are Scalable Vision Learners
  Paper • 2111.06377 • Published • 3
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
  Paper • 2311.00430 • Published • 59
- distil-whisper/distil-large-v2
  Automatic Speech Recognition • Updated • 383k • 506
- Seven Failure Points When Engineering a Retrieval Augmented Generation System
  Paper • 2401.05856 • Published • 2

- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 38
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 85
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 83