Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published 7 days ago • 35
Continuous Speculative Decoding for Autoregressive Image Generation Paper • 2411.11925 • Published 4 days ago • 13
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering Paper • 2411.11504 • Published 4 days ago • 16
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published 7 days ago • 90
Awesome Document AI Collection A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11 • 74
Direct Preference Optimization Using Sparse Feature-Level Constraints Paper • 2411.07618 • Published 10 days ago • 15
ViDoRe Benchmark Collection Benchmark for document retrieval using visual features, introduced in the ColPali paper. Datasets use the QA format. • 10 items • Updated 5 days ago • 11
LLM-KT: A Versatile Framework for Knowledge Transfer from Large Language Models to Collaborative Filtering Paper • 2411.00556 • Published 21 days ago • 1
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding Paper • 2411.04952 • Published 15 days ago • 27
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published 15 days ago • 48
OpenCoder Collection OpenCoder is an open and reproducible code LLM family which matches the performance of top-tier code LLMs. • 9 items • Updated 4 days ago • 71
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published 15 days ago • 108
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models Paper • 2407.12327 • Published Jul 17 • 77
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization Paper • 2405.15071 • Published May 23 • 37
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Paper • 2411.02959 • Published 17 days ago • 62
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 181