PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity Paper • 2503.07677 • Published Mar 10 • 82
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 4 items • Updated about 1 month ago • 104
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published Dec 18, 2024 • 24
DateLogicQA: Benchmarking Temporal Biases in Large Language Models Paper • 2412.13377 • Published Dec 17, 2024 • 2
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published Dec 19, 2024 • 29
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities Paper • 2412.14123 • Published Dec 18, 2024 • 11
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published Dec 18, 2024 • 52
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated 2 days ago • 566
HelpSteer2-Preference: Complementing Ratings with Preferences Paper • 2410.01257 • Published Oct 2, 2024 • 24
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 5 days ago • 155
ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders Paper • 2407.13036 • Published Jul 17, 2024 • 2
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19, 2024 • 51
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published Jun 27, 2024 • 47
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 211