Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 1 day ago • 28
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published Jun 13, 2024 • 51
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published May 15, 2024 • 28
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 Paper • 2405.00664 • Published May 1, 2024 • 20
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published May 1, 2024 • 32
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 77
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29, 2024 • 69
SnapKV: LLM Knows What You are Looking for Before Generation Paper • 2404.14469 • Published Apr 22, 2024 • 25
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19, 2024 • 39
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18, 2024 • 55
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models Paper • 2404.12387 • Published Apr 18, 2024 • 39
BLINK: Multimodal Large Language Models Can See but Not Perceive Paper • 2404.12390 • Published Apr 18, 2024 • 26
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11, 2024 • 32
RULER: What's the Real Context Size of Your Long-Context Language Models? Paper • 2404.06654 • Published Apr 9, 2024 • 35