DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training Paper • 2504.09710 • Published 10 days ago • 19
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability Paper • 2504.08003 • Published 14 days ago • 47
Open Deep Search: Democratizing Search with Open-source Reasoning Agents Paper • 2503.20201 • Published 29 days ago • 46
FFN Fusion: Rethinking Sequential Computation in Large Language Models Paper • 2503.18908 • Published about 1 month ago • 18
Modifying Large Language Model Post-Training for Diverse Creative Writing Paper • 2503.17126 • Published Mar 21 • 36
New Trends for Modern Machine Translation with Large Reasoning Models Paper • 2503.10351 • Published Mar 13 • 23
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12 • 71
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published Jun 13, 2024 • 52
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published May 15, 2024 • 29
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 Paper • 2405.00664 • Published May 1, 2024 • 21
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published May 1, 2024 • 33
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 78
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29, 2024 • 71
SnapKV: LLM Knows What You are Looking for Before Generation Paper • 2404.14469 • Published Apr 22, 2024 • 27
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19, 2024 • 40
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18, 2024 • 56
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models Paper • 2404.12387 • Published Apr 18, 2024 • 40