Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published 4 days ago • 37
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism Paper • 2407.10457 • Published 7 days ago • 19
Autoregressive Speech Synthesis without Vector Quantization Paper • 2407.08551 • Published 11 days ago • 12
Learning to (Learn at Test Time): RNNs with Expressive Hidden States Paper • 2407.04620 • Published 17 days ago • 23
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published 27 days ago • 75
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11 • 54
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3 • 42
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training Paper • 2405.15319 • Published May 24 • 23
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24 • 43
You Only Cache Once: Decoder-Decoder Architectures for Language Models Paper • 2405.05254 • Published May 8 • 8
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30 • 72
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19 • 38
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 243
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 102
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 63
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model Paper • 2403.13064 • Published Mar 19 • 31
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Paper • 2403.14624 • Published Mar 21 • 50
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 123
Gemma: Open Models Based on Gemini Research and Technology Paper • 2403.08295 • Published Mar 13 • 45
DrugAssist: A Large Language Model for Molecule Optimization Paper • 2401.10334 • Published Dec 28, 2023 • 4
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 180
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 47
Learning and Leveraging World Models in Visual Representation Learning Paper • 2403.00504 • Published Mar 1 • 28
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 581
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling Paper • 2402.12226 • Published Feb 19 • 37
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens Paper • 2401.17377 • Published Jan 30 • 32
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 85
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation Paper • 2401.02117 • Published Jan 4 • 26
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action Paper • 2312.17172 • Published Dec 28, 2023 • 26
VideoPoet: A Large Language Model for Zero-Shot Video Generation Paper • 2312.14125 • Published Dec 21, 2023 • 41
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Paper • 2312.07987 • Published Dec 13, 2023 • 40