Collections
Discover the best community collections!

Collections including paper arxiv:2404.07965

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 566
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 172
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 49
- ResLoRA: Identity Residual Mapping in Low-Rank Adaption
  Paper • 2402.18039 • Published • 10

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 135
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 10
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 47
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 41

- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 31
- Efficient Estimation of Word Representations in Vector Space
  Paper • 1301.3781 • Published • 6
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 11
- Attention Is All You Need
  Paper • 1706.03762 • Published • 34

- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 80
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
  Paper • 2404.10667 • Published • 12
- Instruction-tuned Language Models are Better Knowledge Learners
  Paper • 2402.12847 • Published • 25
- DoRA: Weight-Decomposed Low-Rank Adaptation
  Paper • 2402.09353 • Published • 19

- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 80
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
  Paper • 2404.05961 • Published • 62
- Compression Represents Intelligence Linearly
  Paper • 2404.09937 • Published • 27
- Multi-Head Mixture-of-Experts
  Paper • 2404.15045 • Published • 53