Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 9 days ago • 54
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published Jul 11 • 31
Llama 2 Family Collection This collection hosts the Transformers-format and original repos of the Llama 2 and Llama Guard releases • 13 items • Updated 12 days ago • 76
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding Paper • 2404.11912 • Published Apr 18 • 16
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 183 • 15
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22 • 126
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping Paper • 2402.14083 • Published Feb 21 • 47
TravelPlanner: A Benchmark for Real-World Planning with Language Agents Paper • 2402.01622 • Published Feb 2 • 33
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time Paper • 2310.17157 • Published Oct 26, 2023 • 12
RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment Paper • 2307.12950 • Published Jul 24, 2023 • 9
Searching Large Neighborhoods for Integer Linear Programs with Contrastive Learning Paper • 2302.01578 • Published Feb 3, 2023
Extending Context Window of Large Language Models via Positional Interpolation Paper • 2306.15595 • Published Jun 27, 2023 • 53
H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models Paper • 2306.14048 • Published Jun 24, 2023 • 12
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer Paper • 2305.16380 • Published May 25, 2023 • 4