Configurable Foundation Models: Building LLMs from a Modular Perspective Paper • 2409.02877 • Published Sep 4 • 27
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models Paper • 2406.15718 • Published Jun 22 • 14
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Paper • 2406.05955 • Published Jun 10 • 22
PowerInfer-2: Fast Large Language Model Inference on a Smartphone Paper • 2406.06282 • Published Jun 10 • 36
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Sep 25 • 680
In deep reinforcement learning, a pruned network is a good network Paper • 2402.12479 • Published Feb 19 • 17
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21 • 111
SliceGPT: Compress Large Language Models by Deleting Rows and Columns Paper • 2401.15024 • Published Jan 26 • 68
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11 • 42
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws Paper • 2401.00448 • Published Dec 31, 2023 • 28
Beyond Surface: Probing LLaMA Across Scales and Layers Paper • 2312.04333 • Published Dec 7, 2023 • 18