- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 23
- Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
  Paper • 2311.08692 • Published • 12
- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 117
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 16
Collections including paper arxiv:2312.05708
- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 60
- Learning To Teach Large Language Models Logical Reasoning
  Paper • 2310.09158 • Published • 1
- ChipNeMo: Domain-Adapted LLMs for Chip Design
  Paper • 2311.00176 • Published • 7
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
  Paper • 2308.09583 • Published • 7
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
  Paper • 2309.03883 • Published • 14
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 25
- Agents: An Open-source Framework for Autonomous Language Agents
  Paper • 2309.07870 • Published • 39
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 45