- Rethinking Interpretability in the Era of Large Language Models
  Paper • 2402.01761 • Published • 22
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- Stealing Part of a Production Language Model
  Paper • 2403.06634 • Published • 90
Collections
Collections including paper arxiv:2402.01761
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 73
- Rethinking Interpretability in the Era of Large Language Models
  Paper • 2402.01761 • Published • 22

- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 20
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 17
- Scavenging Hyena: Distilling Transformers into Long Convolution Models
  Paper • 2401.17574 • Published • 15
- Rethinking Interpretability in the Era of Large Language Models
  Paper • 2402.01761 • Published • 22

- Inferring Functionality of Attention Heads from their Parameters
  Paper • 2412.11965 • Published • 2
- LatentQA: Teaching LLMs to Decode Activations Into Natural Language
  Paper • 2412.08686 • Published • 1
- Training Large Language Models to Reason in a Continuous Latent Space
  Paper • 2412.06769 • Published • 64
- Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
  Paper • 2411.14257 • Published • 9

- Large-Scale Automatic Audiobook Creation
  Paper • 2309.03926 • Published • 54
- Agents: An Open-source Framework for Autonomous Language Agents
  Paper • 2309.07870 • Published • 42
- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 53
- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 29