Collections
Collections including paper arxiv:2403.10704
- Ultra-Long Sequence Distributed Transformer
  Paper • 2311.02382 • Published • 2
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
  Paper • 2311.02103 • Published • 15
- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 12

- The Generative AI Paradox: "What It Can Create, It May Not Understand"
  Paper • 2311.00059 • Published • 17
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 45
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 37
- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 55

- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 16
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 35
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 35
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 19