- Can LLMs Follow Simple Rules?
  Paper • 2311.04235 • Published • 9
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 75
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 177
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  Paper • 2402.17177 • Published • 87
Collections including paper arxiv:2403.03507
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 16
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 35
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 35
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 19