Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise Paper • 2312.14567 • Published Dec 22, 2023 • 1
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning Paper • 2403.17919 • Published Mar 26 • 15