The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training Paper • 2501.18965 • Published Jan 31 • 7
SGD with Clipping is Secretly Estimating the Median Gradient Paper • 2402.12828 • Published Feb 20, 2024
Descent Only Collection Papers, posts and resources related to optimization for ML. • 6 items • Updated Mar 13, 2024