arxiv:2606.13867

Muon$^p$: Muon with Fractional Spectral Powers

Published on Jun 11

Authors:

Abstract

Muon$^p$ is a novel optimizer that uses fractional spectral-power updates to interpolate between gradient descent and full flattening of the singular spectrum, enabling efficient fine-tuning of large-scale models while maintaining theoretical guarantees and practical computational efficiency.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Muon is an increasingly widely used optimizer that replaces a gradient $G = U S V^\top$ with its polar factor $U V^\top$, thereby flattening the singular spectrum. However, full flattening discards singular-value information that may matter for adaptation. We introduce Muon$^p$, a Muon-style optimizer that instead uses fractional spectral-power updates $U S^p V^\top$ for rational $p \in (0,1)$, interpolating between Muon and gradient descent. To make it practical, we prove that fractional spectral powers cannot be computed by any fixed univariate polynomial iteration, and furthermore derive low-degree odd bivariate recurrences that approximate $U S^p V^\top$ using only matrix multiplications, preserving Muon's matrix-multiplication-only structure and compute complexity. We show that Muon$^p$ maximizes the linear improvement in loss under the Schatten $q$-norm for $q = 1 + \frac{1}{p}$. Empirically, Muon$^p$ is especially effective for finetuning: on billion-scale models, Muon$^p$ improves validation perplexity and downstream task performance. We further analyze when Muon$^p$ is less suitable, through the lens of spectral geometry. Our results reveal important insights on when preserving the singular spectrum can bring significant gains, and introduce a principled way to achieve them.
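To make the update $U S^p V^\top$ concrete, the following is a minimal reference sketch in NumPy. It computes the fractional spectral power through an explicit SVD, which is not the matrix-multiplication-only bivariate recurrence the abstract describes; that recurrence is not given on this page. The function names `spectral_power` and `schatten_norm`, the matrix size, and the choice $p = 0.5$ are illustrative assumptions. The last lines check numerically, via standard Hölder duality for Schatten norms, that the update normalized to unit Schatten $q$-norm with $q = 1 + \frac{1}{p}$ attains the dual norm of $G$.

```python
import numpy as np

def spectral_power(G, p):
    """SVD-based reference for U S^p V^T (illustrative, not the paper's recurrence)."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    # Scale the columns of U by the singular values raised to the power p.
    return (U * s**p) @ Vt

def schatten_norm(M, q):
    """Schatten q-norm: the l_q norm of the singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(s**q) ** (1.0 / q))

# Hypothetical "gradient" used only for illustration.
rng = np.random.default_rng(0)
G = rng.standard_normal((6, 4))

p = 0.5                 # assumed fractional power in (0, 1)
q = 1.0 + 1.0 / p       # Schatten index stated in the abstract
q_dual = q / (q - 1.0)  # Hoelder conjugate of q, equal to 1 + p

X = spectral_power(G, p)
X_unit = X / schatten_norm(X, q)   # rescale to unit Schatten q-norm
gain = float(np.sum(G * X_unit))   # inner product <G, X_unit>

# Limiting cases: p = 1 returns G itself, p = 0 returns the polar factor U V^T.
identity_case = spectral_power(G, 1.0)
polar_case = spectral_power(G, 0.0)
```

Under these assumptions, `gain` matches `schatten_norm(G, q_dual)` up to floating-point error, `identity_case` reproduces `G`, and `polar_case` has orthonormal columns, which corresponds to the interpolation between gradient descent and Muon described above.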
