Variance Control via Weight Rescaling in LLM Pre-training • Paper 2503.17500 • Published 4 days ago