arxiv:1602.07868

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Published on Feb 25, 2016
Authors: Tim Salimans, Diederik P. Kingma
Abstract

We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization steps to be taken in the same amount of time. We demonstrate the usefulness of our method on applications in supervised image recognition, generative modelling, and deep reinforcement learning.
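To make the reparameterization the abstract describes concrete, here is a minimal NumPy sketch (not taken from the paper or this page): each weight vector is expressed as w = (g / ||v||) v, so the scalar g controls the length of w while v determines only its direction. The layer sizes and helper names (`weight_normalized`, `forward`) are illustrative assumptions.

```python
import numpy as np

# Sketch of the weight-normalization reparameterization:
# each row of the effective weight matrix is w = (g / ||v||) * v,
# decoupling the length of the weight vector (g) from its direction (v).

rng = np.random.default_rng(0)

# Hypothetical layer sizes, for illustration only.
n_in, n_out = 64, 32

# Underlying trainable parameters: direction vectors v (one per output unit)
# and per-unit scales g, plus a bias b.
v = rng.normal(size=(n_out, n_in))
g = np.ones(n_out)
b = np.zeros(n_out)

def weight_normalized(v, g):
    """Return the effective weights w = g * v / ||v||, computed row-wise."""
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return (g[:, None] / norms) * v

def forward(x, v, g, b):
    """Linear layer y = x w^T + b using the reparameterized weights."""
    w = weight_normalized(v, g)
    return x @ w.T + b

x = rng.normal(size=(8, n_in))
y = forward(x, v, g, b)
print(y.shape)  # (8, 32)
```

In practice, gradient descent is run on g and v rather than on w directly; the effective weights are recomputed from them at each step.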

Community

Rotation contains much more entropy than scaling, it combines more readily, and it is less prone to exploding or vanishing values. Rotating the neural network parameters just seems much more important than scaling them. Eventually people might converge on binary parameters, where nothing needs to be scaled at all.
