arxiv:2305.18240

XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Published on May 26, 2023

Abstract

In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training deep neural network (DNN) models. In particular, ahead of each mini-batch training step, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and the backward propagation. In this way, throughout training, the optimizer always uses gradients taken with respect to the future weights to update the DNN parameters, which yields better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet effective in boosting the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results on three of the most popular gradient-based optimizers, namely SGD with momentum, Adam, and AdamW, demonstrate the effectiveness of our proposal. The experimental results validate that XGrad attains higher model accuracy than the original optimizers when training DNN models. The code of XGrad will be available at: https://github.com/guanleics/XGrad.
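
The predict-then-update loop described in the abstract can be sketched roughly as follows. This is a minimal PyTorch-style illustration, not the authors' implementation: it assumes torch.optim.SGD with momentum and a simple look-ahead predictor of the form w_hat = w - lr * momentum * buffer; the paper's exact prediction formulas (including the Adam and AdamW variants) are defined in the paper and the linked repository.

```python
import torch


def xgrad_step(model, optimizer, loss_fn, inputs, targets, momentum=0.9):
    """One XGrad-style training step (illustrative sketch).

    1) Predict future weights from the optimizer's current state.
    2) Run the forward pass and backward propagation at the predicted weights.
    3) Restore the cached weights and apply the normal optimizer step, so the
       update is driven by gradients taken w.r.t. the future weights.
    """
    lr = optimizer.param_groups[0]["lr"]

    # Cache current weights and move each parameter to its predicted value.
    cached = []
    with torch.no_grad():
        for p in model.parameters():
            cached.append(p.detach().clone())
            buf = optimizer.state.get(p, {}).get("momentum_buffer")
            if buf is not None:
                # Assumed predictor: one look-ahead SGD-with-momentum step
                # using only the momentum buffer (the future gradient is
                # unknown at this point).
                p.add_(buf, alpha=-lr * momentum)

    # Forward pass and backward propagation at the predicted weights.
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Restore the original weights while keeping the gradients just computed.
    with torch.no_grad():
        for p, w in zip(model.parameters(), cached):
            p.copy_(w)

    # Standard optimizer step, now using gradients w.r.t. the future weights.
    optimizer.step()
    return loss.item()
```

In this sketch the only change relative to ordinary training is where the gradients are evaluated; the optimizer's own update rule is left untouched, which is what makes the scheme applicable to SGD with momentum, Adam, and AdamW alike.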
