arxiv:2402.09353

DoRA: Weight-Decomposed Low-Rank Adaptation

Published on Feb 14, 2024

Abstract

Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, an accuracy gap often remains between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Building on these findings and aiming to match the learning capacity of FT, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for the directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding.
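
The following is a minimal sketch of the decomposition described in the abstract: the merged weight is a learned per-column magnitude times the unit-norm direction of the pre-trained weight plus its LoRA update. The tensor names (W0, A, B, m), shapes, and initialization are illustrative assumptions, not the authors' official implementation.

```python
# Sketch of DoRA-style weight decomposition: W' = m * (W0 + BA) / ||W0 + BA||_c,
# with the norm taken column-wise. Names and shapes are assumptions for illustration.
import torch

def dora_merged_weight(W0: torch.Tensor,  # frozen pre-trained weight, shape (out_features, in_features)
                       A: torch.Tensor,   # trainable LoRA factor, shape (rank, in_features)
                       B: torch.Tensor,   # trainable LoRA factor, shape (out_features, rank)
                       m: torch.Tensor    # trainable magnitude vector, shape (1, in_features)
                       ) -> torch.Tensor:
    """Recombine magnitude and LoRA-updated direction into a dense weight."""
    V = W0 + B @ A                               # directional component, updated via LoRA
    col_norm = V.norm(p=2, dim=0, keepdim=True)  # L2 norm of each column
    return m * (V / col_norm)                    # rescale unit-norm columns by learned magnitudes

# Example: with B initialized to zero and m to the column norms of W0,
# the merged weight starts out equal to W0 (LoRA-style initialization).
out_features, in_features, rank = 8, 16, 4
W0 = torch.randn(out_features, in_features)
A = torch.randn(rank, in_features) * 0.01
B = torch.zeros(out_features, rank)
m = W0.norm(p=2, dim=0, keepdim=True)
assert torch.allclose(dora_merged_weight(W0, A, B, m), W0)
```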

Community

Paper author

DoRA is now supported by HuggingFace PEFT! See https://github.com/huggingface/peft/releases/tag/v0.9.0 for more details.
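
A minimal usage sketch, assuming the `use_dora=True` option added to `LoraConfig` in peft v0.9.0; the base model name, rank, and target modules below are placeholder choices, not recommendations from the paper.

```python
# Sketch: enabling DoRA through PEFT's LoraConfig (use_dora=True, peft >= 0.9.0).
# Base model, rank, and target modules are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder base model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # placeholder attention projections
    use_dora=True,                        # decompose LoRA layers into magnitude + direction
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```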

Paper author

Check out the official repo of DoRA for more details: https://github.com/nbasyl/DoRA

Rotation contains much more entropy than scaling, it is much more friendly to combination, and it is less prone to exploding / vanishing values. Rotating the neural network parameters just seems much more important than scaling them. Eventually people might just converge towards binary parameters, where you do not need to scale anything.

Paper author

Official DoRA code: https://github.com/NVlabs/DoRA


Models citing this paper 8

Browse 8 models citing this paper

Datasets citing this paper 0

No dataset linking this paper


Spaces citing this paper 0

No Space linking this paper


Collections including this paper 14