PEFT documentation

AdaLoRA

You are viewing v0.8.2 version. A newer version v0.13.0 is available.
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

AdaLoRA

AdaLoRA is a method for optimizing the number of trainable parameters to assign to weight matrices and layers, unlike LoRA, which distributes parameters evenly across all modules. More parameters are budgeted for important weight matrices and layers while less important ones receive fewer parameters.

The abstract from the paper is:

Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in a pre-trained model, which becomes prohibitive when a large number of downstream tasks are present. Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter efficient way, e.g., low-rank increments. These methods often evenly distribute the budget of incremental updates across all pre-trained weight matrices, and overlook the varying importance of different weight parameters. As a consequence, the fine-tuning performance is suboptimal. To bridge this gap, we propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. In particular, AdaLoRA parameterizes the incremental updates in the form of singular value decomposition. Such a novel approach allows us to effectively prune the singular values of unimportant updates, which is essentially to reduce their parameter budget but circumvent intensive exact SVD computations. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA. Results demonstrate that AdaLoRA manifests notable improvement over baselines, especially in the low budget settings. Our code is publicly available at https://github.com/QingruZhang/AdaLoRA.

AdaLoraConfig

class peft.AdaLoraConfig

< >

( peft_type: Union = None auto_mapping: Optional = None base_model_name_or_path: Optional = None revision: Optional = None task_type: Union = None inference_mode: bool = False r: int = 8 target_modules: Optional[Union[list[str], str]] = None lora_alpha: int = 8 lora_dropout: float = 0.0 fan_in_fan_out: bool = False bias: Literal['none', 'all', 'lora_only'] = 'none' use_rslora: bool = False modules_to_save: Optional[list[str]] = None init_lora_weights: bool | Literal['gaussian', 'loftq'] = True layers_to_transform: Optional[Union[list[int], int]] = None layers_pattern: Optional[Union[list[str], str]] = None rank_pattern: Optional = None alpha_pattern: Optional[dict] = <factory> megatron_config: Optional[dict] = None megatron_core: Optional[str] = 'megatron.core' loftq_config: Union[LoftQConfig, dict] = <factory> use_dora: bool = False target_r: int = 8 init_r: int = 12 tinit: int = 0 tfinal: int = 0 deltaT: int = 1 beta1: float = 0.85 beta2: float = 0.85 orth_reg_weight: float = 0.5 total_step: Optional = None )

Parameters

  • target_r (int) — The target average rank of incremental matrix.
  • init_r (int) — The initial rank for each incremental matrix.
  • tinit (int) — The steps of initial fine-tuning warmup.
  • tfinal (int) — The step of final fine-tuning.
  • deltaT (int) — The time internval between two budget allocations.
  • beta1 (float) — The hyperparameter of EMA for sensitivity smoothing.
  • beta2 (float) — The hyperparameter of EMA for undertainty quantification.
  • orth_reg_weight (float) — The coefficient of orthogonal regularization.
  • total_step (int) — The total training steps that should be specified before training.
  • rank_pattern (list) — The allocated rank for each weight matrix by RankAllocator.

This is the configuration class to store the configuration of a ~peft.AdaLora.

AdaLoraModel

class peft.AdaLoraModel

< >

( model config adapter_name ) torch.nn.Module

Parameters

  • model ([transformers.PreTrainedModel]) — The model to be adapted.
  • config ([AdaLoraConfig]) — The configuration of the AdaLora model.
  • adapter_name (str) — The name of the adapter, defaults to “default”.

Returns

torch.nn.Module

The AdaLora model.

Creates AdaLoRA (Adaptive LoRA) model from a pretrained transformers model. Paper: https://openreview.net/forum?id=lq62uWRJjiY

Example:

>>> from transformers import AutoModelForSeq2SeqLM, LoraConfig >>> from peft import AdaLoraModel, AdaLoraConfig
>>> config = AdaLoraConfig(
peft_type="ADALORA", task_type="SEQ_2_SEQ_LM", r=8, lora_alpha=32, target_modules=["q", "v"],
lora_dropout=0.01,
)
>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base") >>> model = AdaLoraModel(model, config, "default")

Attributes:

  • model ([transformers.PreTrainedModel]) — The model to be adapted.
  • peft_config ([AdaLoraConfig]): The configuration of the AdaLora model.

update_and_allocate

< >

( global_step )

Parameters

  • global_step (int) — The current training step, it is used to calculate adalora budget.

This method updates Adalora budget and mask.

This should be called in every training step after loss.backward() and before zero_grad().

tinit, tfinal and deltaT are handled with in the method.

Example:

>>> loss = model(**input).loss
>>> loss.backward()
>>> optimizer.step()
>>> model.base_model.update_and_allocate(i_step)
>>> optimizer.zero_grad()