
AdaLoRA

AdaLoRA is a method for adaptively allocating the budget of trainable parameters across weight matrices and layers, unlike LoRA, which distributes parameters evenly across all adapted modules. More parameters are budgeted for important weight matrices and layers, while less important ones receive fewer parameters.
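For orientation, here is a minimal sketch of applying AdaLoRA through get_peft_model; the checkpoint and hyperparameter values are illustrative, not recommendations:

>>> from transformers import AutoModelForSeq2SeqLM
>>> from peft import AdaLoraConfig, get_peft_model

>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
>>> # Start each incremental update at rank 12 and prune toward an average
>>> # rank of 4; total_step should match the length of your training run.
>>> config = AdaLoraConfig(init_r=12, target_r=4, total_step=1000, task_type="SEQ_2_SEQ_LM")
>>> model = get_peft_model(model, config)
>>> model.print_trainable_parameters()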

The abstract from the paper is:

Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in a pre-trained model, which becomes prohibitive when a large number of downstream tasks are present. Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter efficient way, e.g., low-rank increments. These methods often evenly distribute the budget of incremental updates across all pre-trained weight matrices, and overlook the varying importance of different weight parameters. As a consequence, the fine-tuning performance is suboptimal. To bridge this gap, we propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. In particular, AdaLoRA parameterizes the incremental updates in the form of singular value decomposition. Such a novel approach allows us to effectively prune the singular values of unimportant updates, which is essentially to reduce their parameter budget but circumvent intensive exact SVD computations. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA. Results demonstrate that AdaLoRA manifests notable improvement over baselines, especially in the low budget settings. Our code is publicly available at https://github.com/QingruZhang/AdaLoRA.
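In the paper's notation, each incremental update is parameterized in SVD form; a compact statement of the scheme described in the abstract is:

W = W^{(0)} + \Delta, \qquad \Delta = P \Lambda Q

where the diagonal matrix \Lambda holds the singular values that AdaLoRA prunes to shrink an unimportant update's budget, and P and Q are kept approximately orthogonal through a regularizer (weighted by orth_reg_weight below), which is how the method circumvents computing an exact SVD.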

AdaLoraConfig

class peft.AdaLoraConfig

< >

(
  task_type: typing.Union[str, peft.utils.peft_types.TaskType, NoneType] = None,
  peft_type: typing.Union[str, peft.utils.peft_types.PeftType, NoneType] = None,
  auto_mapping: typing.Optional[dict] = None,
  base_model_name_or_path: typing.Optional[str] = None,
  revision: typing.Optional[str] = None,
  inference_mode: bool = False,
  r: int = 8,
  target_modules: Optional[Union[list[str], str]] = None,
  exclude_modules: Optional[Union[list[str], str]] = None,
  lora_alpha: int = 8,
  lora_dropout: float = 0.0,
  fan_in_fan_out: bool = False,
  bias: Literal['none', 'all', 'lora_only'] = 'none',
  use_rslora: bool = False,
  modules_to_save: Optional[list[str]] = None,
  init_lora_weights: bool | Literal['gaussian', 'eva', 'olora', 'pissa', 'pissa_niter_[number of iters]', 'corda', 'loftq'] = True,
  layers_to_transform: Optional[Union[list[int], int]] = None,
  layers_pattern: Optional[Union[list[str], str]] = None,
  rank_pattern: typing.Optional[dict] = None,
  alpha_pattern: Optional[dict] = <factory>,
  megatron_config: Optional[dict] = None,
  megatron_core: Optional[str] = 'megatron.core',
  loftq_config: Union[LoftQConfig, dict] = <factory>,
  eva_config: Optional[EvaConfig] = None,
  corda_config: Optional[CordaConfig] = None,
  use_dora: bool = False,
  layer_replication: Optional[list[tuple[int, int]]] = None,
  runtime_config: LoraRuntimeConfig = <factory>,
  lora_bias: bool = False,
  target_r: int = 8,
  init_r: int = 12,
  tinit: int = 0,
  tfinal: int = 0,
  deltaT: int = 1,
  beta1: float = 0.85,
  beta2: float = 0.85,
  orth_reg_weight: float = 0.5,
  total_step: typing.Optional[int] = None,
)

Parameters

  • target_r (int) — The target average rank of the incremental matrices.
  • init_r (int) — The initial rank of each incremental matrix.
  • tinit (int) — The number of warmup steps of initial fine-tuning.
  • tfinal (int) — The number of steps of final fine-tuning.
  • deltaT (int) — The time interval between two budget allocations.
  • beta1 (float) — The EMA hyperparameter for sensitivity smoothing.
  • beta2 (float) — The EMA hyperparameter for uncertainty quantification.
  • orth_reg_weight (float) — The coefficient of the orthogonal regularization.
  • total_step (int) — The total number of training steps; it must be specified before training.
  • rank_pattern (list) — The rank allocated to each weight matrix by the RankAllocator.

This is the configuration class to store the configuration of an AdaLoraModel.
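The budget-schedule fields above work together; the sketch below shows how they might be set for a hypothetical 10,000-step run (all values are illustrative):

>>> from peft import AdaLoraConfig

>>> config = AdaLoraConfig(
...     init_r=12,         # every incremental update starts at rank 12
...     target_r=4,        # prune toward an average rank of 4
...     tinit=200,         # no pruning during the first 200 warmup steps
...     tfinal=500,        # ranks are fixed during the last 500 steps
...     deltaT=10,         # reallocate the budget every 10 steps
...     total_step=10000,  # must match the length of the training run
...     task_type="SEQ_2_SEQ_LM",
... )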

AdaLoraModel

class peft.AdaLoraModel

( model, config, adapter_name, low_cpu_mem_usage: bool = False ) → torch.nn.Module

Parameters

  • model ([transformers.PreTrainedModel]) — The model to be adapted.
  • config ([AdaLoraConfig]) — The configuration of the AdaLora model.
  • adapter_name (str) — The name of the adapter, defaults to “default”.
  • low_cpu_mem_usage (bool, optional, defaults to False) — Create empty adapter weights on meta device. Useful to speed up the loading process.

Returns

torch.nn.Module

The AdaLora model.

Creates an AdaLoRA (Adaptive LoRA) model from a pretrained transformers model. Paper: https://openreview.net/forum?id=lq62uWRJjiY

Example:

>>> from transformers import AutoModelForSeq2SeqLM
>>> from peft import AdaLoraConfig, AdaLoraModel

>>> config = AdaLoraConfig(
...     peft_type="ADALORA",
...     task_type="SEQ_2_SEQ_LM",
...     init_r=12,
...     lora_alpha=32,
...     target_modules=["q", "v"],
...     lora_dropout=0.01,
...     total_step=1000,  # required before training; value is illustrative
... )
>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
>>> model = AdaLoraModel(model, config, "default")

Attributes:

  • model ([transformers.PreTrainedModel]) — The model to be adapted.
  • peft_config ([AdaLoraConfig]) — The configuration of the AdaLora model.

add_weighted_adapter

( *args **kwargs )

This method is not supported for AdaLoRA; use LoRA instead.
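For reference, the corresponding call on a LoRA model looks roughly like this; the adapter names and weights are placeholders, and the full signature is documented in the LoRA reference:

>>> # On a LoraModel with adapters "adapter_a" and "adapter_b" already loaded:
>>> model.add_weighted_adapter(
...     adapters=["adapter_a", "adapter_b"],
...     weights=[0.7, 0.3],
...     adapter_name="merged",
... )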

update_and_allocate

( global_step )

Parameters

  • global_step (int) — The current training step; it is used to calculate the AdaLoRA budget.

This method updates the AdaLoRA budget and mask.

It should be called in every training step after loss.backward() and before zero_grad().

tinit, tfinal and deltaT are handled within the method.

Example:

>>> loss = model(**input).loss
>>> loss.backward()
>>> optimizer.step()
>>> model.base_model.update_and_allocate(i_step)
>>> optimizer.zero_grad()
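For context, the snippet above fits into a training loop as sketched below; the optimizer, learning rate, and dataloader are placeholder assumptions you would supply yourself:

>>> from torch.optim import AdamW

>>> optimizer = AdamW(model.parameters(), lr=1e-4)
>>> for i_step, batch in enumerate(dataloader):
...     loss = model(**batch).loss
...     loss.backward()
...     optimizer.step()
...     # reallocate the rank budget after backward() and before zero_grad()
...     model.base_model.update_and_allocate(i_step)
...     optimizer.zero_grad()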