PEFT documentation

Tuners

You are viewing v0.6.1 version. A newer version v0.10.0 is available.

Each tuner (or PEFT method) has a configuration and model.

LoRA

For finetuning a model with LoRA.

class peft.LoraConfig

( peft_type: typing.Union[str, peft.utils.peft_types.PeftType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: str = None revision: str = None task_type: typing.Union[str, peft.utils.peft_types.TaskType] = None inference_mode: bool = False r: int = 8 target_modules: typing.Union[str, typing.List[str], NoneType] = None lora_alpha: int = 8 lora_dropout: float = 0.0 fan_in_fan_out: bool = False bias: str = 'none' modules_to_save: typing.Optional[typing.List[str]] = None init_lora_weights: bool = True layers_to_transform: typing.Union[int, typing.List[int], NoneType] = None layers_pattern: typing.Union[str, typing.List[str], NoneType] = None rank_pattern: typing.Optional[dict] = <factory> alpha_pattern: typing.Optional[dict] = <factory> )

Parameters

  • r (int) — LoRA attention dimension (the rank of the low-rank update matrices).
  • target_modules (Union[List[str],str]) — The names of the modules to apply LoRA to.
  • lora_alpha (int) — The alpha parameter for LoRA scaling.
  • lora_dropout (float) — The dropout probability for LoRA layers.
  • fan_in_fan_out (bool) — Set this to True if the layer to replace stores weight like (fan_in, fan_out). For example, gpt-2 uses Conv1D which stores weights like (fan_in, fan_out) and hence this should be set to True.
  • bias (str) — Bias type for Lora. Can be ‘none’, ‘all’ or ‘lora_only’. If ‘all’ or ‘lora_only’, the corresponding biases will be updated during training. Be aware that this means that, even when disabling the adapters, the model will not produce the same output as the base model would have without adaptation.
  • modules_to_save (List[str]) — List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint.
  • layers_to_transform (Union[List[int],int]) — The layer indexes to transform, if this argument is specified, it will apply the LoRA transformations on the layer indexes that are specified in this list. If a single integer is passed, it will apply the LoRA transformations on the layer at this index.
  • layers_pattern (str) — The layer pattern name, used only if layers_to_transform is different from None and if the layer pattern is not in the common layers pattern.
  • rank_pattern (dict) — A mapping from layer names or regular expressions to ranks that differ from the default rank specified by r.
  • alpha_pattern (dict) — A mapping from layer names or regular expressions to alphas that differ from the default alpha specified by lora_alpha.

This is the configuration class to store the configuration of a LoraModel.
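As a rough illustration of how rank_pattern and alpha_pattern override the defaults, the sketch below resolves the rank and alpha for a module name and derives the LoRA scaling factor lora_alpha / r. The helper resolve_rank_and_alpha, the module names, and the suffix-matching rule are assumptions made for illustration; PEFT's internal matching logic may differ in detail.

```python
import re


def resolve_rank_and_alpha(module_name, r=8, lora_alpha=8,
                           rank_pattern=None, alpha_pattern=None):
    """Hypothetical helper: pick the rank, alpha and scaling for one module."""
    rank_pattern = rank_pattern or {}
    alpha_pattern = alpha_pattern or {}

    def lookup(pattern, default):
        # Assumption: a key matches when the module name ends with it,
        # with the key interpreted as a regular expression.
        for key, value in pattern.items():
            if re.fullmatch(rf"(.*\.)?{key}", module_name):
                return value
        return default

    rank = lookup(rank_pattern, r)
    alpha = lookup(alpha_pattern, lora_alpha)
    return rank, alpha, alpha / rank


# Illustrative module names, not taken from a specific model.
overrides = {
    "rank_pattern": {r"layers\.0\.q_proj": 16},
    "alpha_pattern": {"v_proj": 32},
}

print(resolve_rank_and_alpha("model.layers.0.q_proj", **overrides))  # (16, 8, 0.5)
print(resolve_rank_and_alpha("model.layers.1.q_proj", **overrides))  # (8, 8, 1.0)
print(resolve_rank_and_alpha("model.layers.1.v_proj", **overrides))  # (8, 32, 4.0)
```

Modules that match no pattern fall back to r and lora_alpha, so the patterns only need to list the exceptions.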

class peft.LoraModel

( model config adapter_name ) → torch.nn.Module

Parameters

  • model (PreTrainedModel) — The model to be adapted.
  • config (LoraConfig) — The configuration of the Lora model.
  • adapter_name (str) — The name of the adapter, defaults to "default".

Returns

torch.nn.Module

The Lora model.

Creates a Low-Rank Adapter (LoRA) model from a pretrained transformers model.

Example:

>>> from transformers import AutoModelForSeq2SeqLM
>>> from peft import LoraModel, LoraConfig

>>> config = LoraConfig(
...     task_type="SEQ_2_SEQ_LM",
...     r=8,
...     lora_alpha=32,
...     target_modules=["q", "v"],
...     lora_dropout=0.01,
... )

>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
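>>> lora_model = LoraModel(model, config, "default")

>>> import torch
>>> import transformers
>>> from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_int8_training

>>> # `tokenizer` and `rank` are assumed to be defined beforehand (for example,
>>> # the KoGPT tokenizer and the index of the local GPU).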

>>> target_modules = ["q_proj", "k_proj", "v_proj", "out_proj", "fc_in", "fc_out", "wte"]
>>> config = LoraConfig(
...     r=4, lora_alpha=16, target_modules=target_modules, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM"
... )

>>> model = transformers.GPTJForCausalLM.from_pretrained(
...     "kakaobrain/kogpt",
...     revision="KoGPT6B-ryan1.5b-float16",  # or float32 version: revision=KoGPT6B-ryan1.5b
...     pad_token_id=tokenizer.eos_token_id,
...     use_cache=False,
...     device_map={"": rank},
...     torch_dtype=torch.float16,
...     load_in_8bit=True,
... )
>>> model = prepare_model_for_int8_training(model)
>>> lora_model = get_peft_model(model, config)

Attributes:

add_weighted_adapter

( adapters weights adapter_name combination_type = 'svd' svd_rank = None svd_clamp = None svd_full_matrices = True svd_driver = None )

Parameters

  • adapters (list) — List of adapter names to be merged.
  • weights (list) — List of weights for each adapter.
  • adapter_name (str) — Name of the new adapter.
  • combination_type (str) — Type of merging. Can be one of [svd, linear, cat]. With the cat combination_type, the rank of the resulting adapter equals the sum of the ranks of all merged adapters, so the mixed adapter may become too big and cause OOM errors.
  • svd_rank (int, optional) — Rank of output adapter for svd. If None provided, will use max rank of merging adapters.
  • svd_clamp (float, optional) — A quantile threshold for clamping SVD decomposition output. If None is provided, do not perform clamping. Defaults to None.
  • svd_full_matrices (bool, optional) — Controls whether to compute the full or reduced SVD, and consequently, the shape of the returned tensors U and Vh. Defaults to True.
  • svd_driver (str, optional) — Name of the cuSOLVER method to be used. This keyword argument only works when merging on CUDA. Can be one of [None, gesvd, gesvdj, gesvda]. For more info please refer to torch.linalg.svd documentation. Defaults to None.

This method adds a new adapter by merging the given adapters with the given weights.
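To make the rank growth of the cat combination type concrete, here is a minimal pure-Python sketch on toy matrices. It assumes that a cat-style merge stacks the weighted A matrices along the rank axis and places the B matrices side by side; the helper functions are hypothetical, per-adapter scaling is ignored, and this is not PEFT's actual implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scale(m, s):
    """Multiply every entry of a matrix by a scalar."""
    return [[s * v for v in row] for row in m]


def add(a, b):
    """Add two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Two toy adapters for a 2x3 weight: B is (out, r), A is (r, in).
B1, A1 = [[1.0], [2.0]], [[1.0, 0.0, 1.0]]                             # rank 1
B2, A2 = [[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]  # rank 2
w1, w2 = 0.5, 2.0

# Reference: the weighted sum of the individual weight updates.
target = add(scale(matmul(B1, A1), w1), scale(matmul(B2, A2), w2))

# cat-style merge: stack the weighted A matrices along the rank axis
# and concatenate the B matrices column-wise.
A_cat = scale(A1, w1) + scale(A2, w2)        # shape (1 + 2, in)
B_cat = [r1 + r2 for r1, r2 in zip(B1, B2)]  # shape (out, 1 + 2)

merged_rank = len(A_cat)
print(merged_rank)                     # 3
print(matmul(B_cat, A_cat) == target)  # True
```

The merged adapter reproduces the weighted sum exactly, at the cost of a rank equal to the sum of the input ranks.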

delete_adapter

( adapter_name: str )

Parameters

  • adapter_name (str) — Name of the adapter to be deleted.

Deletes an existing adapter.

merge_and_unload

( progressbar: bool = False safe_merge: bool = False )

Parameters

  • progressbar (bool) — Whether to show a progress bar indicating the unload and merge process.
  • safe_merge (bool) — Whether to activate the safe merging check, which looks for potential NaNs in the adapter weights.

This method merges the LoRA layers into the base model. This is needed if you want to use the base model as a standalone model.

Example:

>>> from transformers import AutoModelForCausalLM
>>> from peft import PeftModel

>>> base_model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b")
>>> peft_model_id = "smangrul/falcon-40B-int4-peft-lora-sfttrainer-sample"
>>> model = PeftModel.from_pretrained(base_model, peft_model_id)
>>> merged_model = model.merge_and_unload()

unload

( )

Returns the original base model by removing all the LoRA modules without merging them.

class peft.tuners.lora.LoraLayer

( in_features: int out_features: int **kwargs )

class peft.tuners.lora.Linear

( adapter_name: str in_features: int out_features: int r: int = 0 lora_alpha: int = 1 lora_dropout: float = 0.0 fan_in_fan_out: bool = False is_target_conv_1d_layer: bool = False **kwargs )

get_delta_weight

( adapter )

Parameters

  • adapter (str) — The name of the adapter for which the delta weight should be computed.

Compute the delta weight for the given adapter.
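The delta weight of a LoRA adapter is the product of its two low-rank matrices, scaled by lora_alpha / r. The following sketch computes it for toy matrices in plain Python; the function delta_weight and its arguments are hypothetical stand-ins for illustration, not the library's internals.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*m)]


def delta_weight(lora_A, lora_B, lora_alpha, r, fan_in_fan_out=False):
    """Hypothetical helper: scaled low-rank update (lora_alpha / r) * B @ A."""
    scaling = lora_alpha / r
    delta = [[scaling * v for v in row] for row in matmul(lora_B, lora_A)]
    # Layers that store weights as (fan_in, fan_out) need the transposed update.
    return transpose(delta) if fan_in_fan_out else delta


lora_A = [[1.0, 2.0, 3.0]]  # shape (r=1, in_features=3)
lora_B = [[1.0], [-1.0]]    # shape (out_features=2, r=1)

print(delta_weight(lora_A, lora_B, lora_alpha=2, r=1))
# [[2.0, 4.0, 6.0], [-2.0, -4.0, -6.0]]
print(delta_weight(lora_A, lora_B, lora_alpha=2, r=1, fan_in_fan_out=True))
# [[2.0, -2.0], [4.0, -4.0], [6.0, -6.0]]
```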

merge

( safe_merge: bool = False )

Parameters

  • safe_merge (bool, optional) — If True, the merge operation will be performed in a copy of the original weights and check for NaNs before merging the weights. This is useful if you want to check if the merge operation will produce NaNs. Defaults to False.

Merge the active adapter weights into the base weights.
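The idea behind safe_merge can be sketched without any framework: build the merged weights in a copy, check the copy for NaNs, and only then accept it. The function safe_merge below is a hypothetical illustration on nested lists and does not reproduce the library's code.

```python
import math


def safe_merge(weight, delta):
    """Return weight + delta as a new matrix, refusing to merge if it has NaNs."""
    merged = [[w + d for w, d in zip(w_row, d_row)]
              for w_row, d_row in zip(weight, delta)]
    if any(math.isnan(v) for row in merged for v in row):
        raise ValueError("NaNs detected in the merged weights; merge aborted")
    return merged


weight = [[1.0, 2.0], [3.0, 4.0]]
good_delta = [[0.5, 0.0], [0.0, -0.5]]
bad_delta = [[float("nan"), 0.0], [0.0, 0.0]]

merged = safe_merge(weight, good_delta)
print(merged)  # [[1.5, 2.0], [3.0, 3.5]]

try:
    safe_merge(weight, bad_delta)
    aborted = False
except ValueError as err:
    aborted = True
    print(err)

print(weight)  # the original weights are untouched: [[1.0, 2.0], [3.0, 4.0]]
```

Because the check runs on a copy, a failed merge leaves the base weights in their original state.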

P-tuning

class peft.PromptEncoderConfig

( peft_type: typing.Union[str, peft.utils.peft_types.PeftType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: str = None revision: str = None task_type: typing.Union[str, peft.utils.peft_types.TaskType] = None inference_mode: bool = False num_virtual_tokens: int = None token_dim: int = None num_transformer_submodules: typing.Optional[int] = None num_attention_heads: typing.Optional[int] = None num_layers: typing.Optional[int] = None encoder_reparameterization_type: typing.Union[str, peft.tuners.p_tuning.config.PromptEncoderReparameterizationType] = <PromptEncoderReparameterizationType.MLP: 'MLP'> encoder_hidden_size: int = None encoder_num_layers: int = 2 encoder_dropout: float = 0.0 )

Parameters

  • encoder_reparameterization_type (Union[PromptEncoderReparameterizationType, str]) — The type of reparameterization to use.
  • encoder_hidden_size (int) — The hidden size of the prompt encoder.
  • encoder_num_layers (int) — The number of layers of the prompt encoder.
  • encoder_dropout (float) — The dropout probability of the prompt encoder.

This is the configuration class to store the configuration of a PromptEncoder.

class peft.PromptEncoder

( config )

Parameters

  • config (PromptEncoderConfig) — The configuration of the prompt encoder.

The prompt encoder network that is used to generate the virtual token embeddings for p-tuning.

Example:

>>> from peft import PromptEncoder, PromptEncoderConfig

>>> config = PromptEncoderConfig(
...     peft_type="P_TUNING",
...     task_type="SEQ_2_SEQ_LM",
...     num_virtual_tokens=20,
...     token_dim=768,
...     num_transformer_submodules=1,
...     num_attention_heads=12,
...     num_layers=12,
...     encoder_reparameterization_type="MLP",
...     encoder_hidden_size=768,
... )

>>> prompt_encoder = PromptEncoder(config)

Attributes:

  • embedding (torch.nn.Embedding) — The embedding layer of the prompt encoder.
  • mlp_head (torch.nn.Sequential) — The MLP head of the prompt encoder if inference_mode=False.
  • lstm_head (torch.nn.LSTM) — The LSTM head of the prompt encoder if inference_mode=False and encoder_reparameterization_type="LSTM".
  • token_dim (int) — The hidden embedding dimension of the base transformer model.
  • input_size (int) — The input size of the prompt encoder.
  • output_size (int) — The output size of the prompt encoder.
  • hidden_size (int) — The hidden size of the prompt encoder.
  • total_virtual_tokens (int) — The total number of virtual tokens of the prompt encoder.
  • encoder_type (Union[PromptEncoderReparameterizationType, str]) — The encoder type of the prompt encoder.

Input shape: (batch_size, total_virtual_tokens)

Output shape: (batch_size, total_virtual_tokens, token_dim)
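The shapes above can be worked out from the configuration values. The sketch below assumes that total_virtual_tokens equals num_virtual_tokens multiplied by num_transformer_submodules; the function prompt_encoder_shapes is a hypothetical helper written for this illustration.

```python
def prompt_encoder_shapes(batch_size, num_virtual_tokens,
                          num_transformer_submodules, token_dim):
    """Hypothetical helper: input and output shapes of the prompt encoder."""
    # Assumption: each transformer submodule gets its own set of virtual tokens.
    total_virtual_tokens = num_virtual_tokens * num_transformer_submodules
    input_shape = (batch_size, total_virtual_tokens)
    output_shape = (batch_size, total_virtual_tokens, token_dim)
    return input_shape, output_shape


# Values from the example config above, with a batch of 4.
print(prompt_encoder_shapes(4, 20, 1, 768))  # ((4, 20), (4, 20, 768))
# With two submodules (for example, encoder and decoder):
print(prompt_encoder_shapes(4, 20, 2, 768))  # ((4, 40), (4, 40, 768))
```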

Prefix tuning

class peft.PrefixTuningConfig

( peft_type: typing.Union[str, peft.utils.peft_types.PeftType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: str = None revision: str = None task_type: typing.Union[str, peft.utils.peft_types.TaskType] = None inference_mode: bool = False num_virtual_tokens: int = None token_dim: int = None num_transformer_submodules: typing.Optional[int] = None num_attention_heads: typing.Optional[int] = None num_layers: typing.Optional[int] = None encoder_hidden_size: int = None prefix_projection: bool = False )

Parameters

  • encoder_hidden_size (int) — The hidden size of the prompt encoder.
  • prefix_projection (bool) — Whether to project the prefix embeddings.

This is the configuration class to store the configuration of a PrefixEncoder.

class peft.PrefixEncoder

( config )

Parameters

  • config (PrefixTuningConfig) — The configuration of the prefix encoder.

The torch.nn model to encode the prefix.

Example:

>>> from peft import PrefixEncoder, PrefixTuningConfig

>>> config = PrefixTuningConfig(
...     peft_type="PREFIX_TUNING",
...     task_type="SEQ_2_SEQ_LM",
...     num_virtual_tokens=20,
...     token_dim=768,
...     num_transformer_submodules=1,
...     num_attention_heads=12,
...     num_layers=12,
...     encoder_hidden_size=768,
... )
>>> prefix_encoder = PrefixEncoder(config)

Attributes:

  • embedding (torch.nn.Embedding) — The embedding layer of the prefix encoder.
  • transform (torch.nn.Sequential) — The two-layer MLP to transform the prefix embeddings if prefix_projection is True.
  • prefix_projection (bool) — Whether to project the prefix embeddings.

Input shape: (batch_size, num_virtual_tokens)

Output shape: (batch_size, num_virtual_tokens, 2*layers*hidden)
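A small worked example makes the last output dimension easier to read. The sketch assumes that "layers" corresponds to num_layers, "hidden" to token_dim, and that the factor 2 covers one key and one value vector per layer; the helper name is hypothetical.

```python
def prefix_encoder_output_shape(batch_size, num_virtual_tokens, num_layers, token_dim):
    """Hypothetical helper: output shape of the prefix encoder."""
    # Assumption: every layer receives one key and one value vector per
    # virtual token, hence the factor of 2.
    return (batch_size, num_virtual_tokens, 2 * num_layers * token_dim)


# Values from the example config above, with a batch of 8.
shape = prefix_encoder_output_shape(8, 20, 12, 768)
print(shape)  # (8, 20, 18432)

# Number of values produced for a single sequence of virtual tokens.
values_per_sequence = shape[1] * shape[2]
print(values_per_sequence)  # 368640
```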

Prompt tuning

class peft.PromptTuningConfig

( peft_type: typing.Union[str, peft.utils.peft_types.PeftType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: str = None revision: str = None task_type: typing.Union[str, peft.utils.peft_types.TaskType] = None inference_mode: bool = False num_virtual_tokens: int = None token_dim: int = None num_transformer_submodules: typing.Optional[int] = None num_attention_heads: typing.Optional[int] = None num_layers: typing.Optional[int] = None prompt_tuning_init: typing.Union[peft.tuners.prompt_tuning.config.PromptTuningInit, str] = <PromptTuningInit.RANDOM: 'RANDOM'> prompt_tuning_init_text: typing.Optional[str] = None tokenizer_name_or_path: typing.Optional[str] = None )

Parameters

  • prompt_tuning_init (Union[PromptTuningInit, str]) — The initialization of the prompt embedding.
  • prompt_tuning_init_text (str, optional) — The text to initialize the prompt embedding. Only used if prompt_tuning_init is TEXT.
  • tokenizer_name_or_path (str, optional) — The name or path of the tokenizer. Only used if prompt_tuning_init is TEXT.

This is the configuration class to store the configuration of a PromptEmbedding.

class peft.PromptEmbedding

( config word_embeddings )

Parameters

  • config (PromptTuningConfig) — The configuration of the prompt embedding.
  • word_embeddings (torch.nn.Module) — The word embeddings of the base transformer model.

The model to encode virtual tokens into prompt embeddings.

Attributes:

  • embedding (torch.nn.Embedding) — The embedding layer of the prompt embedding.

Example:

>>> from peft import PromptEmbedding, PromptTuningConfig

>>> config = PromptTuningConfig(
...     peft_type="PROMPT_TUNING",
...     task_type="SEQ_2_SEQ_LM",
...     num_virtual_tokens=20,
...     token_dim=768,
...     num_transformer_submodules=1,
...     num_attention_heads=12,
...     num_layers=12,
...     prompt_tuning_init="TEXT",
...     prompt_tuning_init_text="Predict if sentiment of this review is positive, negative or neutral",
...     tokenizer_name_or_path="t5-base",
... )

>>> # t5_model.shared is the word embeddings of the base model
>>> prompt_embedding = PromptEmbedding(config, t5_model.shared)

Input shape: (batch_size, total_virtual_tokens)

Output shape: (batch_size, total_virtual_tokens, token_dim)
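When prompt_tuning_init is TEXT, the tokenized init text rarely has exactly total_virtual_tokens tokens. One plausible way to reconcile the two, sketched below, is to truncate a longer sequence and repeat a shorter one until it fills the prompt. The function fit_init_token_ids and the token ids are hypothetical, and the library's exact behaviour is an assumption here.

```python
import math


def fit_init_token_ids(init_token_ids, total_virtual_tokens):
    """Hypothetical helper: truncate or repeat token ids to the prompt length."""
    if not init_token_ids:
        raise ValueError("the init text must produce at least one token")
    n = len(init_token_ids)
    if n >= total_virtual_tokens:
        # Too long (or exact): keep only the first tokens.
        return init_token_ids[:total_virtual_tokens]
    # Too short: repeat the sequence, then cut to the exact length.
    repetitions = math.ceil(total_virtual_tokens / n)
    return (init_token_ids * repetitions)[:total_virtual_tokens]


token_ids = [101, 102, 103]  # made-up ids standing in for a tokenized init text

print(fit_init_token_ids(token_ids, 8))  # [101, 102, 103, 101, 102, 103, 101, 102]
print(fit_init_token_ids(token_ids, 2))  # [101, 102]
```

The resulting ids would then be looked up in the base model's word embeddings to initialize the prompt embedding.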

IA3

class peft.IA3Config

( peft_type: typing.Union[str, peft.utils.peft_types.PeftType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: str = None revision: str = None task_type: typing.Union[str, peft.utils.peft_types.TaskType] = None inference_mode: bool = False target_modules: typing.Union[str, typing.List[str], NoneType] = None feedforward_modules: typing.Union[str, typing.List[str], NoneType] = None fan_in_fan_out: bool = False modules_to_save: typing.Optional[typing.List[str]] = None init_ia3_weights: bool = True )

Parameters

  • target_modules (Union[List[str],str]) — The names of the modules to apply (IA)^3 to.
  • feedforward_modules (Union[List[str],str]) — The names of the modules to be treated as feedforward modules, as in the original paper. These modules will have (IA)^3 vectors multiplied to the input, instead of the output. feedforward_modules must be a name or a subset of names present in target_modules.
  • fan_in_fan_out (bool) — Set this to True if the layer to replace stores weight like (fan_in, fan_out). For example, gpt-2 uses Conv1D which stores weights like (fan_in, fan_out) and hence this should be set to True.
  • modules_to_save (List[str]) — List of modules apart from (IA)^3 layers to be set as trainable and saved in the final checkpoint.
  • init_ia3_weights (bool) — Whether to initialize the vectors in the (IA)^3 layers, defaults to True.

This is the configuration class to store the configuration of an IA3Model.
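The difference between ordinary target modules and feedforward modules can be sketched in a few lines: the learned (IA)^3 vector rescales the output of the former and the input of the latter. The code below is a toy illustration with nested lists; ia3_forward, the weight layout (out_features, in_features), and the numbers are assumptions, not the library's implementation.

```python
def matvec(W, x):
    """Multiply a matrix of shape (out, in) by a vector of length in."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def ia3_forward(W, x, ia3_vector, is_feedforward):
    """Hypothetical helper: apply a linear layer with an (IA)^3 vector."""
    if is_feedforward:
        # Feedforward modules: scale the input, then apply the layer.
        return matvec(W, [l * v for l, v in zip(ia3_vector, x)])
    # Other target modules: apply the layer, then scale the output.
    return [l * y for l, y in zip(ia3_vector, matvec(W, x))]


target_modules = ["k", "v", "w0"]
feedforward_modules = ["w0"]
# feedforward_modules must be a subset of target_modules.
assert set(feedforward_modules) <= set(target_modules)

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape (out=3, in=2)
x = [1.0, 1.0]

ff_out = ia3_forward(W, x, [2.0, 0.5], is_feedforward=True)
attn_out = ia3_forward(W, x, [1.0, 0.0, 2.0], is_feedforward=False)
print(ff_out)    # [3.0, 8.0, 13.0]
print(attn_out)  # [3.0, 0.0, 22.0]
```

Note that the vector has length in_features for a feedforward module and out_features otherwise.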

class peft.IA3Model

( model config adapter_name ) → torch.nn.Module

Parameters

  • model (PreTrainedModel) — The model to be adapted.
  • config (IA3Config) — The configuration of the (IA)^3 model.
  • adapter_name (str) — The name of the adapter, defaults to "default".

Returns

torch.nn.Module

The (IA)^3 model.

Creates an Infused Adapter by Inhibiting and Amplifying Inner Activations ((IA)^3) model from a pretrained transformers model. The method is described in detail in https://arxiv.org/abs/2205.05638

Example:

>>> from transformers import AutoModelForSeq2SeqLM
>>> from peft import IA3Model, IA3Config

>>> config = IA3Config(
...     peft_type="IA3",
...     task_type="SEQ_2_SEQ_LM",
...     target_modules=["k", "v", "w0"],
...     feedforward_modules=["w0"],
... )

>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
>>> # Argument order follows the signature above: model, config, adapter_name.
>>> ia3_model = IA3Model(model, config, "default")

Attributes:

  • model (PreTrainedModel) — The model to be adapted.
  • peft_config (IA3Config) — The configuration of the (IA)^3 model.

merge_and_unload

( safe_merge: bool = False )

Parameters

  • safe_merge (bool, optional, defaults to False) — If True, the merge is performed on a copy of the original weights, which is checked for NaNs before the weights are merged. This is useful if you want to check whether the merge operation will produce NaNs.

This method merges the (IA)^3 layers into the base model. This is needed if someone wants to use the base model as a standalone model.
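Because an (IA)^3 vector only rescales activations, it can be folded into the layer weights. The sketch below shows one way to do that on toy matrices and checks that the merged layer gives the same result as applying the vector at run time. The helper merge_ia3 and the weight layout (out_features, in_features) are assumptions for illustration; fan_in_fan_out handling is left out.

```python
def matvec(W, x):
    """Multiply a matrix of shape (out, in) by a vector of length in."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def merge_ia3(W, ia3_vector, is_feedforward):
    """Hypothetical helper: fold an (IA)^3 vector into a weight matrix."""
    if is_feedforward:
        # Scaling the input is the same as scaling the columns of W.
        return [[w * l for w, l in zip(row, ia3_vector)] for row in W]
    # Scaling the output is the same as scaling the rows of W.
    return [[w * l for w in row] for row, l in zip(W, ia3_vector)]


W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape (out=3, in=2)
x = [1.0, 1.0]
ff_vector = [2.0, 0.5]        # length in_features
out_vector = [1.0, 0.0, 2.0]  # length out_features

# Results with the vectors applied at run time.
direct_ff = matvec(W, [l * v for l, v in zip(ff_vector, x)])
direct_out = [l * y for l, y in zip(out_vector, matvec(W, x))]

# Results with the vectors merged into the weights.
merged_ff = matvec(merge_ia3(W, ff_vector, is_feedforward=True), x)
merged_out = matvec(merge_ia3(W, out_vector, is_feedforward=False), x)

print(merged_ff == direct_ff)    # True
print(merged_out == direct_out)  # True
```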