BOFT

Orthogonal Butterfly (BOFT) is a general-purpose method for finetuning foundation models. It improves the parameter efficiency of Orthogonal Finetuning (OFT) by parameterizing the orthogonal transform with butterfly structures inspired by the Cooley-Tukey fast Fourier transform. The paper reports favorable results when finetuning different foundation models, including large vision transformers, large language models, and text-to-image diffusion models.

The abstract from the paper is:

Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm — Orthogonal Finetuning (OFT) — for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
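As a rough illustration of the parameter savings the abstract describes, the sketch below counts free parameters under stated assumptions: each b x b orthogonal block has b(b-1)/2 free parameters (the dimension of the rotation group), each butterfly factor is block-diagonal with d/b such blocks, and each extra factor doubles how many dimensions one block can mix. The function names are hypothetical, and an actual implementation may store more values per block than this count suggests.

```python
def oft_block_params(block_size: int) -> int:
    """Free parameters of one block_size x block_size rotation."""
    return block_size * (block_size - 1) // 2


def dense_orthogonal_params(d: int) -> int:
    """Free parameters of a single unstructured d x d orthogonal matrix."""
    return oft_block_params(d)


def boft_params(d: int, block_size: int, n_butterfly_factor: int) -> int:
    """Free parameters of n_butterfly_factor block-diagonal factors."""
    if d % block_size != 0:
        raise ValueError("block_size must divide d")
    return n_butterfly_factor * (d // block_size) * oft_block_params(block_size)


def effective_block_size(block_size: int, n_butterfly_factor: int) -> int:
    """Dimensions one block can mix, assuming each extra factor doubles the span."""
    return block_size * 2 ** (n_butterfly_factor - 1)


d = 1024
print(dense_orthogonal_params(d))   # 523776
print(boft_params(d, 8, 1))         # 3584 (one factor: plain block-diagonal OFT)
print(boft_params(d, 8, 2))         # 7168
print(effective_block_size(8, 2))   # 16
```

Under these assumptions, adding a second butterfly factor doubles the parameter count while also doubling the span each block can mix, which is the trade-off the butterfly structure is meant to exploit.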

BOFTConfig

class peft.BOFTConfig

(
    task_type: typing.Union[str, peft.utils.peft_types.TaskType, NoneType] = None,
    peft_type: typing.Union[str, peft.utils.peft_types.PeftType, NoneType] = None,
    auto_mapping: typing.Optional[dict] = None,
    base_model_name_or_path: typing.Optional[str] = None,
    revision: typing.Optional[str] = None,
    inference_mode: bool = False,
    boft_block_size: int = 4,
    boft_block_num: int = 0,
    boft_n_butterfly_factor: int = 1,
    target_modules: Optional[Union[list[str], str]] = None,
    exclude_modules: Optional[Union[list[str], str]] = None,
    boft_dropout: float = 0.0,
    fan_in_fan_out: bool = False,
    bias: str = 'none',
    modules_to_save: Optional[list[str]] = None,
    init_weights: bool = True,
    layers_to_transform: Optional[Union[list[int], int]] = None,
    layers_pattern: Optional[Union[list[str], str]] = None,
)

Parameters

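boft_block_size (int) — Size of each BOFT block, shared across the injected layers. Specify either boft_block_size or boft_block_num, not both, because boft_block_size x boft_block_num must equal the layer dimension.
boft_block_num (int) — Number of BOFT blocks per injected layer. Leave at 0 when boft_block_size is set.
boft_n_butterfly_factor (int) — Number of butterfly factors, shared across the injected layers. With a value of 1, BOFT is the same as vanilla OFT.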
target_modules (Union[List[str],str]) — The names of the modules to apply the adapter to.
exclude_modules (Optional[Union[List[str], str]]) — The names of the modules to which the adapter should not be applied. When passing a string, a regex match will be performed. When passing a list of strings, a module is excluded if its name matches one of the strings exactly or ends with one of them.
boft_dropout (float) — The multiplicative dropout probability. During training, a random subset of OFT blocks is replaced by the identity, similar to the dropout layer in LoRA.
fan_in_fan_out (bool) — Set this to True if the layer to replace stores weights as (fan_in, fan_out). For example, GPT-2 uses Conv1D, which stores weights as (fan_in, fan_out), so this should be set to True.
bias (str) — Bias type for BOFT. Can be 'none', 'all' or 'boft_only'. If 'all' or 'boft_only', the corresponding biases will be updated during training. Be aware that this means that, even when the adapters are disabled, the model will not produce the same output as the base model would have without adaptation.
modules_to_save (List[str]) — List of modules apart from BOFT layers to be set as trainable and saved in the final checkpoint.
layers_to_transform (Union[List[int],int]) — The indexes of the layers to transform. If a list is given, the BOFT transformations are applied only to the layers at those indexes. If a single integer is given, they are applied only to the layer at that index.
layers_pattern (Optional[Union[List[str], str]]) — The layer pattern name, used only if layers_to_transform is different from None and if the layer pattern is not in the common layers pattern. This should target the nn.ModuleList of the model, which is often called 'layers' or 'h'.
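The matching rule described for exclude_modules can be sketched in plain Python. This is a minimal illustration, not the library's code: name_matches is a hypothetical helper, the string case is assumed to be a full-name regex match, the suffix case is assumed to be compared at a dot boundary, and target_modules is assumed to follow the same rule. The module names are made up for the example.

```python
import re
from typing import List, Union


def name_matches(module_name: str, spec: Union[str, List[str]]) -> bool:
    """Hypothetical helper mirroring the documented matching rule."""
    if isinstance(spec, str):
        # Assumption: a string is a regex that must match the whole name.
        return re.fullmatch(spec, module_name) is not None
    # A list entry matches exactly, or as a dotted suffix of the name.
    return any(module_name == s or module_name.endswith("." + s) for s in spec)


# Made-up module names for illustration only.
names = [
    "encoder.layer.0.attention.query",
    "encoder.layer.0.attention.key",
    "encoder.layer.0.output.dense",
    "classifier",
]
target = ["query", "key", "output.dense"]   # list: exact or suffix match
exclude = r".*\.key"                        # string: regex match

selected = [n for n in names if name_matches(n, target) and not name_matches(n, exclude)]
print(selected)
# ['encoder.layer.0.attention.query', 'encoder.layer.0.output.dense']
```

Comparing the suffix at a dot boundary keeps an entry such as "key" from also selecting a module whose last name component merely ends in those letters.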

This is the configuration class to store the configuration of a BOFTModel.
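The defaults boft_block_size=4 and boft_block_num=0 suggest that exactly one of the two values is given and the other is derived from the layer's input dimension. The sketch below shows that derivation under this assumption. resolve_boft_blocks is a hypothetical helper written for this page, not a PEFT function, and the library may apply further checks (for example, ones involving boft_n_butterfly_factor).

```python
from typing import Tuple


def resolve_boft_blocks(
    in_features: int, boft_block_size: int = 4, boft_block_num: int = 0
) -> Tuple[int, int]:
    """Hypothetical helper: return (block_size, block_num) for one layer."""
    # Assumption: exactly one of the two settings is non-zero.
    if (boft_block_size == 0) == (boft_block_num == 0):
        raise ValueError("set exactly one of boft_block_size / boft_block_num")
    if boft_block_size != 0:
        if in_features % boft_block_size != 0:
            raise ValueError("in_features must be divisible by boft_block_size")
        return boft_block_size, in_features // boft_block_size
    if in_features % boft_block_num != 0:
        raise ValueError("in_features must be divisible by boft_block_num")
    return in_features // boft_block_num, boft_block_num


print(resolve_boft_blocks(1024))                                        # (4, 256)
print(resolve_boft_blocks(1024, boft_block_size=0, boft_block_num=16))  # (64, 16)
```

In both calls the product of the two returned values equals the layer dimension, 1024.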

BOFTModel

class peft.BOFTModel

( model, peft_config: Union[PeftConfig, dict[str, PeftConfig]], adapter_name: str, low_cpu_mem_usage: bool = False, state_dict: Optional[dict[str, torch.Tensor]] = None ) → torch.nn.Module

Parameters

model (transformers.PreTrainedModel) — The model to be adapted.
peft_config (BOFTConfig) — The configuration of the BOFT model.
adapter_name (str) — The name of the adapter, defaults to "default".
low_cpu_mem_usage (bool, optional, defaults to False) — Create empty adapter weights on the meta device. Useful to speed up the loading process.

Returns

torch.nn.Module

The BOFT model.

Creates a BOFT or OFT model from a pretrained transformers model. Papers: https://huggingface.co/papers/2311.06243 (BOFT) and https://huggingface.co/papers/2306.07280 (OFT).

Example:

>>> import transformers
>>> from peft import BOFTConfig, get_peft_model

>>> # Configure BOFT for the attention and MLP projections of the model.
>>> config = BOFTConfig(
...     boft_block_size=8,
...     boft_n_butterfly_factor=1,
...     target_modules=["query", "value", "key", "output.dense", "mlp.fc1", "mlp.fc2"],
...     boft_dropout=0.1,
...     bias="boft_only",
...     modules_to_save=["classifier"],
... )

>>> # Load the base model and wrap it with the BOFT adapter.
>>> model = transformers.Dinov2ForImageClassification.from_pretrained(
...     "facebook/dinov2-large",
...     num_labels=100,
... )
>>> boft_model = get_peft_model(model, config)

Attributes:

model (transformers.PreTrainedModel) — The model to be adapted.
peft_config (BOFTConfig) — The configuration of the BOFT model.

delete_adapter

( adapter_name: str )

Parameters

adapter_name (str) — Name of the adapter to be deleted.

Deletes an existing adapter.

merge_and_unload

( progressbar: bool = False, safe_merge: bool = False, adapter_names: typing.Optional[list[str]] = None )

Parameters

progressbar (bool) — Whether to show a progress bar indicating the unload and merge process.
safe_merge (bool) — Whether to activate the safe merging check, which looks for potential NaN values in the adapter weights.
adapter_names (List[str], optional) — The list of adapter names that should be merged. If None, all active adapters will be merged. Defaults to None.

This method merges the BOFT layers into the base model. This is needed if someone wants to use the base model as a standalone model.
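Conceptually, merging folds the learned orthogonal transform into the base weight so that no adapter step is needed at inference time. The pure-Python sketch below illustrates the idea with a 2 x 2 rotation standing in for a learned BOFT transform. The helper names, the side on which the rotation is applied, and the NaN check modelled on safe_merge are all assumptions made for illustration; this is not the library's implementation.

```python
import math


def rotation(theta):
    """A 2 x 2 rotation matrix standing in for a learned orthogonal transform."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matvec(a, x):
    """Matrix-vector product."""
    return [sum(a[i][k] * x[k] for k in range(len(x))) for i in range(len(a))]


def merge(weight, rot, safe_merge=False):
    """Fold the rotation into the weight, optionally rejecting NaN results."""
    merged = matmul(rot, weight)
    if safe_merge and any(math.isnan(v) for row in merged for v in row):
        raise ValueError("NaN detected in merged weights")
    return merged


W = [[1.0, 2.0], [3.0, 4.0]]   # base weight
R = rotation(0.3)              # adapter transform
x = [0.5, -1.0]                # input

adapter_out = matvec(R, matvec(W, x))                  # adapter applied as a separate step
merged_out = matvec(merge(W, R, safe_merge=True), x)   # adapter folded into the weight
assert all(math.isclose(a, b) for a, b in zip(adapter_out, merged_out))
```

Both paths give the same output, which is why the merged model can be used on its own without the adapter.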

unload

( )

Returns the base model by removing all BOFT modules without merging them. This restores the original base model.
