PEFT documentation

Trainable Tokens


The Trainable Tokens method provides a way to fine-tune specific token embeddings without training the full embedding matrix or attaching an adapter to it. It is based on the initial implementation from here.

The method targets only the token indices you specify and trains just those embeddings. Consequently, the required RAM is lower, and the on-disk size is significantly smaller than storing a full fine-tuned embedding matrix.

Some preliminary benchmarks acquired with this script suggest that for gemma-2-2b (which has a rather large embedding matrix) you can save 4.8 GiB of VRAM with Trainable Tokens compared to fully fine-tuning the embedding matrix. While LoRA uses even less memory (6.3 GiB less than full fine-tuning in total), it might also change token embeddings you don’t want modified. For models with less extreme embedding matrices, the difference may be smaller.

Note that this method does not add tokens for you; you have to add tokens to the tokenizer yourself and resize the model’s embedding matrix accordingly. This method will only re-train the embeddings of the tokens you specify. It can also be used in conjunction with LoRA layers! See the LoRA developer guide.
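
For illustration, here is a minimal sketch of that workflow: add new tokens to the tokenizer, resize the embedding matrix, then mark only the new indices as trainable. The model name and the added tokens are placeholders chosen for this example.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import TrainableTokensConfig, get_peft_model

# Placeholder model; any model with an embedding layer works similarly.
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

# Add the new tokens yourself and resize the embedding matrix accordingly;
# Trainable Tokens does not do this for you.
tokenizer.add_tokens(["<placeholder-token-1>", "<placeholder-token-2>"])
model.resize_token_embeddings(len(tokenizer))

# Train only the newly added rows of the embedding matrix.
indices = tokenizer.convert_tokens_to_ids(["<placeholder-token-1>", "<placeholder-token-2>"])
peft_config = TrainableTokensConfig(token_indices=indices)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()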

TrainableTokensConfig

class peft.TrainableTokensConfig

( task_type: typing.Union[str, peft.utils.peft_types.TaskType, NoneType] = None peft_type: typing.Union[str, peft.utils.peft_types.PeftType, NoneType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: typing.Optional[str] = None revision: typing.Optional[str] = None inference_mode: bool = False token_indices: list[int] = <factory> target_modules: Optional[Union[list[str], str]] = <factory> init_weights: bool = True )

Parameters

  • token_indices (list[int]) — List of integers signifying the indices of the tokens you want to be trainable. To find the index of a token with a tokenizer, you can tokenize the string and look at the returned input_ids (see the sketch at the end of this section). The closer the number of indices is to the total number of tokens, the less efficient this method becomes.
  • target_modules (Optional[Union[list[str], str]]) — List of module names or a regex expression of module names to replace with our TrainableTokensLayer. By default this is the embed_tokens layer, but it could also be multiple embedding-like layers, such as embedding, encoder.embeddings or decoder.embeddings.
  • init_weights (bool) — By default the new token weights are initialized to be the same as the respective token embeddings. This makes TrainableTokens a no-op when not trained. If set to False the weights will be random values. Do not change this setting unless you know exactly what you’re doing.

Configuration for the TrainableTokens method.

Allows for training new tokens (and re-training existing ones) without training the full embedding matrix. By marking a few select tokens (identified by their indices) as trainable and leaving the rest untouched, this method can be used to add new tokens or change the embeddings of existing tokens while saving memory. Both storage and working memory usage are reduced compared to training the full embedding matrix.

Note that training with FSDP/DeepSpeed might not yet be fully supported. Also note that models using weight tying are currently not supported and will raise an error.
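
To find the indices of existing tokens, as the token_indices description above mentions, tokenize a string and inspect the returned input_ids. A minimal sketch (the tokenizer name and the example string are placeholders):

from transformers import AutoTokenizer

# Use the tokenizer that belongs to your model.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

# Tokenize without special tokens so only the tokens of the string itself remain.
token_indices = tokenizer("emotion", add_special_tokens=False)["input_ids"]
print(token_indices)  # these are the indices to pass as TrainableTokensConfig(token_indices=...)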

TrainableTokensModel

class peft.TrainableTokensModel

( model config adapter_name low_cpu_mem_usage: bool = False )

disable_adapter_layers

( )

Disable all adapters.

When disabling all adapters, the model output corresponds to the output of the base model.

enable_adapter_layers

( )

Enable all adapters.

Call this if you have previously disabled all adapters and want to re-enable them.
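
For example, assuming model is a PEFT model using Trainable Tokens, you can temporarily compare against the unmodified base embeddings:

model.disable_adapter_layers()   # outputs now match the base model
# ... run evaluation with the original embeddings ...
model.enable_adapter_layers()    # re-enable the trained token embeddings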

merge_and_unload

( progressbar: bool = False safe_merge: bool = False adapter_names: Optional[list[str]] = None )

Parameters

  • progressbar (bool) — Whether to show a progress bar indicating the unload and merge process.
  • safe_merge (bool) — Whether to activate the safe merging check, which verifies that there are no potential NaNs in the adapter weights.
  • adapter_names (list[str], optional) — The list of adapter names that should be merged. If None, all active adapters will be merged. Defaults to None.

This method merges the trained tokens into the targeted embedding layer(s) of the base model. This is needed if you want to use the base model as a standalone model.
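
A short sketch, assuming model was created with get_peft_model as above and has been trained; the output path is a placeholder:

# Write the trained token embeddings back into the embedding matrix and
# return the base model without adapter modules.
merged_model = model.merge_and_unload(safe_merge=True)
merged_model.save_pretrained("model-with-trained-tokens")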

set_adapter

( adapter_name: str | list[str] )

Parameters

  • adapter_name (str or list[str]) — Name of the adapter(s) to be activated.

Set the active adapter(s).

Additionally, this function will set the specified adapters to trainable (i.e., requires_grad=True). If this is not desired, use the following code.

>>> for name, param in model_peft.named_parameters():
...     if ...:  # some check on name (ex. if 'lora' in name)
...         param.requires_grad = False
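
For example, to activate the adapter named "default" (the name PEFT assigns when you don't specify one):

>>> model_peft.set_adapter("default")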

unload

( )

Gets back the base model by removing all the trainable tokens modules without merging.
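
A one-line sketch, assuming model is a PEFT model using Trainable Tokens:

# Remove the trainable tokens modules and restore the original embedding
# layer(s); the trained deltas are discarded rather than merged.
base_model = model.unload()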
