Diffusers documentation

LoRA

You are viewing v0.24.0 version. A newer version v0.32.1 is available.
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

LoRA

This is experimental and the API may change in the future.

LoRA (Low-Rank Adaptation of Large Language Models) is a popular and lightweight training technique that significantly reduces the number of trainable parameters. It works by inserting a smaller number of new weights into the model and only these are trained. This makes training with LoRA much faster, memory-efficient, and produces smaller model weights (a few hundred MBs), which are easier to store and share. LoRA can also be combined with other training techniques like DreamBooth to speedup training.

LoRA is very versatile and supported for DreamBooth, Kandinsky 2.2, Stable Diffusion XL, text-to-image, and Wuerstchen.

This guide will explore the train_text_to_image_lora.py script to help you become more familiar with it, and how you can adapt it for your own use-case.

Before running the script, make sure you install the library from source:

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

Navigate to the example folder with the training script and install the required dependencies for the script you’re using:

PyTorch
Flax
cd examples/text_to_image
pip install -r requirements.txt

🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed-precision. It’ll automatically configure your training setup based on your hardware and environment. Take a look at the 🤗 Accelerate Quick tour to learn more.

Initialize an 🤗 Accelerate environment:

accelerate config

To setup a default 🤗 Accelerate environment without choosing any configurations:

accelerate config default

Or if your environment doesn’t support an interactive shell, like a notebook, you can use:

from accelerate.utils import write_basic_config

write_basic_config()

Lastly, if you want to train a model on your own dataset, take a look at the Create a dataset for training guide to learn how to create a dataset that works with the training script.

The following sections highlight parts of the training script that are important for understanding how to modify it, but it doesn’t cover every aspect of the script in detail. If you’re interested in learning more, feel free to read through the script and let us know if you have any questions or concerns.

Script parameters

The training script has many parameters to help you customize your training run. All of the parameters and their descriptions are found in the parse_args() function. Default values are provided for most parameters that work pretty well, but you can also set your own values in the training command if you’d like.

For example, to increase the number of epochs to train:

accelerate launch train_text_to_image_lora.py \
  --num_train_epochs=150 \

Many of the basic and important parameters are described in the Text-to-image training guide, so this guide just focuses on the LoRA relevant parameters:

  • --rank: the number of low-rank matrices to train
  • --learning_rate: the default learning rate is 1e-4, but with LoRA, you can use a higher learning rate

Training script

The dataset preprocessing code and training loop are found in the main() function, and if you need to adapt the training script, this is where you’ll make your changes.

As with the script parameters, a walkthrough of the training script is provided in the Text-to-image training guide. Instead, this guide takes a look at the LoRA relevant parts of the script.

The script begins by adding the new LoRA weights to the attention layers. This involves correctly configuring the weight size for each block in the UNet. You’ll see the rank parameter is used to create the LoRAAttnProcessor:

lora_attn_procs = {}
for name in unet.attn_processors.keys():
    cross_attention_dim = None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    elif name.startswith("down_blocks"):
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]

    lora_attn_procs[name] = LoRAAttnProcessor(
        hidden_size=hidden_size,
        cross_attention_dim=cross_attention_dim,
        rank=args.rank,
    )

unet.set_attn_processor(lora_attn_procs)
lora_layers = AttnProcsLayers(unet.attn_processors)

The optimizer is initialized with the lora_layers because these are the only weights that’ll be optimized:

optimizer = optimizer_cls(
    lora_layers.parameters(),
    lr=args.learning_rate,
    betas=(args.adam_beta1, args.adam_beta2),
    weight_decay=args.adam_weight_decay,
    eps=args.adam_epsilon,
)

Aside from setting up the LoRA layers, the training script is more or less the same as train_text_to_image.py!

Launch the script

Once you’ve made all your changes or you’re okay with the default configuration, you’re ready to launch the training script! 🚀

Let’s train on the Pokémon BLIP captions dataset to generate our yown Pokémon. Set the environment variables MODEL_NAME and DATASET_NAME to the model and dataset respectively. You should also specify where to save the model in OUTPUT_DIR, and the name of the model to save to on the Hub with HUB_MODEL_ID. The script creates and saves the following files to your repository:

  • saved model checkpoints
  • pytorch_lora_weights.safetensors (the trained LoRA weights)

If you’re training on more than one GPU, add the --multi_gpu parameter to the accelerate launch command.

A full training run takes ~5 hours on a 2080 Ti GPU with 11GB of VRAM.

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="/sddata/finetune/lora/pokemon"
export HUB_MODEL_ID="pokemon-lora"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch --mixed_precision="fp16"  train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --dataloader_num_workers=8 \
  --resolution=512 
  --center_crop \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --learning_rate=1e-04 \
  --max_grad_norm=1 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=0 \
  --output_dir=${OUTPUT_DIR} \
  --push_to_hub \
  --hub_model_id=${HUB_MODEL_ID} \
  --report_to=wandb \
  --checkpointing_steps=500 \
  --validation_prompt="A pokemon with blue eyes." \
  --seed=1337

Once training has been completed, you can use your model for inference:

from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights("path/to/lora/model", weight_name="pytorch_lora_weights.safetensors")
image = pipeline("A pokemon with blue eyes").images[0]

Next steps

Congratulations on training a new model with LoRA! To learn more about how to use your new model, the following guides may be helpful: