Diffusers documentation

Load different Stable Diffusion formats

You are viewing v0.18.2 version. A newer version v0.31.0 is available.
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Load different Stable Diffusion formats

Stable Diffusion models are available in different formats depending on the framework they’re trained and saved with, and where you download them from. Converting these formats for use in 🤗 Diffusers allows you to use all the features supported by the library, such as using different schedulers for inference, building your custom pipeline, and a variety of techniques and methods for optimizing inference speed.

We highly recommend using the .safetensors format because it is more secure than traditional pickled files which are vulnerable and can be exploited to execute any code on your machine (learn more in the Load safetensors guide).

This guide will show you how to convert other Stable Diffusion formats to be compatible with 🤗 Diffusers.

PyTorch .ckpt

The checkpoint - or .ckpt - format is commonly used to store and save models. The .ckpt file contains the entire model and is typically several GBs in size. While you can load and use a .ckpt file directly with the from_single_file() method, it is generally better to convert the .ckpt file to 🤗 Diffusers so both formats are available.

There are two options for converting a .ckpt file; use a Space to convert the checkpoint or convert the .ckpt file with a script.

Convert with a Space

The easiest and most convenient way to convert a .ckpt file is to use the SD to Diffusers Space. You can follow the instructions on the Space to convert the .ckpt file.

This approach works well for basic models, but it may struggle with more customized models. You’ll know the Space failed if it returns an empty pull request or error. In this case, you can try converting the .ckpt file with a script.

Convert with a script

🤗 Diffusers provides a conversion script for converting .ckpt files. This approach is more reliable than the Space above.

Before you start, make sure you have a local clone of 🤗 Diffusers to run the script and log in to your Hugging Face account so you can open pull requests and push your converted model to the Hub.

huggingface-cli login

To use the script:

  1. Git clone the repository containing the .ckpt file you want to convert. For this example, let’s convert this TemporalNet .ckpt file:
git lfs install
git clone https://huggingface.co/CiaraRowles/TemporalNet
  1. Open a pull request on the repository where you’re converting the checkpoint from:
cd TemporalNet && git fetch origin refs/pr/13:pr/13
git checkout pr/13
  1. There are several input arguments to configure in the conversion script, but the most important ones are:

    • checkpoint_path: the path to the .ckpt file to convert.

    • original_config_file: a YAML file defining the configuration of the original architecture. If you can’t find this file, try searching for the YAML file in the GitHub repository where you found the .ckpt file.

    • dump_path: the path to the converted model.

      For example, you can take the cldm_v15.yaml file from the ControlNet repository because the TemporalNet model is a Stable Diffusion v1.5 and ControlNet model.

  2. Now you can run the script to convert the .ckpt file:

python ../diffusers/scripts/convert_original_stable_diffusion_to_diffusers.py --checkpoint_path temporalnetv3.ckpt --original_config_file cldm_v15.yaml --dump_path ./ --controlnet
  1. Once the conversion is done, upload your converted model and test out the resulting pull request!
git push origin pr/13:refs/pr/13

Keras .pb or .h5

🧪 This is an experimental feature. Only Stable Diffusion v1 checkpoints are supported by the Convert KerasCV Space at the moment.

KerasCV supports training for Stable Diffusion v1 and v2. However, it offers limited support for experimenting with Stable Diffusion models for inference and deployment whereas 🤗 Diffusers has a more complete set of features for this purpose, such as different noise schedulers, flash attention, and other optimization techniques.

The Convert KerasCV Space converts .pb or .h5 files to PyTorch, and then wraps them in a StableDiffusionPipeline so it is ready for inference. The converted checkpoint is stored in a repository on the Hugging Face Hub.

For this example, let’s convert the sayakpaul/textual-inversion-kerasio checkpoint which was trained with Textual Inversion. It uses the special token <my-funny-cat> to personalize images with cats.

The Convert KerasCV Space allows you to input the following:

  • Your Hugging Face token.
  • Paths to download the UNet and text encoder weights from. Depending on how the model was trained, you don’t necessarily need to provide the paths to both the UNet and text encoder. For example, Textual Inversion only requires the embeddings from the text encoder and a text-to-image model only requires the UNet weights.
  • Placeholder token is only applicable for textual inversion models.
  • The output_repo_prefix is the name of the repository where the converted model is stored.

Click the Submit button to automatically convert the KerasCV checkpoint! Once the checkpoint is successfully converted, you’ll see a link to the new repository containing the converted checkpoint. Follow the link to the new repository, and you’ll see the Convert KerasCV Space generated a model card with an inference widget to try out the converted model.

If you prefer to run inference with code, click on the Use in Diffusers button in the upper right corner of the model card to copy and paste the code snippet:

from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")

Then you can generate an image like:

from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
pipeline.to("cuda")

placeholder_token = "<my-funny-cat-token>"
prompt = f"two {placeholder_token} getting married, photorealistic, high quality"
image = pipeline(prompt, num_inference_steps=50).images[0]

A1111 LoRA files

Automatic1111 (A1111) is a popular web UI for Stable Diffusion that supports model sharing platforms like Civitai. Models trained with the Low-Rank Adaptation (LoRA) technique are especially popular because they’re fast to train and have a much smaller file size than a fully finetuned model. 🤗 Diffusers supports loading A1111 LoRA checkpoints with load_lora_weights():

from diffusers import DiffusionPipeline, UniPCMultistepScheduler
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "andite/anything-v4.0", torch_dtype=torch.float16, safety_checker=None
).to("cuda")
pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config)

Download a LoRA checkpoint from Civitai; this example uses the Howls Moving Castle,Interior/Scenery LoRA (Ghibli Stlye) checkpoint, but feel free to try out any LoRA checkpoint!

# uncomment to download the safetensor weights
#!wget https://civitai.com/api/download/models/19998 -O howls_moving_castle.safetensors

Load the LoRA checkpoint into the pipeline with the load_lora_weights() method:

pipeline.load_lora_weights(".", weight_name="howls_moving_castle.safetensors")

Now you can use the pipeline to generate images:

prompt = "masterpiece, illustration, ultra-detailed, cityscape, san francisco, golden gate bridge, california, bay area, in the snow, beautiful detailed starry sky"
negative_prompt = "lowres, cropped, worst quality, low quality, normal quality, artifacts, signature, watermark, username, blurry, more than one bridge, bad architecture"

images = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=512,
    num_inference_steps=25,
    num_images_per_prompt=4,
    generator=torch.manual_seed(0),
).images

Finally, create a helper function to display the images:

from PIL import Image


def image_grid(imgs, rows=2, cols=2):
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid


image_grid(images)