Diffusers documentation

Pipeline callbacks

Diffusers

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v0.34.0).

Join the Hugging Face community

and get access to the augmented documentation experience

Collaborate on models, datasets and Spaces

Faster examples with accelerated inference

Switch between documentation themes

to get started

Pipeline callbacks

The denoising loop of a pipeline can be modified with custom defined functions using the callback_on_step_end parameter. The callback function is executed at the end of each step, and modifies the pipeline attributes and variables for the next step. This is really useful for dynamically adjusting certain pipeline attributes or modifying tensor variables. This versatility allows for interesting use cases such as changing the prompt embeddings at each timestep, assigning different weights to the prompt embeddings, and editing the guidance scale. With callbacks, you can implement new features without modifying the underlying code!

🤗 Diffusers currently only supports callback_on_step_end, but feel free to open a feature request if you have a cool use-case and require a callback function with a different execution point!

This guide will demonstrate how callbacks work by a few features you can implement with them.

Official callbacks

We provide a list of callbacks you can plug into an existing pipeline and modify the denoising loop. This is the current list of official callbacks:

SDCFGCutoffCallback: Disables the CFG after a certain number of steps for all SD 1.5 pipelines, including text-to-image, image-to-image, inpaint, and controlnet.
SDXLCFGCutoffCallback: Disables the CFG after a certain number of steps for all SDXL pipelines, including text-to-image, image-to-image, inpaint, and controlnet.
IPAdapterScaleCutoffCallback: Disables the IP Adapter after a certain number of steps for all pipelines supporting IP-Adapter.

If you want to add a new official callback, feel free to open a feature request or submit a PR.

To set up a callback, you need to specify the number of denoising steps after which the callback comes into effect. You can do so by using either one of these two arguments

cutoff_step_ratio: Float number with the ratio of the steps.
cutoff_step_index: Integer number with the exact number of the step.

import torch

from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline
from diffusers.callbacks import SDXLCFGCutoffCallback


callback = SDXLCFGCutoffCallback(cutoff_step_ratio=0.4)
# can also be used with cutoff_step_index
# callback = SDXLCFGCutoffCallback(cutoff_step_ratio=None, cutoff_step_index=10)

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)

prompt = "a sports car at the road, best quality, high quality, high detail, 8k resolution"

generator = torch.Generator(device="cpu").manual_seed(2628670641)

out = pipeline(
    prompt=prompt,
    negative_prompt="",
    guidance_scale=6.5,
    num_inference_steps=25,
    generator=generator,
    callback_on_step_end=callback,
)

out.images[0].save("official_callback.png")

generated image of a sports car at the road

without SDXLCFGCutoffCallback

generated image of a sports car at the road with cfg callback

with SDXLCFGCutoffCallback

Dynamic classifier-free guidance

Dynamic classifier-free guidance (CFG) is a feature that allows you to disable CFG after a certain number of inference steps which can help you save compute with minimal cost to performance. The callback function for this should have the following arguments:

pipeline (or the pipeline instance) provides access to important properties such as num_timesteps and guidance_scale. You can modify these properties by updating the underlying attributes. For this example, you’ll disable CFG by setting pipeline._guidance_scale=0.0.
step_index and timestep tell you where you are in the denoising loop. Use step_index to turn off CFG after reaching 40% of num_timesteps.
callback_kwargs is a dict that contains tensor variables you can modify during the denoising loop. It only includes variables specified in the callback_on_step_end_tensor_inputs argument, which is passed to the pipeline’s __call__ method. Different pipelines may use different sets of variables, so please check a pipeline’s _callback_tensor_inputs attribute for the list of variables you can modify. Some common variables include latents and prompt_embeds. For this function, change the batch size of prompt_embeds after setting guidance_scale=0.0 in order for it to work properly.

Your callback function should look something like this:

def callback_dynamic_cfg(pipe, step_index, timestep, callback_kwargs):
        # adjust the batch_size of prompt_embeds according to guidance_scale
        if step_index == int(pipeline.num_timesteps * 0.4):
                prompt_embeds = callback_kwargs["prompt_embeds"]
                prompt_embeds = prompt_embeds.chunk(2)[-1]

                # update guidance_scale and prompt_embeds
                pipeline._guidance_scale = 0.0
                callback_kwargs["prompt_embeds"] = prompt_embeds
        return callback_kwargs

Now, you can pass the callback function to the callback_on_step_end parameter and the prompt_embeds to callback_on_step_end_tensor_inputs.

import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"

generator = torch.Generator(device="cuda").manual_seed(1)
out = pipeline(
    prompt,
    generator=generator,
    callback_on_step_end=callback_dynamic_cfg,
    callback_on_step_end_tensor_inputs=['prompt_embeds']
)

out.images[0].save("out_custom_cfg.png")

Interrupt the diffusion process

The interruption callback is supported for text-to-image, image-to-image, and inpainting for the StableDiffusionPipeline and StableDiffusionXLPipeline.

Stopping the diffusion process early is useful when building UIs that work with Diffusers because it allows users to stop the generation process if they’re unhappy with the intermediate results. You can incorporate this into your pipeline with a callback.

This callback function should take the following arguments: pipeline, i, t, and callback_kwargs (this must be returned). Set the pipeline’s _interrupt attribute to True to stop the diffusion process after a certain number of steps. You are also free to implement your own custom stopping logic inside the callback.

In this example, the diffusion process is stopped after 10 steps even though num_inference_steps is set to 50.

from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")
pipeline.enable_model_cpu_offload()
num_inference_steps = 50

def interrupt_callback(pipeline, i, t, callback_kwargs):
    stop_idx = 10
    if i == stop_idx:
        pipeline._interrupt = True

    return callback_kwargs

pipeline(
    "A photo of a cat",
    num_inference_steps=num_inference_steps,
    callback_on_step_end=interrupt_callback,
)

IP Adapter Cutoff

IP Adapter is an image prompt adapter that can be used for diffusion models without any changes to the underlying model. We can use the IP Adapter Cutoff Callback to disable the IP Adapter after a certain number of steps. To set up the callback, you need to specify the number of denoising steps after which the callback comes into effect. You can do so by using either one of these two arguments:

cutoff_step_ratio: Float number with the ratio of the steps.
cutoff_step_index: Integer number with the exact number of the step.

We need to download the diffusion model and load the ip_adapter for it as follows:

from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
pipeline.set_ip_adapter_scale(0.6)

The setup for the callback should look something like this:


from diffusers import AutoPipelineForText2Image
from diffusers.callbacks import IPAdapterScaleCutoffCallback
from diffusers.utils import load_image
import torch
 

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", 
    torch_dtype=torch.float16
).to("cuda")


pipeline.load_ip_adapter(
    "h94/IP-Adapter", 
    subfolder="sdxl_models", 
    weight_name="ip-adapter_sdxl.bin"
)

pipeline.set_ip_adapter_scale(0.6)


callback = IPAdapterScaleCutoffCallback(
    cutoff_step_ratio=None, 
    cutoff_step_index=5
)

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png"
)

generator = torch.Generator(device="cuda").manual_seed(2628670641)

images = pipeline(
    prompt="a tiger sitting in a chair drinking orange juice",
    ip_adapter_image=image,
    negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
    generator=generator,
    num_inference_steps=50,
    callback_on_step_end=callback,
).images

images[0].save("custom_callback_img.png")

generated image of a tiger sitting in a chair drinking orange juice

without IPAdapterScaleCutoffCallback

generated image of a tiger sitting in a chair drinking orange juice with ip adapter callback

with IPAdapterScaleCutoffCallback

Display image after each generation step

This tip was contributed by asomoza.

Display an image after each generation step by accessing and converting the latents after each step into an image. The latent space is compressed to 128x128, so the images are also 128x128 which is useful for a quick preview.

Use the function below to convert the SDXL latents (4 channels) to RGB tensors (3 channels) as explained in the Explaining the SDXL latent space blog post.

def latents_to_rgb(latents):
    weights = (
        (60, -60, 25, -70),
        (60,  -5, 15, -50),
        (60,  10, -5, -35),
    )

    weights_tensor = torch.t(torch.tensor(weights, dtype=latents.dtype).to(latents.device))
    biases_tensor = torch.tensor((150, 140, 130), dtype=latents.dtype).to(latents.device)
    rgb_tensor = torch.einsum("...lxy,lr -> ...rxy", latents, weights_tensor) + biases_tensor.unsqueeze(-1).unsqueeze(-1)
    image_array = rgb_tensor.clamp(0, 255).byte().cpu().numpy().transpose(1, 2, 0)

    return Image.fromarray(image_array)

Create a function to decode and save the latents into an image.

def decode_tensors(pipe, step, timestep, callback_kwargs):
    latents = callback_kwargs["latents"]

    image = latents_to_rgb(latents[0])
    image.save(f"{step}.png")

    return callback_kwargs

Pass the decode_tensors function to the callback_on_step_end parameter to decode the tensors after each step. You also need to specify what you want to modify in the callback_on_step_end_tensor_inputs parameter, which in this case are the latents.

from diffusers import AutoPipelineForText2Image
import torch
from PIL import Image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
).to("cuda")

image = pipeline(
    prompt="A croissant shaped like a cute bear.",
    negative_prompt="Deformed, ugly, bad anatomy",
    callback_on_step_end=decode_tensors,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]

step 0

step 19

step 29

step 39

step 49

< > Update on GitHub

←Scheduler features Reproducible pipelines→