Diffusers

You are viewing v0.14.0 version. A newer version v0.35.1 is available.

Join the Hugging Face community

and get access to the augmented documentation experience

Collaborate on models, datasets and Spaces

Faster examples with accelerated inference

Switch between documentation themes

to get started

Using KerasCV Stable Diffusion Checkpoints in Diffusers

This is an experimental feature.

KerasCV provides APIs for implementing various computer vision workflows. It also provides the Stable Diffusion v1 and v2 models. Many practitioners find it easy to fine-tune the Stable Diffusion models shipped by KerasCV. However, as of this writing, KerasCV offers limited support to experiment with Stable Diffusion models for inference and deployment. On the other hand, Diffusers provides tooling dedicated to this purpose (and more), such as different noise schedulers, flash attention, and other optimization techniques.

How about fine-tuning Stable Diffusion models in KerasCV and exporting them such that they become compatible with Diffusers to combine the best of both worlds? We have created a tool that lets you do just that! It takes KerasCV Stable Diffusion checkpoints and exports them to Diffusers-compatible checkpoints. More specifically, it first converts the checkpoints to PyTorch and then wraps them into a StableDiffusionPipeline which is ready for inference. Finally, it pushes the converted checkpoints to a repository on the Hugging Face Hub.

We welcome you to try out the tool here and share feedback via discussions.

Getting Started

First, you need to obtain the fine-tuned KerasCV Stable Diffusion checkpoints. We provide an overview of the different ways Stable Diffusion models can be fine-tuned using diffusers. For the Keras implementation of some of these methods, you can check out these resources:

Stable Diffusion is comprised of the following models:

Text encoder
UNet
VAE

Depending on the fine-tuning task, we may fine-tune one or more of these components (the VAE is almost always left untouched). Here are some common combinations:

DreamBooth: UNet and text encoder
Classical text to image fine-tuning: UNet
Textual Inversion: Just the newly initialized embeddings in the text encoder

Performing the Conversion

Let’s use this checkpoint which was generated by conducting Textual Inversion with the following “placeholder token”: <my-funny-cat-token>.

On the tool, we supply the following things:

Path(s) to download the fine-tuned checkpoint(s) (KerasCV)
An HF token
Placeholder token (only applicable for Textual Inversion)

As soon as you hit “Submit”, the conversion process will begin. Once it’s complete, you should see the following:

If you click the link, you should see something like so:

If you head over to the model card of the repository, the following should appear:

Note that we’re not specifying the UNet weights here since the UNet is not fine-tuned during Textual Inversion.

And that’s it! You now have your fine-tuned KerasCV Stable Diffusion model in Diffusers 🧨

Using the Converted Model in Diffusers

Just beside the model card of the repository, you’d notice an inference widget to try out the model directly from the UI 🤗

On the top right hand side, we provide a “Use in Diffusers” button. If you click the button, you should see the following code-snippet:

from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")

The model is in standard diffusers format. Let’s perform inference!

from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
pipeline.to("cuda")

placeholder_token = "<my-funny-cat-token>"
prompt = f"two {placeholder_token} getting married, photorealistic, high quality"
image = pipeline(prompt, num_inference_steps=50).images[0]

And we get:

Note that if you specified a placeholder_token while performing the conversion, the tool will log it accordingly. Refer to the model card of this repository as an example.

We welcome you to use the tool for various Stable Diffusion fine-tuning scenarios and let us know your feedback! Here are some examples of Diffusers checkpoints that were obtained using the tool:

sayakpaul/text-unet-dogs-kerascv_sd_diffusers_pipeline (DreamBooth with both the text encoder and UNet fine-tuned)
sayakpaul/unet-dogs-kerascv_sd_diffusers_pipeline (DreamBooth with only the UNet fine-tuned)

Incorporating Diffusers Goodies 🎁

Diffusers provides various options that one can leverage to experiment with different inference setups. One particularly useful option is the use of a different noise scheduler during inference other than what was used during fine-tuning. Let’s try out the DPMSolverMultistepScheduler which is different from the one (DDPMScheduler) used during fine-tuning.

You can read more details about this process in this section.

from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
pipeline.to("cuda")

placeholder_token = "<my-funny-cat-token>"
prompt = f"two {placeholder_token} getting married, photorealistic, high quality"
image = pipeline(prompt, num_inference_steps=50).images[0]

One can also continue fine-tuning from these Diffusers checkpoints by leveraging some relevant tools from Diffusers. Refer here for more details. For inference-specific optimizations, refer here.

Known Limitations

Only Stable Diffusion v1 checkpoints are supported for conversion in this tool.

←Loading and Adding Custom Pipelines Unconditional Image Generation→