Diffusers

You are viewing v0.30.0 version. A newer version v0.35.1 is available.

Join the Hugging Face community

and get access to the augmented documentation experience

Collaborate on models, datasets and Spaces

Faster examples with accelerated inference

Switch between documentation themes

to get started

K-Diffusion

k-diffusion is a popular library created by Katherine Crowson. We provide StableDiffusionKDiffusionPipeline and StableDiffusionXLKDiffusionPipeline that allow you to run Stable DIffusion with samplers from k-diffusion.

Note that most the samplers from k-diffusion are implemented in Diffusers and we recommend using existing schedulers. You can find a mapping between k-diffusion samplers and schedulers in Diffusers here

StableDiffusionKDiffusionPipeline

class diffusers.StableDiffusionKDiffusionPipeline

< source >

( vae text_encoder tokenizer unet scheduler safety_checker feature_extractor requires_safety_checker: bool = True )

Parameters

vae (AutoencoderKL) — Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.
text_encoder (CLIPTextModel) — Frozen text-encoder. Stable Diffusion uses the text portion of CLIP, specifically the clip-vit-large-patch14 variant.
tokenizer (CLIPTokenizer) — Tokenizer of class CLIPTokenizer.
unet (UNet2DConditionModel) — Conditional U-Net architecture to denoise the encoded image latents.
scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, LMSDiscreteScheduler, or PNDMScheduler.
safety_checker (StableDiffusionSafetyChecker) — Classification module that estimates whether generated images could be considered offensive or harmful. Please, refer to the model card for details.
feature_extractor (CLIPImageProcessor) — Model that extracts features from generated images to be used as inputs for the safety_checker.

Pipeline for text-to-image generation using Stable Diffusion.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

The pipeline also inherits the following loading methods:

load_textual_inversion() for loading textual inversion embeddings
load_lora_weights() for loading LoRA weights
save_lora_weights() for saving LoRA weights

This is an experimental pipeline and is likely to change in the future.

encode_prompt

< source >

( prompt device num_images_per_prompt do_classifier_free_guidance negative_prompt = None prompt_embeds: Optional = None negative_prompt_embeds: Optional = None lora_scale: Optional = None clip_skip: Optional = None )

Parameters

prompt (str or List[str], optional) — prompt to be encoded device — (torch.device): torch device
num_images_per_prompt (int) — number of images that should be generated per prompt
do_classifier_free_guidance (bool) — whether to use classifier free guidance or not
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
prompt_embeds (torch.Tensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from prompt input argument.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from negative_prompt input argument.
lora_scale (float, optional) — A LoRA scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.

Encodes the prompt into text encoder hidden states.

StableDiffusionXLKDiffusionPipeline

class diffusers.StableDiffusionXLKDiffusionPipeline

< source >

( vae: AutoencoderKL text_encoder: CLIPTextModel text_encoder_2: CLIPTextModelWithProjection tokenizer: CLIPTokenizer tokenizer_2: CLIPTokenizer unet: UNet2DConditionModel scheduler: KarrasDiffusionSchedulers force_zeros_for_empty_prompt: bool = True )

Parameters

vae (AutoencoderKL) — Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.
text_encoder (CLIPTextModel) — Frozen text-encoder. Stable Diffusion XL uses the text portion of CLIP, specifically the clip-vit-large-patch14 variant.
text_encoder_2 ( CLIPTextModelWithProjection) — Second frozen text-encoder. Stable Diffusion XL uses the text and pool portion of CLIP, specifically the laion/CLIP-ViT-bigG-14-laion2B-39B-b160k variant.
tokenizer (CLIPTokenizer) — Tokenizer of class CLIPTokenizer.
tokenizer_2 (CLIPTokenizer) — Second Tokenizer of class CLIPTokenizer.
unet (UNet2DConditionModel) — Conditional U-Net architecture to denoise the encoded image latents.
scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, LMSDiscreteScheduler, or PNDMScheduler.
force_zeros_for_empty_prompt (bool, optional, defaults to "True") — Whether the negative prompt embeddings shall be forced to always be set to 0. Also see the config of stabilityai/stable-diffusion-xl-base-1-0.

Pipeline for text-to-image generation using Stable Diffusion XL and k-diffusion.

The pipeline also inherits the following loading methods:

load_textual_inversion() for loading textual inversion embeddings
from_single_file() for loading .ckpt files
load_lora_weights() for loading LoRA weights
save_lora_weights() for saving LoRA weights
load_ip_adapter() for loading IP Adapters

encode_prompt

< source >

( prompt: str prompt_2: Optional = None device: Optional = None num_images_per_prompt: int = 1 do_classifier_free_guidance: bool = True negative_prompt: Optional = None negative_prompt_2: Optional = None prompt_embeds: Optional = None negative_prompt_embeds: Optional = None pooled_prompt_embeds: Optional = None negative_pooled_prompt_embeds: Optional = None lora_scale: Optional = None clip_skip: Optional = None )

Parameters

prompt (str or List[str], optional) — prompt to be encoded
prompt_2 (str or List[str], optional) — The prompt or prompts to be sent to the tokenizer_2 and text_encoder_2. If not defined, prompt is used in both text-encoders device — (torch.device): torch device
num_images_per_prompt (int) — number of images that should be generated per prompt
do_classifier_free_guidance (bool) — whether to use classifier free guidance or not
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
negative_prompt_2 (str or List[str], optional) — The prompt or prompts not to guide the image generation to be sent to tokenizer_2 and text_encoder_2. If not defined, negative_prompt is used in both text-encoders
prompt_embeds (torch.Tensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from prompt input argument.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from negative_prompt input argument.
pooled_prompt_embeds (torch.Tensor, optional) — Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled text embeddings will be generated from prompt input argument.
negative_pooled_prompt_embeds (torch.Tensor, optional) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled negative_prompt_embeds will be generated from negative_prompt input argument.
lora_scale (float, optional) — A lora scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.

Encodes the prompt into text encoder hidden states.

< > Update on GitHub

←Super-resolution LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler→