Attention Processor

An attention processor is a class for applying different types of attention mechanisms.

AttnProcessor

class diffusers.models.attention_processor.AttnProcessor

( )

Default processor for performing attention-related computations.

class diffusers.models.attention_processor.AttnProcessor2_0

( )

Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0).
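
As a rough illustration of what "scaled dot-product attention" computes, the sketch below evaluates softmax(Q K^T / sqrt(d)) V for a single head. It is written in plain Python so the arithmetic is visible; the helper names are hypothetical, and the processor itself relies on PyTorch's F.scaled_dot_product_attention rather than code like this.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q, k, v):
    """softmax(q @ k^T / sqrt(d)) @ v for one head; q, k, v are lists of token vectors."""
    d = len(q[0])
    out = []
    for q_tok in q:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q_tok, k_tok)) / math.sqrt(d) for k_tok in k]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v_tok[j] for w, v_tok in zip(weights, v))
                    for j in range(len(v[0]))])
    return out

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
result = scaled_dot_product_attention(q, k, v)
```

Each output row is a convex combination of the value rows, weighted toward the key that best matches the query.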

class diffusers.models.attention_processor.AttnAddedKVProcessor

( )

Processor for performing attention-related computations with extra learnable key and value matrices for the text encoder.

class diffusers.models.attention_processor.AttnAddedKVProcessor2_0

( )

Processor for performing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0), with extra learnable key and value matrices for the text encoder.

class diffusers.models.attention_processor.AttnProcessorNPU

( )

Processor for implementing flash attention using torch_npu. torch_npu supports only the fp16 and bf16 data types. If fp32 is used, the processor falls back to F.scaled_dot_product_attention, which gives little acceleration on NPU.

class diffusers.models.attention_processor.FusedAttnProcessor2_0

( )

Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0). It uses fused projection layers. For self-attention modules, all projection matrices (i.e., query, key, value) are fused. For cross-attention modules, key and value projection matrices are fused.

This API is 🧪 experimental and may change in the future.
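
The fusion can be pictured as stacking the separate projection weight matrices into one, applying a single matrix multiply, and splitting the result. The plain-Python sketch below uses a hypothetical helper and tiny hand-written weights; it only illustrates that the fused and unfused paths agree, not how the library stores its weights.

```python
def matvec(weight, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

# Separate query, key, and value projections (2 outputs each, 3 inputs).
w_q = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
w_k = [[0.5, 0.5, 0.0], [1.0, -1.0, 0.0]]
w_v = [[0.0, 0.0, 1.0], [2.0, 0.0, -1.0]]

x = [1.0, 2.0, 3.0]

# Unfused: three matrix-vector products.
q, k, v = matvec(w_q, x), matvec(w_k, x), matvec(w_v, x)

# Fused: stack the rows, multiply once, then split the output.
w_qkv = w_q + w_k + w_v
fused = matvec(w_qkv, x)
q_fused, k_fused, v_fused = fused[0:2], fused[2:4], fused[4:6]
```

For a cross-attention module only the key and value rows would be stacked, since the query is projected from a different input.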

Allegro

class diffusers.models.attention_processor.AllegroAttnProcessor2_0

( )

Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0). This is used in the Allegro model. It applies a normalization layer and rotary embedding to the query and key vectors.
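
Rotary embeddings rotate pairs of query and key channels by a position-dependent angle, so the query-key dot product depends only on the relative offset between positions. The sketch below is a generic illustration in plain Python; the adjacent-pair channel layout, the base of 10000, and the function names are assumptions and may differ from what this processor uses.

```python
import math

def apply_rotary(vec, position, base=10000.0):
    """Rotate each adjacent channel pair of vec by position * frequency."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        freq = base ** (-i / dim)  # lower frequency for later channel pairs
        angle = position * freq
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(angle) - y * math.sin(angle))
        out.append(x * math.sin(angle) + y * math.cos(angle))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Same relative offset (3) at two different absolute positions.
score_a = dot(apply_rotary(q, 5), apply_rotary(k, 2))
score_b = dot(apply_rotary(q, 105), apply_rotary(k, 102))
```

The two scores agree up to floating-point error, which is the property that lets attention reason about relative rather than absolute positions.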

AuraFlow

class diffusers.models.attention_processor.AuraFlowAttnProcessor2_0

( )

Attention processor typically used with AuraFlow.

class diffusers.models.attention_processor.FusedAuraFlowAttnProcessor2_0

( )

Attention processor typically used with AuraFlow, with fused projections.

CogVideoX

class diffusers.models.attention_processor.CogVideoXAttnProcessor2_0

( )

Processor for implementing scaled dot-product attention for the CogVideoX model. It applies a rotary embedding on query and key vectors, but does not include spatial normalization.

class diffusers.models.attention_processor.FusedCogVideoXAttnProcessor2_0

( )

Processor for implementing scaled dot-product attention for the CogVideoX model. It applies a rotary embedding on query and key vectors, but does not include spatial normalization.

CrossFrameAttnProcessor

class diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor

( batch_size = 2 )

Parameters

batch_size — The actual batch size, not counting the frames. For example, when calling the UNet with a single prompt and num_images_per_prompt=1, batch_size should be 2 because of classifier-free guidance.

Cross-frame attention processor. Each frame attends to the first frame.
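
To make the idea of every frame attending to the first frame concrete, the sketch below computes single-head attention per frame but always takes keys and values from frame 0. It is a plain-Python illustration with hypothetical helper names, not the processor's implementation.

```python
import math

def attend(q, k, v):
    """Single-head softmax attention over lists of token vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_tok in q:
        scores = [scale * sum(a * b for a, b in zip(q_tok, k_tok)) for k_tok in k]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v_tok[j] for e, v_tok in zip(exps, v))
                    for j in range(len(v[0]))])
    return out

def cross_frame_attention(queries, keys, values):
    """Every frame uses its own queries but the first frame's keys and values."""
    first_k, first_v = keys[0], values[0]
    return [attend(frame_q, first_k, first_v) for frame_q in queries]

# Three frames, two tokens each, two channels per token.
queries = [[[1.0, 0.0], [0.0, 1.0]]] * 3
keys = [[[1.0, 0.0], [0.0, 1.0]], [[9.0, 9.0], [9.0, 9.0]], [[-5.0, 2.0], [3.0, 3.0]]]
values = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]], [[7.0, 7.0], [7.0, 7.0]]]

outputs = cross_frame_attention(queries, keys, values)
```

Because the queries here are identical across frames, all three outputs match: the keys and values of frames 1 and 2 never enter the computation.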

Custom Diffusion

class diffusers.models.attention_processor.CustomDiffusionAttnProcessor

( train_kv: bool = True, train_q_out: bool = True, hidden_size: typing.Optional[int] = None, cross_attention_dim: typing.Optional[int] = None, out_bias: bool = True, dropout: float = 0.0 )

Parameters

train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
train_q_out (bool, defaults to True) — Whether to newly train query matrices corresponding to the latent image features.
hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
out_bias (bool, defaults to True) — Whether to include the bias parameter in train_q_out.
dropout (float, optional, defaults to 0.0) — The dropout probability to use.

Processor for implementing attention for the Custom Diffusion method.

class diffusers.models.attention_processor.CustomDiffusionAttnProcessor2_0

( train_kv: bool = True, train_q_out: bool = True, hidden_size: typing.Optional[int] = None, cross_attention_dim: typing.Optional[int] = None, out_bias: bool = True, dropout: float = 0.0 )

Parameters

train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
train_q_out (bool, defaults to True) — Whether to newly train query matrices corresponding to the latent image features.
hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
out_bias (bool, defaults to True) — Whether to include the bias parameter in train_q_out.
dropout (float, optional, defaults to 0.0) — The dropout probability to use.

Processor for implementing attention for the Custom Diffusion method using PyTorch 2.0’s memory-efficient scaled dot-product attention.

class diffusers.models.attention_processor.CustomDiffusionXFormersAttnProcessor

( train_kv: bool = True, train_q_out: bool = False, hidden_size: typing.Optional[int] = None, cross_attention_dim: typing.Optional[int] = None, out_bias: bool = True, dropout: float = 0.0, attention_op: typing.Optional[typing.Callable] = None )

Parameters

train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
train_q_out (bool, defaults to False) — Whether to newly train query matrices corresponding to the latent image features.
hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
out_bias (bool, defaults to True) — Whether to include the bias parameter in train_q_out.
dropout (float, optional, defaults to 0.0) — The dropout probability to use.
attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. It is recommended to set to None, and allow xFormers to choose the best operator.

Processor for implementing memory-efficient attention using xFormers for the Custom Diffusion method.

Flux

class diffusers.models.attention_processor.FluxAttnProcessor2_0

( *args, **kwargs )

class diffusers.models.attention_processor.FusedFluxAttnProcessor2_0

( *args, **kwargs )

class diffusers.models.attention_processor.FluxSingleAttnProcessor2_0

( *args, **kwargs )

Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0).

Hunyuan

class diffusers.models.attention_processor.HunyuanAttnProcessor2_0

( )

Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0). This is used in the HunyuanDiT model. It applies a normalization layer and rotary embedding to the query and key vectors.

class diffusers.models.attention_processor.FusedHunyuanAttnProcessor2_0

( )

Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0) with fused projection layers. This is used in the HunyuanDiT model. It applies a normalization layer and rotary embedding to the query and key vectors.

class diffusers.models.attention_processor.PAGHunyuanAttnProcessor2_0

( )

Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0). This is used in the HunyuanDiT model. It applies a normalization layer and rotary embedding to the query and key vectors. This variant of the processor employs Perturbed Attention Guidance.

class diffusers.models.attention_processor.PAGCFGHunyuanAttnProcessor2_0

( )

IdentitySelfAttnProcessor2_0

class diffusers.models.attention_processor.PAGIdentitySelfAttnProcessor2_0

( )

Processor for implementing PAG using scaled dot-product attention (enabled by default if you’re using PyTorch 2.0). PAG reference: https://huggingface.co/papers/2403.17377
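
In Perturbed-Attention Guidance, the perturbed branch replaces the self-attention map with an identity matrix, so each token's output is just its own value vector. The sketch below shows that substitution in plain Python; the function names are hypothetical, and the guidance step that combines the regular and perturbed branches is omitted.

```python
def apply_attention_map(attn_map, values):
    """Multiply an attention map (tokens x tokens) by the value vectors."""
    return [
        [sum(w * v_tok[j] for w, v_tok in zip(row, values)) for j in range(len(values[0]))]
        for row in attn_map
    ]

def identity_map(num_tokens):
    """Attention map in which every token attends only to itself."""
    return [[1.0 if i == j else 0.0 for j in range(num_tokens)] for i in range(num_tokens)]

values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# A regular (row-normalized) attention map mixes information across tokens.
regular = apply_attention_map(
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [1.0, 0.0, 0.0]], values
)
# The identity map passes each token's value through unchanged.
perturbed = apply_attention_map(identity_map(3), values)
```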

class diffusers.models.attention_processor.PAGCFGIdentitySelfAttnProcessor2_0

( )

Processor for implementing PAG using scaled dot-product attention (enabled by default if you’re using PyTorch 2.0). PAG reference: https://huggingface.co/papers/2403.17377

IP-Adapter

class diffusers.models.attention_processor.IPAdapterAttnProcessor

( hidden_size, cross_attention_dim = None, num_tokens = (4,), scale = 1.0 )

Parameters

hidden_size (int) — The hidden size of the attention layer.
cross_attention_dim (int) — The number of channels in the encoder_hidden_states.
num_tokens (int, Tuple[int] or List[int], defaults to (4,)) — The context length of the image features.
scale (float or List[float], defaults to 1.0) — The weight scale of the image prompt.

Attention processor for multiple IP-Adapters.
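
The scale parameter can be read as the weight on an extra image-conditioned attention branch that is added to the usual text cross-attention. The sketch below illustrates that decoupled form in plain Python for a single adapter; the helper names are hypothetical, and projections, multiple heads, and masking are left out.

```python
import math

def attend(q, k, v):
    """Single-head softmax attention over lists of token vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_tok in q:
        scores = [scale * sum(a * b for a, b in zip(q_tok, k_tok)) for k_tok in k]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v_tok[j] for e, v_tok in zip(exps, v))
                    for j in range(len(v[0]))])
    return out

def ip_adapter_attention(q, text_k, text_v, image_k, image_v, scale=1.0):
    """Text cross-attention plus a scaled image cross-attention branch."""
    text_out = attend(q, text_k, text_v)
    image_out = attend(q, image_k, image_v)
    return [[t + scale * i for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(text_out, image_out)]

q = [[1.0, 0.0]]
text_k, text_v = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]
image_k, image_v = [[0.0, 1.0]], [[2.0, 4.0]]

without_image = ip_adapter_attention(q, text_k, text_v, image_k, image_v, scale=0.0)
with_image = ip_adapter_attention(q, text_k, text_v, image_k, image_v, scale=0.5)
```

With scale set to 0.0 the image branch has no effect and the result reduces to plain text cross-attention.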

class diffusers.models.attention_processor.IPAdapterAttnProcessor2_0

( hidden_size, cross_attention_dim = None, num_tokens = (4,), scale = 1.0 )

Parameters

hidden_size (int) — The hidden size of the attention layer.
cross_attention_dim (int) — The number of channels in the encoder_hidden_states.
num_tokens (int, Tuple[int] or List[int], defaults to (4,)) — The context length of the image features.
scale (float or List[float], defaults to 1.0) — The weight scale of the image prompt.

Attention processor for IP-Adapter for PyTorch 2.0.

class diffusers.models.attention_processor.SD3IPAdapterJointAttnProcessor2_0

( hidden_size: int, ip_hidden_states_dim: int, head_dim: int, timesteps_emb_dim: int = 1280, scale: float = 0.5 )

Parameters

hidden_size (int) — The number of hidden channels.
ip_hidden_states_dim (int) — The image feature dimension.
head_dim (int) — The number of head channels.
timesteps_emb_dim (int, defaults to 1280) — The number of input channels for timestep embedding.
scale (float, defaults to 0.5) — IP-Adapter scale.

Attention processor for IP-Adapter, typically used for SD3-like self-attention projections, with additional image-based information and timestep embeddings.

JointAttnProcessor2_0

class diffusers.models.attention_processor.JointAttnProcessor2_0

( )

Attention processor typically used for SD3-like self-attention projections.

class diffusers.models.attention_processor.PAGJointAttnProcessor2_0

( )

Attention processor typically used for SD3-like self-attention projections.

class diffusers.models.attention_processor.PAGCFGJointAttnProcessor2_0

( )

Attention processor typically used for SD3-like self-attention projections.

class diffusers.models.attention_processor.FusedJointAttnProcessor2_0

( )

Attention processor typically used for SD3-like self-attention projections.

LoRA

class diffusers.models.attention_processor.LoRAAttnProcessor

( )

Processor for implementing attention with LoRA.

class diffusers.models.attention_processor.LoRAAttnProcessor2_0

( )

Processor for implementing attention with LoRA (enabled by default if you’re using PyTorch 2.0).

class diffusers.models.attention_processor.LoRAAttnAddedKVProcessor

( )

Processor for implementing attention with LoRA with extra learnable key and value matrices for the text encoder.

class diffusers.models.attention_processor.LoRAXFormersAttnProcessor

( )

Processor for implementing attention with LoRA using xFormers.

Lumina-T2X

class diffusers.models.attention_processor.LuminaAttnProcessor2_0

( )

Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0). This is used in the LuminaNextDiT model. It applies a normalization layer and rotary embedding to the query and key vectors.

Mochi

class diffusers.models.attention_processor.MochiAttnProcessor2_0

( )

Attention processor used in Mochi.

class diffusers.models.attention_processor.MochiVaeAttnProcessor2_0

( )

Attention processor used in Mochi VAE.

Sana

class diffusers.models.attention_processor.SanaLinearAttnProcessor2_0

( )

Processor for implementing scaled dot-product linear attention.
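
Linear attention avoids forming the tokens-by-tokens score matrix by replacing softmax with a feature map and reordering the matrix products. The sketch below assumes a ReLU feature map and a simple sum normalizer, which may differ in detail from this processor; the function names are hypothetical, and the code only checks that both orderings give the same result.

```python
def relu(vec):
    return [max(0.0, x) for x in vec]

def attention_quadratic_order(q, k, v):
    """Reference ordering: (phi(Q) phi(K)^T) V, normalized per query."""
    out = []
    for q_tok in q:
        weights = [sum(a * b for a, b in zip(relu(q_tok), relu(k_tok))) for k_tok in k]
        norm = sum(weights)  # assumed nonzero for this toy input
        out.append([sum(w * v_tok[j] for w, v_tok in zip(weights, v)) / norm
                    for j in range(len(v[0]))])
    return out

def attention_linear_order(q, k, v):
    """Reordered: phi(Q) (phi(K)^T V), with the key summaries computed once."""
    d_k, d_v = len(k[0]), len(v[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # phi(K)^T V
    k_sum = [0.0] * d_k                     # column sums of phi(K)
    for k_tok, v_tok in zip(k, v):
        fk = relu(k_tok)
        for a in range(d_k):
            k_sum[a] += fk[a]
            for j in range(d_v):
                kv[a][j] += fk[a] * v_tok[j]
    out = []
    for q_tok in q:
        fq = relu(q_tok)
        norm = sum(a * b for a, b in zip(fq, k_sum))
        out.append([sum(fq[a] * kv[a][j] for a in range(d_k)) / norm for j in range(d_v)])
    return out

q = [[1.0, 2.0], [0.5, -1.0]]
k = [[1.0, 0.0], [2.0, 1.0], [-1.0, 3.0]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

reference = attention_quadratic_order(q, k, v)
reordered = attention_linear_order(q, k, v)
```

The reordered version touches each key and value once, so its cost grows linearly with the number of tokens instead of quadratically.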

class diffusers.models.attention_processor.SanaMultiscaleAttnProcessor2_0

( )

Processor for implementing multiscale quadratic attention.

class diffusers.models.attention_processor.PAGCFGSanaLinearAttnProcessor2_0

( )

Processor for implementing scaled dot-product linear attention.

class diffusers.models.attention_processor.PAGIdentitySanaLinearAttnProcessor2_0

( )

Processor for implementing scaled dot-product linear attention.

Stable Audio

class diffusers.models.attention_processor.StableAudioAttnProcessor2_0

( )

Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0). This is used in the Stable Audio model. It applies rotary embedding to the query and key vectors, and allows MHA, GQA, or MQA.
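
MHA, GQA, and MQA differ only in how many key/value heads the query heads share. The sketch below maps each query head to the key/value head it reads from, assuming the common convention of contiguous groups; the function name is hypothetical and the grouping order is an assumption.

```python
def kv_head_for_query_head(num_query_heads, num_kv_heads):
    """Return, for each query head, the index of the key/value head it reads from."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be a multiple of num_kv_heads")
    group_size = num_query_heads // num_kv_heads
    return [head // group_size for head in range(num_query_heads)]

mha = kv_head_for_query_head(8, 8)  # one key/value head per query head
gqa = kv_head_for_query_head(8, 2)  # groups of four query heads share a key/value head
mqa = kv_head_for_query_head(8, 1)  # all query heads share a single key/value head
```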

SlicedAttnProcessor

class diffusers.models.attention_processor.SlicedAttnProcessor

( slice_size: int )

Parameters

slice_size (int, optional) — The number of steps to compute attention. Uses as many slices as attention_head_dim // slice_size, and attention_head_dim must be a multiple of the slice_size.

Processor for implementing sliced attention.
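
Sliced attention trades speed for memory by computing attention for a few heads at a time instead of all at once. The sketch below processes a list of independent per-head problems in chunks of slice_size and checks that the result matches the unsliced computation. Which axis the processor actually slices, and the helper names, are assumptions here.

```python
import math

def attend(q, k, v):
    """Single-head softmax attention over lists of token vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_tok in q:
        scores = [scale * sum(a * b for a, b in zip(q_tok, k_tok)) for k_tok in k]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v_tok[j] for e, v_tok in zip(exps, v))
                    for j in range(len(v[0]))])
    return out

def sliced_attention(heads, slice_size):
    """Process the (q, k, v) triples in heads, slice_size at a time."""
    outputs = []
    num_slices = 0
    for start in range(0, len(heads), slice_size):
        chunk = heads[start:start + slice_size]
        # Only this chunk's score matrices need to exist at the same time.
        outputs.extend(attend(q, k, v) for q, k, v in chunk)
        num_slices += 1
    return outputs, num_slices

tokens = [[1.0, 0.0], [0.0, 1.0]]
heads = [(tokens, tokens, [[float(h), 1.0], [2.0, float(h)]]) for h in range(4)]

sliced_out, num_slices = sliced_attention(heads, slice_size=2)
full_out = [attend(q, k, v) for q, k, v in heads]
```

A smaller slice_size lowers peak memory at the cost of more sequential steps; the outputs are unchanged.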

class diffusers.models.attention_processor.SlicedAttnAddedKVProcessor

( slice_size )

Parameters

slice_size (int, optional) — The number of steps to compute attention. Uses as many slices as attention_head_dim // slice_size, and attention_head_dim must be a multiple of the slice_size.

Processor for implementing sliced attention with extra learnable key and value matrices for the text encoder.

XFormersAttnProcessor

class diffusers.models.attention_processor.XFormersAttnProcessor

( attention_op: typing.Optional[typing.Callable] = None )

Parameters

attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. It is recommended to set to None, and allow xFormers to choose the best operator.

Processor for implementing memory-efficient attention using xFormers.

class diffusers.models.attention_processor.XFormersAttnAddedKVProcessor

( attention_op: typing.Optional[typing.Callable] = None )

Parameters

attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. It is recommended to set to None, and allow xFormers to choose the best operator.

Processor for implementing memory-efficient attention using xFormers.

XLAFlashAttnProcessor2_0

class diffusers.models.attention_processor.XLAFlashAttnProcessor2_0

( partition_spec: typing.Optional[typing.Tuple[typing.Optional[str], ...]] = None )

Processor for implementing scaled dot-product attention with the Pallas flash attention kernel when using torch_xla.

XFormersJointAttnProcessor

class diffusers.models.attention_processor.XFormersJointAttnProcessor

( attention_op: typing.Optional[typing.Callable] = None )

Parameters

attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. It is recommended to set to None, and allow xFormers to choose the best operator.

Processor for implementing memory-efficient attention using xFormers.

IPAdapterXFormersAttnProcessor

class diffusers.models.attention_processor.IPAdapterXFormersAttnProcessor

( hidden_size, cross_attention_dim = None, num_tokens = (4,), scale = 1.0, attention_op: typing.Optional[typing.Callable] = None )

Parameters

hidden_size (int) — The hidden size of the attention layer.
cross_attention_dim (int) — The number of channels in the encoder_hidden_states.
num_tokens (int, Tuple[int] or List[int], defaults to (4,)) — The context length of the image features.
scale (float or List[float], defaults to 1.0) — The weight scale of the image prompt.
attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. It is recommended to set to None, and allow xFormers to choose the best operator.

Attention processor for IP-Adapter using xFormers.

FluxIPAdapterJointAttnProcessor2_0

class diffusers.models.attention_processor.FluxIPAdapterJointAttnProcessor2_0

( *args, **kwargs )

XLAFluxFlashAttnProcessor2_0

class diffusers.models.attention_processor.XLAFluxFlashAttnProcessor2_0

( *args, **kwargs )

Processor for implementing scaled dot-product attention with the Pallas flash attention kernel when using torch_xla.
