Diffusers documentation

Attention Processor

An attention processor is a class for applying different types of attention mechanisms.

AttnProcessor

class diffusers.models.attention_processor.AttnProcessor

( )

Default processor for performing attention-related computations.

AttnProcessor2_0

class diffusers.models.attention_processor.AttnProcessor2_0

( )

Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0).
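For orientation, scaled dot-product attention computes softmax(Q K^T / sqrt(d)) V. The following is a minimal pure-Python sketch of that formula on toy lists; the function names and numbers are illustrative assumptions, and it is not the processor's actual implementation, which relies on PyTorch's fused kernel.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, key, value):
    """Return softmax(Q K^T / sqrt(d)) V for lists of row vectors."""
    d = len(query[0])
    out = []
    for q in query:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in key]
        weights = softmax(scores)
        # Weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, value))
                    for j in range(len(value[0]))])
    return out


# Toy data (hypothetical): two query tokens attend over three key/value tokens.
query = [[1.0, 0.0], [0.0, 1.0]]
key = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
value = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
output = scaled_dot_product_attention(query, key, value)
```

Each output row is a weighted average of the value rows, so it always lies within the range spanned by them.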

AttnAddedKVProcessor

class diffusers.models.attention_processor.AttnAddedKVProcessor

( )

Processor for performing attention-related computations with extra learnable key and value matrices for the text encoder.
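As a rough illustration of "extra key and value matrices for the text encoder", the sketch below projects text features with their own weights and concatenates the results with the keys and values computed from the image features. All weights, dimensions, and helper names here are hypothetical, the queries are used unprojected for brevity, and the concatenation order is an assumption; this is not the library's code.

```python
import math


def project(weight, vectors):
    """Apply a bias-free linear layer (list of weight rows) to each vector."""
    return [[sum(w * x for w, x in zip(row, vec)) for row in weight] for vec in vectors]


def attend(query, keys, values):
    """Softmax attention of one query vector over the given keys and values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [sum(e / total * v[j] for e, v in zip(exps, values))
            for j in range(len(values[0]))]


image_tokens = [[1.0, 0.0], [0.0, 1.0]]  # latent image features, dim 2
text_tokens = [[0.5, 0.5, 1.0]]          # text encoder features, dim 3

# Hypothetical weights: the regular key/value projections for image tokens...
to_k = [[1.0, 0.0], [0.0, 1.0]]
to_v = [[2.0, 0.0], [0.0, 2.0]]
# ...and the extra learnable key/value projections for text tokens.
add_k_proj = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
add_v_proj = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

# Text-derived keys/values are concatenated with image-derived ones.
keys = project(add_k_proj, text_tokens) + project(to_k, image_tokens)
values = project(add_v_proj, text_tokens) + project(to_v, image_tokens)

# Every image token attends over both the text and the image tokens.
outputs = [attend(q, keys, values) for q in image_tokens]
```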

AttnAddedKVProcessor2_0

class diffusers.models.attention_processor.AttnAddedKVProcessor2_0

( )

Processor for performing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0), with extra learnable key and value matrices for the text encoder.

CrossFrameAttnProcessor

class diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor

( batch_size = 2 )

Cross-frame attention processor. Each frame attends to the first frame.
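One way to picture "each frame attends to the first frame" is to reuse the keys and values of frame 0 for every frame while keeping each frame's own queries. The sketch below does this with plain lists; the helper names and toy data are assumptions for illustration and do not reproduce the processor's tensor handling.

```python
import math


def attend(query, keys, values):
    """Softmax attention of one query vector over the given keys and values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [sum(e / total * v[j] for e, v in zip(exps, values))
            for j in range(len(values[0]))]


def cross_frame_attention(frame_queries, frame_keys, frame_values):
    """Each frame keeps its own queries but uses frame 0's keys and values."""
    first_keys, first_values = frame_keys[0], frame_values[0]
    return [[attend(q, first_keys, first_values) for q in queries]
            for queries in frame_queries]


# Toy video (hypothetical): 3 frames, 2 tokens per frame, feature dim 2.
shared_queries = [[1.0, 0.0], [0.0, 1.0]]
frame_queries = [shared_queries, shared_queries, shared_queries]
frame_keys = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[9.0, 9.0], [9.0, 9.0]],   # ignored: only frame 0's keys are used
    [[-3.0, 2.0], [4.0, 4.0]],  # ignored
]
frame_values = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[7.0, 7.0], [7.0, 7.0]],   # ignored
    [[0.0, 0.0], [0.0, 0.0]],   # ignored
]

outputs = cross_frame_attention(frame_queries, frame_keys, frame_values)
```

Because the queries are identical across frames in this toy setup, all three frames produce the same output, which shows that the later frames' own keys and values play no role.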

CustomDiffusionAttnProcessor

class diffusers.models.attention_processor.CustomDiffusionAttnProcessor

( train_kv: bool = True, train_q_out: bool = True, hidden_size: Optional[int] = None, cross_attention_dim: Optional[int] = None, out_bias: bool = True, dropout: float = 0.0 )

Parameters

  • train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
  • train_q_out (bool, defaults to True) — Whether to newly train query matrices corresponding to the latent image features.
  • hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
  • cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
  • out_bias (bool, defaults to True) — Whether to include a bias term in the output projection that is trained when train_q_out is True.
  • dropout (float, optional, defaults to 0.0) — The dropout probability to use.

Processor for implementing attention for the Custom Diffusion method.

CustomDiffusionAttnProcessor2_0

class diffusers.models.attention_processor.CustomDiffusionAttnProcessor2_0

( train_kv: bool = True, train_q_out: bool = True, hidden_size: Optional[int] = None, cross_attention_dim: Optional[int] = None, out_bias: bool = True, dropout: float = 0.0 )

Parameters

  • train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
  • train_q_out (bool, defaults to True) — Whether to newly train query matrices corresponding to the latent image features.
  • hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
  • cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
  • out_bias (bool, defaults to True) — Whether to include a bias term in the output projection that is trained when train_q_out is True.
  • dropout (float, optional, defaults to 0.0) — The dropout probability to use.

Processor for implementing attention for the Custom Diffusion method using PyTorch 2.0’s memory-efficient scaled dot-product attention.

CustomDiffusionXFormersAttnProcessor

class diffusers.models.attention_processor.CustomDiffusionXFormersAttnProcessor

( train_kv: bool = True, train_q_out: bool = False, hidden_size: Optional[int] = None, cross_attention_dim: Optional[int] = None, out_bias: bool = True, dropout: float = 0.0, attention_op: Optional[Callable] = None )

Parameters

  • train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
  • train_q_out (bool, defaults to False) — Whether to newly train query matrices corresponding to the latent image features.
  • hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
  • cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
  • out_bias (bool, defaults to True) — Whether to include a bias term in the output projection that is trained when train_q_out is True.
  • dropout (float, optional, defaults to 0.0) — The dropout probability to use.
  • attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. It is recommended to set to None, and allow xFormers to choose the best operator.

Processor for implementing memory-efficient attention using xFormers for the Custom Diffusion method.

FusedAttnProcessor2_0

class diffusers.models.attention_processor.FusedAttnProcessor2_0

( )

Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0). It uses fused projection layers. For self-attention modules, all projection matrices (i.e., query, key, value) are fused. For cross-attention modules, key and value projection matrices are fused.

This API is currently 🧪 experimental and may change in the future.
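The idea behind fused projections can be sketched without any framework: stacking the rows of the separate projection matrices into one matrix yields all projections from a single matrix multiply, and the result is then split back apart. The weights and names below are hypothetical, and this is only an illustration of the concept, not the processor's implementation.

```python
def linear(weight, x):
    """Bias-free linear layer: one dot product per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# Hypothetical 2x2 projection weights for query, key, and value.
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[0.5, 0.5], [1.0, -1.0]]
w_v = [[2.0, 0.0], [0.0, 3.0]]
dim = len(w_q)

# Self-attention: query, key, and value share one input, so all three
# weight matrices can be stacked and applied in a single pass.
hidden_state = [1.0, 2.0]
fused_qkv = w_q + w_k + w_v
qkv = linear(fused_qkv, hidden_state)
query, key, value = qkv[:dim], qkv[dim:2 * dim], qkv[2 * dim:]

# Cross-attention: the query comes from the hidden state, while key and
# value come from the encoder state, so only key and value are fused.
encoder_state = [3.0, 1.0]
fused_kv = w_k + w_v
kv = linear(fused_kv, encoder_state)
cross_key, cross_value = kv[:dim], kv[dim:]
```

The split outputs match what the separate projections would produce; the benefit of fusing is fewer, larger matrix multiplies.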

SlicedAttnProcessor

class diffusers.models.attention_processor.SlicedAttnProcessor

( slice_size: int )

Parameters

  • slice_size (int) — The number of steps to compute attention. Uses as many slices as attention_head_dim // slice_size, and attention_head_dim must be a multiple of slice_size.

Processor for implementing sliced attention.
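Sliced attention trades speed for memory by computing attention a chunk at a time instead of all at once, which gives the same result while holding fewer intermediate scores in memory. The sketch below slices over a list of attention heads; that choice of axis, the helper names, and the toy data are assumptions for illustration rather than a description of the library's internals.

```python
import math


def attention(queries, keys, values):
    """Softmax attention for one head: lists of query, key, and value rows."""
    out = []
    for q in queries:
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[j] for e, v in zip(exps, values))
                    for j in range(len(values[0]))])
    return out


def full_attention(heads):
    """Compute attention for every head in one pass."""
    return [attention(q, k, v) for q, k, v in heads]


def sliced_attention(heads, slice_size):
    """Compute attention slice_size heads at a time and collect the results."""
    out = []
    for start in range(0, len(heads), slice_size):
        chunk = heads[start:start + slice_size]
        out.extend(attention(q, k, v) for q, k, v in chunk)
    return out


# Toy input (hypothetical): 4 heads, each a (queries, keys, values) triple.
heads = [
    ([[float(h), 1.0]], [[1.0, 0.0], [0.0, 1.0]], [[float(h)], [float(h + 1)]])
    for h in range(4)
]

sliced = sliced_attention(heads, slice_size=2)
full = full_attention(heads)
```

In a tensor framework, only the attention scores for the current slice need to exist at any moment, which is where the memory saving comes from.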

SlicedAttnAddedKVProcessor

class diffusers.models.attention_processor.SlicedAttnAddedKVProcessor

( slice_size )

Parameters

  • slice_size (int) — The number of steps to compute attention. Uses as many slices as attention_head_dim // slice_size, and attention_head_dim must be a multiple of slice_size.

Processor for implementing sliced attention with extra learnable key and value matrices for the text encoder.

XFormersAttnProcessor

class diffusers.models.attention_processor.XFormersAttnProcessor

( attention_op: Optional[Callable] = None )

Parameters

  • attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. It is recommended to leave this set to None and let xFormers choose the best operator.

Processor for implementing memory-efficient attention using xFormers.

AttnProcessorNPU

class diffusers.models.attention_processor.AttnProcessorNPU

( )

Processor for implementing flash attention using torch_npu. torch_npu supports only the fp16 and bf16 data types. With fp32 inputs, F.scaled_dot_product_attention is used for the computation instead, but it provides little acceleration on NPU.
