Attention Processor

This page documents Diffusers v0.23.1.

An attention processor is a class for applying different types of attention mechanisms.

AttnProcessor

class diffusers.models.attention_processor.AttnProcessor()

Default processor for performing attention-related computations.
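AttnProcessor and AttnProcessor2_0 both project the hidden states into queries, keys, and values and then compute scaled dot-product attention, softmax(QK^T / sqrt(d))V; AttnProcessor2_0 hands that step to PyTorch 2.0's torch.nn.functional.scaled_dot_product_attention. The pure-Python sketch below shows only that core step for a single head, assuming the default 1/sqrt(d) scale. It leaves out the projections, head reshaping, masking, and batching that the real processors perform, and the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V for a single head; rows of q, k, v are tokens."""
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


# All-zero keys give uniform weights, so each output row is the mean of the values.
print(scaled_dot_product_attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]))
# [[2.0, 3.0]]
```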

AttnProcessor2_0

class diffusers.models.attention_processor.AttnProcessor2_0()

Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0).

LoRAAttnProcessor

class diffusers.models.attention_processor.LoRAAttnProcessor(hidden_size: int, cross_attention_dim: typing.Optional[int] = None, rank: int = 4, network_alpha: typing.Optional[int] = None, **kwargs)

Parameters

hidden_size (int) — The hidden size of the attention layer.
cross_attention_dim (int, optional) — The number of channels in the encoder_hidden_states.
rank (int, defaults to 4) — The dimension of the LoRA update matrices.
network_alpha (int, optional) — Equivalent to alpha, but its usage is specific to Kohya (A1111)-style LoRAs.
kwargs (dict) — Additional keyword arguments to pass to the LoRALinearLayer layers.

Processor for implementing the LoRA attention mechanism.
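The rank and network_alpha parameters control a low-rank update added to each attention projection. As a minimal sketch, assuming the update takes the usual LoRA form W x + scale * up(down(x)), with up zero-initialized and the low-rank output multiplied by network_alpha / rank when network_alpha is set, the core arithmetic looks like this in plain Python. LoRALinearSketch, lora_projection, and the scale argument are hypothetical names for illustration, not diffusers' API.

```python
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Apply an (out x in) weight matrix to an input vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


class LoRALinearSketch:
    """Hypothetical low-rank update up(down(x)), optionally scaled by network_alpha / rank."""

    def __init__(self, in_features: int, out_features: int, rank: int = 4,
                 network_alpha: Optional[float] = None, seed: int = 0):
        rng = random.Random(seed)
        self.rank = rank
        self.network_alpha = network_alpha
        # "down" maps the input into the rank-dimensional space (small random init).
        self.down = [[rng.gauss(0.0, 1.0 / rank) for _ in range(in_features)] for _ in range(rank)]
        # "up" starts at zero, so a fresh LoRA layer contributes nothing.
        self.up = [[0.0] * rank for _ in range(out_features)]

    def __call__(self, x: Vector) -> Vector:
        out = matvec(self.up, matvec(self.down, x))
        if self.network_alpha is not None:
            # Kohya-style scaling of the low-rank contribution.
            out = [o * self.network_alpha / self.rank for o in out]
        return out


def lora_projection(w: Matrix, lora: LoRALinearSketch, x: Vector, scale: float = 1.0) -> Vector:
    """Base projection plus the scaled low-rank correction: W x + scale * lora(x)."""
    return [base + scale * delta for base, delta in zip(matvec(w, x), lora(x))]
```

Because up starts at zero, attaching a freshly initialized LoRA layer leaves the projection unchanged; network_alpha / rank then rescales whatever the trained matrices contribute, and scale gives a single knob for dialing the whole update up or down at inference time.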

LoRAAttnProcessor2_0

class diffusers.models.attention_processor.LoRAAttnProcessor2_0(hidden_size: int, cross_attention_dim: typing.Optional[int] = None, rank: int = 4, network_alpha: typing.Optional[int] = None, **kwargs)

Parameters

hidden_size (int) — The hidden size of the attention layer.
cross_attention_dim (int, optional) — The number of channels in the encoder_hidden_states.
rank (int, defaults to 4) — The dimension of the LoRA update matrices.
network_alpha (int, optional) — Equivalent to alpha, but its usage is specific to Kohya (A1111)-style LoRAs.
kwargs (dict) — Additional keyword arguments to pass to the LoRALinearLayer layers.

Processor for implementing the LoRA attention mechanism using PyTorch 2.0’s memory-efficient scaled dot-product attention.

CustomDiffusionAttnProcessor

class diffusers.models.attention_processor.CustomDiffusionAttnProcessor(train_kv: bool = True, train_q_out: bool = True, hidden_size: typing.Optional[int] = None, cross_attention_dim: typing.Optional[int] = None, out_bias: bool = True, dropout: float = 0.0)

Parameters

train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
train_q_out (bool, defaults to True) — Whether to newly train the query and output projection matrices corresponding to the latent image features.
hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
out_bias (bool, defaults to True) — Whether to include a bias in the output projection that is trained when train_q_out is True.
dropout (float, optional, defaults to 0.0) — The dropout probability to use.

Processor for implementing attention for the Custom Diffusion method.
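To make the effect of train_kv, train_q_out, and out_bias concrete, the hypothetical helper below counts the weights each flag adds. It assumes the key and value projections map cross_attention_dim (or hidden_size when that is None) to hidden_size without a bias, the query projection is a bias-free hidden_size x hidden_size matrix, and the output projection is hidden_size x hidden_size with a bias controlled by out_bias. These shapes are assumptions to check against the source of your installed version, not a documented contract.

```python
from typing import Dict, Optional


def custom_diffusion_trainable_params(
    hidden_size: int,
    cross_attention_dim: Optional[int] = None,
    train_kv: bool = True,
    train_q_out: bool = True,
    out_bias: bool = True,
) -> Dict[str, int]:
    """Count the weights each Custom Diffusion flag would add (hypothetical helper)."""
    # Assumption: K/V read the text features, whose width falls back to hidden_size.
    kv_in = cross_attention_dim or hidden_size
    counts: Dict[str, int] = {}
    if train_kv:
        # Bias-free key and value projections: kv_in -> hidden_size.
        counts["to_k"] = kv_in * hidden_size
        counts["to_v"] = kv_in * hidden_size
    if train_q_out:
        # Bias-free query projection plus an output projection with optional bias.
        counts["to_q"] = hidden_size * hidden_size
        counts["to_out"] = hidden_size * hidden_size + (hidden_size if out_bias else 0)
    return counts


# Example: a 320-wide layer attending to 768-dimensional text features.
print(sum(custom_diffusion_trainable_params(320, 768).values()))  # 696640
```

Under these assumptions, turning off train_q_out keeps only the text-facing key/value weights (491,520 parameters in the example above), which is the lighter-weight configuration.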

CustomDiffusionAttnProcessor2_0

class diffusers.models.attention_processor.CustomDiffusionAttnProcessor2_0(train_kv: bool = True, train_q_out: bool = True, hidden_size: typing.Optional[int] = None, cross_attention_dim: typing.Optional[int] = None, out_bias: bool = True, dropout: float = 0.0)

Parameters

train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
train_q_out (bool, defaults to True) — Whether to newly train the query and output projection matrices corresponding to the latent image features.
hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
out_bias (bool, defaults to True) — Whether to include a bias in the output projection that is trained when train_q_out is True.
dropout (float, optional, defaults to 0.0) — The dropout probability to use.

Processor for implementing attention for the Custom Diffusion method using PyTorch 2.0’s memory-efficient scaled dot-product attention.

AttnAddedKVProcessor

class diffusers.models.attention_processor.AttnAddedKVProcessor()

Processor for performing attention-related computations with extra learnable key and value matrices for the text encoder.
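In the added key/value variant, the queries come from the image hidden states, while the keys and values are the text encoder states (projected by the extra learnable matrices) concatenated with the image's own keys and values along the token axis, unless the layer is cross-attention only. The sketch below assumes all projections have already been applied and uses plain lists; attend and added_kv_attention are hypothetical helpers, and only_cross_attention is an assumed stand-in for the corresponding setting on the attention layer.

```python
import math
from typing import List

Matrix = List[List[float]]


def attend(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head softmax(q k^T / sqrt(d)) v using plain lists."""
    d = len(q[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def added_kv_attention(q: Matrix, self_k: Matrix, self_v: Matrix,
                       text_k: Matrix, text_v: Matrix,
                       only_cross_attention: bool = False) -> Matrix:
    """Image-token queries attend over text keys/values prepended to the image's own."""
    if only_cross_attention:
        return attend(q, text_k, text_v)
    # Concatenate along the token axis: text tokens first, then image tokens.
    return attend(q, text_k + self_k, text_v + self_v)
```

With a zero query every key gets the same weight, so the output is the plain average over all three value rows (one text token plus two image tokens); with only_cross_attention=True the image tokens drop out and only the text value remains.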

AttnAddedKVProcessor2_0

class diffusers.models.attention_processor.AttnAddedKVProcessor2_0()

Processor for performing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0), with extra learnable key and value matrices for the text encoder.

LoRAAttnAddedKVProcessor

class diffusers.models.attention_processor.LoRAAttnAddedKVProcessor(hidden_size: int, cross_attention_dim: typing.Optional[int] = None, rank: int = 4, network_alpha: typing.Optional[int] = None)

Parameters

hidden_size (int) — The hidden size of the attention layer.
cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
rank (int, defaults to 4) — The dimension of the LoRA update matrices.
network_alpha (int, optional) — Equivalent to alpha, but its usage is specific to Kohya (A1111)-style LoRAs.

Processor for implementing the LoRA attention mechanism with extra learnable key and value matrices for the text encoder.

XFormersAttnProcessor

class diffusers.models.attention_processor.XFormersAttnProcessor(attention_op: typing.Optional[typing.Callable] = None)

Parameters

attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. Leaving it as None is recommended so that xFormers can choose the best operator.

Processor for implementing memory-efficient attention using xFormers.

LoRAXFormersAttnProcessor

class diffusers.models.attention_processor.LoRAXFormersAttnProcessor(hidden_size: int, cross_attention_dim: int, rank: int = 4, attention_op: typing.Optional[typing.Callable] = None, network_alpha: typing.Optional[int] = None, **kwargs)

Parameters

hidden_size (int) — The hidden size of the attention layer.
cross_attention_dim (int) — The number of channels in the encoder_hidden_states.
rank (int, defaults to 4) — The dimension of the LoRA update matrices.
attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. Leaving it as None is recommended so that xFormers can choose the best operator.
network_alpha (int, optional) — Equivalent to alpha, but its usage is specific to Kohya (A1111)-style LoRAs.
kwargs (dict) — Additional keyword arguments to pass to the LoRALinearLayer layers.

Processor for implementing the LoRA attention mechanism with memory-efficient attention using xFormers.

CustomDiffusionXFormersAttnProcessor

class diffusers.models.attention_processor.CustomDiffusionXFormersAttnProcessor(train_kv: bool = True, train_q_out: bool = False, hidden_size: typing.Optional[int] = None, cross_attention_dim: typing.Optional[int] = None, out_bias: bool = True, dropout: float = 0.0, attention_op: typing.Optional[typing.Callable] = None)

Parameters

train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
train_q_out (bool, defaults to False) — Whether to newly train the query and output projection matrices corresponding to the latent image features.
hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
out_bias (bool, defaults to True) — Whether to include a bias in the output projection that is trained when train_q_out is True.
dropout (float, optional, defaults to 0.0) — The dropout probability to use.
attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. Leaving it as None is recommended so that xFormers can choose the best operator.

Processor for implementing memory-efficient attention using xFormers for the Custom Diffusion method.

SlicedAttnProcessor

class diffusers.models.attention_processor.SlicedAttnProcessor(slice_size: int)

Parameters

slice_size (int) — The size of each attention slice. Attention is computed in attention_head_dim // slice_size sequential steps, so attention_head_dim must be a multiple of slice_size.

Processor for implementing sliced attention.
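Sliced attention trades speed for memory: rather than materializing the attention scores for every head at once, it works through the stacked per-head problems slice_size at a time. The pure-Python sketch below assumes slicing runs along the combined batch-and-heads dimension and tracks how many score entries are held at once; all names are hypothetical, and raising ValueError on a non-divisible slice_size is this sketch's own choice.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def attention_weights(q: Matrix, k: Matrix) -> Matrix:
    """Row-wise softmax(q k^T / sqrt(d)) for one head."""
    d = len(q[0])
    rows = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def apply_weights(w: Matrix, v: Matrix) -> Matrix:
    """Weighted sum of value rows (w @ v)."""
    return [[sum(w_ij * v[j][c] for j, w_ij in enumerate(row)) for c in range(len(v[0]))]
            for row in w]


def sliced_attention(qs: List[Matrix], ks: List[Matrix], vs: List[Matrix],
                     slice_size: int) -> Tuple[List[Matrix], int]:
    """Process the stacked (batch * heads) problems slice_size at a time.

    Returns the outputs and the peak number of attention-score entries held at once.
    """
    n = len(qs)
    if slice_size <= 0 or n % slice_size != 0:
        raise ValueError("slice_size must evenly divide the number of attention problems")
    outputs: List[Matrix] = []
    peak = 0
    for start in range(0, n, slice_size):
        idx = range(start, start + slice_size)
        # Only this slice's score matrices exist at the same time.
        weights = [attention_weights(qs[i], ks[i]) for i in idx]
        peak = max(peak, sum(len(w) * len(w[0]) for w in weights))
        outputs.extend(apply_weights(w, vs[i]) for w, i in zip(weights, idx))
    return outputs, peak
```

For four problems with one query and two keys each, slice_size=4 holds 8 score entries at once, while slice_size=2 holds only 4 and yields identical outputs; the cost is more sequential steps.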

SlicedAttnAddedKVProcessor

class diffusers.models.attention_processor.SlicedAttnAddedKVProcessor(slice_size)

Parameters

slice_size (int) — The size of each attention slice. Attention is computed in attention_head_dim // slice_size sequential steps, so attention_head_dim must be a multiple of slice_size.

Processor for implementing sliced attention with extra learnable key and value matrices for the text encoder.
