Attention Processor
An attention processor is a class for applying different types of attention mechanisms.
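The design separates an attention module, which owns the weights, from a swappable processor object, which performs the computation. The following is a minimal, self-contained sketch of that pattern in plain Python. `TinyAttention`, `DefaultProcessor`, and `LoggingProcessor` are hypothetical names, not diffusers classes. As we understand it, diffusers' `Attention` module works in a similar way: it exposes a `set_processor` method and calls the processor with the module itself plus the hidden states (and, optionally, `encoder_hidden_states` and an attention mask).

```python
from typing import Callable, Optional


class DefaultProcessor:
    # Hypothetical processor: receives the owning module plus the inputs.
    def __call__(self, attn, hidden_states, encoder_hidden_states=None):
        return f"default attention on {len(hidden_states)} tokens"


class LoggingProcessor:
    # Hypothetical alternative processor that distinguishes self- vs cross-attention.
    def __call__(self, attn, hidden_states, encoder_hidden_states=None):
        kind = "cross" if encoder_hidden_states is not None else "self"
        return f"{kind}-attention on {len(hidden_states)} tokens"


class TinyAttention:
    # Hypothetical stand-in for an attention module that delegates to a processor.
    def __init__(self, processor: Optional[Callable] = None):
        self.processor = processor or DefaultProcessor()

    def set_processor(self, processor: Callable) -> None:
        # Swapping the processor changes how attention is computed
        # without touching the module's parameters.
        self.processor = processor

    def __call__(self, hidden_states, encoder_hidden_states=None):
        return self.processor(self, hidden_states, encoder_hidden_states)


attn = TinyAttention()
print(attn([0.1, 0.2, 0.3]))          # default attention on 3 tokens
attn.set_processor(LoggingProcessor())
print(attn([0.1, 0.2], [1.0]))        # cross-attention on 2 tokens
```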
AttnProcessor
Default processor for performing attention-related computations.
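The following is a minimal NumPy sketch of the multi-head attention a default processor computes. It projects the inputs to query, key, and value, splits them into heads, and applies softmax(QKᵀ/√d)V per head. It then merges the heads and applies the output projection. The function and weight names are hypothetical. The sketch omits details such as attention masks, dropout, normalization, and residual connections, which the real processor may handle.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(hidden, context, w_q, w_k, w_v, w_out, heads):
    """hidden: (batch, seq_q, dim); context: (batch, seq_kv, ctx_dim).

    Pass context=hidden for self-attention, or text-encoder states for cross-attention.
    """
    batch, seq_q, _ = hidden.shape
    q, k, v = hidden @ w_q, context @ w_k, context @ w_v
    inner = q.shape[-1]
    head_dim = inner // heads

    def split_heads(t):
        # (batch, seq, inner) -> (batch * heads, seq, head_dim)
        b, s, _ = t.shape
        return t.reshape(b, s, heads, head_dim).transpose(0, 2, 1, 3).reshape(b * heads, s, head_dim)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)   # (batch*heads, seq_q, seq_kv)
    out = softmax(scores, axis=-1) @ v                       # (batch*heads, seq_q, head_dim)
    # Merge heads back: (batch*heads, seq_q, head_dim) -> (batch, seq_q, inner)
    out = out.reshape(batch, heads, seq_q, head_dim).transpose(0, 2, 1, 3).reshape(batch, seq_q, inner)
    return out @ w_out


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 16))
w_q, w_k, w_v, w_out = (rng.standard_normal((16, 16)) * 0.1 for _ in range(4))
y = multi_head_attention(x, x, w_q, w_k, w_v, w_out, heads=4)
print(y.shape)  # (2, 5, 16)
```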
AttnProcessor2_0
Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0).
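AttnProcessor and AttnProcessor2_0 compute the same per-head quantity. The 2.0 variant delegates it to PyTorch 2.0's `torch.nn.functional.scaled_dot_product_attention`, which can dispatch to fused, memory-efficient kernels. For a head of dimension $d$:

```latex
\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V,
\qquad Q \in \mathbb{R}^{n_q \times d},\quad K, V \in \mathbb{R}^{n_{kv} \times d}
```

In this formula, $n_q$ is the number of query tokens. $n_{kv}$ is the number of key/value tokens: the same tokens for self-attention, or those of `encoder_hidden_states` for cross-attention.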
FusedAttnProcessor2_0
Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0). It uses fused projection layers. For self-attention modules, all projection matrices (i.e., query, key, value) are fused. For cross-attention modules, key and value projection matrices are fused.
This API is currently 🧪 experimental and may change in the future.
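The following minimal NumPy sketch shows why fusing is valid. Concatenating the projection weights and running one matrix multiply gives the same query, key, and value as three separate projections, and it needs fewer kernel launches. The weights use the `x @ W` convention. PyTorch's `nn.Linear` stores its weight transposed, so the concatenation axis would differ there. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, seq = 8, 4
x = rng.standard_normal((seq, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))

# Unfused: three separate projections (three matmuls).
q, k, v = x @ w_q, x @ w_k, x @ w_v

# Fused self-attention: concatenate the weights once, run one matmul,
# then split the result back into query, key, and value.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=1)   # (dim, 3 * dim)
q_f, k_f, v_f = np.split(x @ w_qkv, 3, axis=1)

# Fused cross-attention: only key and value read encoder_hidden_states,
# so only those two projections are fused.
ctx = rng.standard_normal((6, dim))
w_kv = np.concatenate([w_k, w_v], axis=1)          # (dim, 2 * dim)
k_c, v_c = np.split(ctx @ w_kv, 2, axis=1)

print(np.allclose(q, q_f), np.allclose(k, k_f), np.allclose(v, v_f))  # True True True
```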
LoRAAttnProcessor
class diffusers.models.attention_processor.LoRAAttnProcessor(hidden_size: int, cross_attention_dim: Optional = None, rank: int = 4, network_alpha: Optional = None, **kwargs)
Parameters
- hidden_size (int): The hidden size of the attention layer.
- cross_attention_dim (int, optional): The number of channels in the encoder_hidden_states.
- rank (int, defaults to 4): The dimension of the LoRA update matrices.
- network_alpha (int, optional): Equivalent to alpha, but its usage is specific to Kohya (A1111) style LoRAs.
- kwargs (dict): Additional keyword arguments to pass to the LoRALinearLayer layers.
Processor for implementing the LoRA attention mechanism.
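As we understand it, LoRAAttnProcessor adds a trainable low-rank update to each attention projection (query, key, value, and output) and keeps the original weights frozen. The hypothetical `LoRALinearSketch` below shows one such projection in NumPy. It assumes three things. The "up" factor is zero-initialized, so training starts from the unmodified model. `network_alpha`, when set, rescales the update by `network_alpha / rank`. `scale` stands in for a runtime LoRA strength. This is an illustration, not the diffusers implementation.

```python
import numpy as np


class LoRALinearSketch:
    """Hypothetical NumPy stand-in for one LoRA-augmented projection (x @ W convention)."""

    def __init__(self, in_features, out_features, rank=4, network_alpha=None, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen base weight of the original projection.
        self.base = rng.standard_normal((in_features, out_features))
        # Low-rank factors: "down" maps to `rank` dims, "up" maps back out.
        self.down = rng.normal(0.0, 1.0 / rank, size=(in_features, rank))
        self.up = np.zeros((rank, out_features))  # assumed zero init: update starts at 0
        self.rank = rank
        self.network_alpha = network_alpha

    def __call__(self, x, scale=1.0):
        update = x @ self.down @ self.up
        if self.network_alpha is not None:
            # Assumed Kohya-style rescaling of the low-rank update by alpha / rank.
            update = update * (self.network_alpha / self.rank)
        return x @ self.base + scale * update


layer = LoRALinearSketch(16, 16, rank=4, network_alpha=8)
x = np.random.default_rng(1).standard_normal((3, 16))
print(np.allclose(layer(x), x @ layer.base))  # True: zero-initialized "up" means no change yet
```

Only `down` and `up` would be trained, so the number of trainable parameters depends on `rank` rather than on the full weight size.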
LoRAAttnProcessor2_0
class diffusers.models.attention_processor.LoRAAttnProcessor2_0(hidden_size: int, cross_attention_dim: Optional = None, rank: int = 4, network_alpha: Optional = None, **kwargs)
Parameters
- hidden_size (int): The hidden size of the attention layer.
- cross_attention_dim (int, optional): The number of channels in the encoder_hidden_states.
- rank (int, defaults to 4): The dimension of the LoRA update matrices.
- network_alpha (int, optional): Equivalent to alpha, but its usage is specific to Kohya (A1111) style LoRAs.
- kwargs (dict): Additional keyword arguments to pass to the LoRALinearLayer layers.
Processor for implementing the LoRA attention mechanism using PyTorch 2.0’s memory-efficient scaled dot-product attention.
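Because the LoRA update is factored through `rank` dimensions, `rank` directly controls how many parameters each adapted projection adds. Here is a quick back-of-the-envelope helper. The function names are hypothetical, biases are ignored, and 320 is just an example projection width.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    # A factored update down (in x rank) @ up (rank x out) stores
    # rank * (in + out) numbers.
    return rank * (in_features + out_features)


def full_param_count(in_features: int, out_features: int) -> int:
    # A full, unfactored weight update stores in * out numbers.
    return in_features * out_features


print(lora_param_count(320, 320, rank=4))  # 2560
print(full_param_count(320, 320))          # 102400
```

At the default rank of 4, this example projection needs 40 times fewer trainable numbers than a full update. Raising `rank` increases capacity linearly.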
CustomDiffusionAttnProcessor
class diffusers.models.attention_processor.CustomDiffusionAttnProcessor(train_kv: bool = True, train_q_out: bool = True, hidden_size: Optional = None, cross_attention_dim: Optional = None, out_bias: bool = True, dropout: float = 0.0)
Parameters
- train_kv (bool, defaults to True): Whether to newly train the key and value matrices corresponding to the text features.
- train_q_out (bool, defaults to True): Whether to newly train the query and output matrices corresponding to the latent image features.
- hidden_size (int, optional, defaults to None): The hidden size of the attention layer.
- cross_attention_dim (int, optional, defaults to None): The number of channels in the encoder_hidden_states.
- out_bias (bool, defaults to True): Whether to include a bias in the output projection that is trained when train_q_out is enabled.
- dropout (float, optional, defaults to 0.0): The dropout probability to use.
Processor for implementing attention for the Custom Diffusion method.
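Custom Diffusion fine-tunes only a small part of the attention weights. The hypothetical helper below shows how, as we understand the processor, the `train_kv`, `train_q_out`, and `out_bias` flags decide which new projection layers get created, and with what shapes. The layer names are simplified and illustrative, not the library's attribute names.

```python
from typing import Dict, Optional, Tuple


def custom_diffusion_trainable_layers(
    hidden_size: int,
    cross_attention_dim: Optional[int] = None,
    train_kv: bool = True,
    train_q_out: bool = True,
    out_bias: bool = True,
) -> Dict[str, Tuple[int, int, bool]]:
    """Hypothetical helper returning name -> (in_features, out_features, has_bias)."""
    layers: Dict[str, Tuple[int, int, bool]] = {}
    # Keys and values read the text features, so their input width is
    # cross_attention_dim (falling back to hidden_size for self-attention).
    kv_in = cross_attention_dim or hidden_size
    if train_kv:
        layers["to_k"] = (kv_in, hidden_size, False)
        layers["to_v"] = (kv_in, hidden_size, False)
    if train_q_out:
        # The query reads the latent image features; out_bias only affects
        # the output projection created here.
        layers["to_q"] = (hidden_size, hidden_size, False)
        layers["to_out"] = (hidden_size, hidden_size, out_bias)
    return layers


print(custom_diffusion_trainable_layers(320, 768, train_q_out=False))
# {'to_k': (768, 320, False), 'to_v': (768, 320, False)}
```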
CustomDiffusionAttnProcessor2_0
class diffusers.models.attention_processor.CustomDiffusionAttnProcessor2_0(train_kv: bool = True, train_q_out: bool = True, hidden_size: Optional = None, cross_attention_dim: Optional = None, out_bias: bool = True, dropout: float = 0.0)
Parameters
- train_kv (bool, defaults to True): Whether to newly train the key and value matrices corresponding to the text features.
- train_q_out (bool, defaults to True): Whether to newly train the query and output matrices corresponding to the latent image features.
- hidden_size (int, optional, defaults to None): The hidden size of the attention layer.
- cross_attention_dim (int, optional, defaults to None): The number of channels in the encoder_hidden_states.
- out_bias (bool, defaults to True): Whether to include a bias in the output projection that is trained when train_q_out is enabled.
- dropout (float, optional, defaults to 0.0): The dropout probability to use.
Processor for implementing attention for the Custom Diffusion method using PyTorch 2.0’s memory-efficient scaled dot-product attention.
AttnAddedKVProcessor
Processor for performing attention-related computations with extra learnable key and value matrices for the text encoder.
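As we read the diffusers source, the added-KV processors project `encoder_hidden_states` through extra key/value projections. They then concatenate the result with the keys and values computed from `hidden_states` along the sequence axis, so each query attends to both text and image tokens. The single-head NumPy sketch below illustrates this under that assumption. All names are hypothetical.

```python
import numpy as np


def added_kv_attention(hidden, encoder, w_q, w_k, w_v, w_add_k, w_add_v):
    """Single-head sketch; hidden: (seq_img, dim), encoder: (seq_txt, enc_dim)."""
    q = hidden @ w_q
    # Extra learnable projections map the text-encoder states into key/value space...
    enc_k, enc_v = encoder @ w_add_k, encoder @ w_add_v
    # ...and are concatenated with the image tokens' own keys/values along the
    # sequence axis (text tokens first).
    k = np.concatenate([enc_k, hidden @ w_k], axis=0)
    v = np.concatenate([enc_v, hidden @ w_v], axis=0)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return probs @ v, probs


rng = np.random.default_rng(0)
h, e = rng.standard_normal((5, 8)), rng.standard_normal((3, 12))
out, probs = added_kv_attention(
    h, e, *(rng.standard_normal((8, 8)) for _ in range(3)),
    *(rng.standard_normal((12, 8)) for _ in range(2)),
)
print(out.shape, probs.shape)  # (5, 8) (5, 8): 5 queries over 3 text + 5 image keys
```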
AttnAddedKVProcessor2_0
Processor for performing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0), with extra learnable key and value matrices for the text encoder.
LoRAAttnAddedKVProcessor
class diffusers.models.attention_processor.LoRAAttnAddedKVProcessor(hidden_size: int, cross_attention_dim: Optional = None, rank: int = 4, network_alpha: Optional = None)
Parameters
- hidden_size (int): The hidden size of the attention layer.
- cross_attention_dim (int, optional, defaults to None): The number of channels in the encoder_hidden_states.
- rank (int, defaults to 4): The dimension of the LoRA update matrices.
- network_alpha (int, optional): Equivalent to alpha, but its usage is specific to Kohya (A1111) style LoRAs.
- kwargs (dict): Additional keyword arguments to pass to the LoRALinearLayer layers.
Processor for implementing the LoRA attention mechanism with extra learnable key and value matrices for the text encoder.
XFormersAttnProcessor
class diffusers.models.attention_processor.XFormersAttnProcessor(attention_op: Optional = None)
Parameters
- attention_op (Callable, optional, defaults to None): The base operator to use as the attention operator. It is recommended to leave this as None and let xFormers choose the best operator.
Processor for implementing memory efficient attention using xFormers.
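Memory-efficient attention kernels avoid materializing the full query-by-key score matrix. The NumPy sketch below shows the underlying idea: it processes keys and values in chunks while keeping a running ("online") softmax maximum and normalizer. This is a generic illustration of the technique, not xFormers' implementation. The function names are hypothetical.

```python
import numpy as np


def naive_attention(q, k, v):
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v


def chunked_attention(q, k, v, chunk_size=4):
    """Online-softmax attention: only an (n_q, chunk_size) score block exists at a time."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    n_q = q.shape[0]
    running_max = np.full((n_q, 1), -np.inf)
    running_sum = np.zeros((n_q, 1))
    acc = np.zeros((n_q, v.shape[-1]))
    for start in range(0, k.shape[0], chunk_size):
        k_c, v_c = k[start:start + chunk_size], v[start:start + chunk_size]
        s = q @ k_c.T * scale
        new_max = np.maximum(running_max, s.max(axis=-1, keepdims=True))
        # Rescale earlier partial sums to the new max before adding this chunk.
        correction = np.exp(running_max - new_max)
        p = np.exp(s - new_max)
        running_sum = running_sum * correction + p.sum(axis=-1, keepdims=True)
        acc = acc * correction + p @ v_c
        running_max = new_max
    return acc / running_sum
```

In practice, pipelines usually switch to the xFormers processor by calling `enable_xformers_memory_efficient_attention()` when xFormers is installed.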
LoRAXFormersAttnProcessor
class diffusers.models.attention_processor.LoRAXFormersAttnProcessor(hidden_size: int, cross_attention_dim: int, rank: int = 4, attention_op: Optional = None, network_alpha: Optional = None, **kwargs)
Parameters
- hidden_size (int): The hidden size of the attention layer.
- cross_attention_dim (int): The number of channels in the encoder_hidden_states.
- rank (int, defaults to 4): The dimension of the LoRA update matrices.
- attention_op (Callable, optional, defaults to None): The base operator to use as the attention operator. It is recommended to leave this as None and let xFormers choose the best operator.
- network_alpha (int, optional): Equivalent to alpha, but its usage is specific to Kohya (A1111) style LoRAs.
- kwargs (dict): Additional keyword arguments to pass to the LoRALinearLayer layers.
Processor for implementing the LoRA attention mechanism with memory efficient attention using xFormers.
CustomDiffusionXFormersAttnProcessor
class diffusers.models.attention_processor.CustomDiffusionXFormersAttnProcessor(train_kv: bool = True, train_q_out: bool = False, hidden_size: Optional = None, cross_attention_dim: Optional = None, out_bias: bool = True, dropout: float = 0.0, attention_op: Optional = None)
Parameters
- train_kv (bool, defaults to True): Whether to newly train the key and value matrices corresponding to the text features.
- train_q_out (bool, defaults to False): Whether to newly train the query and output matrices corresponding to the latent image features.
- hidden_size (int, optional, defaults to None): The hidden size of the attention layer.
- cross_attention_dim (int, optional, defaults to None): The number of channels in the encoder_hidden_states.
- out_bias (bool, defaults to True): Whether to include a bias in the output projection that is trained when train_q_out is enabled.
- dropout (float, optional, defaults to 0.0): The dropout probability to use.
- attention_op (Callable, optional, defaults to None): The base operator to use as the attention operator. It is recommended to leave this as None and let xFormers choose the best operator.
Processor for implementing memory efficient attention using xFormers for the Custom Diffusion method.
SlicedAttnProcessor
class diffusers.models.attention_processor.SlicedAttnProcessor(slice_size: int)
Processor for implementing sliced attention.
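As we read the diffusers source, sliced attention splits the work along the combined batch-times-heads axis. It computes attention for `slice_size` entries at a time, so only that many score matrices are held in memory at once. The NumPy sketch below illustrates the idea under that assumption. The function names are hypothetical.

```python
import numpy as np


def batched_attention(q, k, v):
    """q: (batch*heads, n_q, d); k, v: (batch*heads, n_kv, d)."""
    s = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v


def sliced_attention(q, k, v, slice_size):
    # Process `slice_size` entries of the batch*heads axis per step; the last
    # slice may be smaller if the axis length is not a multiple of slice_size.
    out = np.empty((q.shape[0], q.shape[1], v.shape[-1]))
    for start in range(0, q.shape[0], slice_size):
        end = start + slice_size
        out[start:end] = batched_attention(q[start:end], k[start:end], v[start:end])
    return out
```

A smaller `slice_size` lowers peak memory but requires more sequential steps. Pipelines expose this trade-off through `enable_attention_slicing()`.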
SlicedAttnAddedKVProcessor
class diffusers.models.attention_processor.SlicedAttnAddedKVProcessor(slice_size)
Processor for implementing sliced attention with extra learnable key and value matrices for the text encoder.