Attention Processor
An attention processor is a class for applying different types of attention mechanisms.
AttnProcessor
Default processor for performing attention-related computations.
AttnProcessor2_0
Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0).
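For reference, scaled dot-product attention computes softmax(Q K^T / sqrt(d)) V. The following is a minimal, framework-free sketch of that formula for a single head; it is an illustration only, not the Diffusers implementation, and the function names are made up for this example.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(query, key, value):
    """Compute softmax(Q K^T / sqrt(d)) V for one attention head.

    query: list of n_q vectors of length d
    key:   list of n_k vectors of length d
    value: list of n_k vectors of length d_v
    """
    scale = 1.0 / math.sqrt(len(query[0]))
    output = []
    for q in query:
        # Similarity of this query with every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in key]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append([
            sum(w * v[j] for w, v in zip(weights, value))
            for j in range(len(value[0]))
        ])
    return output

query = [[1.0, 0.0], [0.0, 1.0]]
key = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
value = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = scaled_dot_product_attention(query, key, value)
```

Each output row is a convex combination of the value rows, so its entries always lie between the smallest and largest values in the corresponding column.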
AttnAddedKVProcessor
Processor for performing attention-related computations with extra learnable key and value matrices for the text encoder.
AttnAddedKVProcessor2_0
Processor for performing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0), with extra learnable key and value matrices for the text encoder.
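One way to picture "extra learnable key and value matrices for the text encoder" is that the text encoder states get their own key/value projections, and the resulting keys and values are concatenated with those computed from the hidden states. The sketch below illustrates that idea in plain Python. The names `w_add_k` and `w_add_v` are hypothetical, and the concatenation order is an assumption, so treat this as a conceptual illustration rather than the library's code.

```python
import math

def linear(rows, weight):
    """Project each row with a weight stored as (out_features, in_features)."""
    return [[sum(w * x for w, x in zip(w_row, row)) for w_row in weight] for row in rows]

def attention(query, key, value):
    """Single-head scaled dot-product attention on lists of vectors."""
    scale = 1.0 / math.sqrt(len(query[0]))
    out = []
    for q in query:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in key]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        out.append([
            sum(e / total * v[j] for e, v in zip(exps, value))
            for j in range(len(value[0]))
        ])
    return out

def attention_with_added_kv(hidden, text, w_q, w_k, w_v, w_add_k, w_add_v):
    """Attend over keys/values from both the text states and the hidden states.

    w_add_k / w_add_v are the (hypothetical) extra projections applied to the
    text encoder states; w_k / w_v are the usual projections of the hidden states.
    """
    query = linear(hidden, w_q)
    key = linear(text, w_add_k) + linear(hidden, w_k)      # list concatenation
    value = linear(text, w_add_v) + linear(hidden, w_v)
    return attention(query, key, value)

identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = [[1.0, 0.0], [0.0, 1.0]]
text = [[2.0, 2.0]]
out = attention_with_added_kv(hidden, text, identity, identity, identity, identity, identity)
```

Each query position then attends over `len(text) + len(hidden)` key/value positions instead of `len(hidden)` alone.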
CrossFrameAttnProcessor
class diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor(batch_size = 2)
Cross-frame attention processor. Each frame attends to the first frame.
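The idea that each frame attends to the first frame can be sketched as follows: every frame keeps its own queries, while the keys and values are always taken from frame 0. This is a simplified, framework-free illustration with made-up function names; it ignores batching, heads, and any distinction between self-attention and cross-attention layers.

```python
import math

def attention(query, key, value):
    """Single-head scaled dot-product attention on lists of vectors."""
    scale = 1.0 / math.sqrt(len(query[0]))
    out = []
    for q in query:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in key]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        out.append([
            sum(e / total * v[j] for e, v in zip(exps, value))
            for j in range(len(value[0]))
        ])
    return out

def cross_frame_attention(frames_q, frames_k, frames_v):
    """Each frame uses its own queries but the first frame's keys and values."""
    first_k, first_v = frames_k[0], frames_v[0]
    return [attention(q, first_k, first_v) for q in frames_q]

# Two frames, one query token each, two key/value tokens per frame.
frames_q = [[[1.0, 0.0]], [[0.0, 1.0]]]
frames_k = [[[1.0, 0.0], [0.0, 1.0]], [[9.0, 9.0], [9.0, 9.0]]]
frames_v = [[[1.0], [2.0]], [[100.0], [200.0]]]
out = cross_frame_attention(frames_q, frames_k, frames_v)
```

Because the second frame's own keys and values are never read, its output stays within the range of the first frame's values, which is what ties the appearance of later frames to the first one.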
CustomDiffusionAttnProcessor
class diffusers.models.attention_processor.CustomDiffusionAttnProcessor(train_kv: bool = True, train_q_out: bool = True, hidden_size: Optional = None, cross_attention_dim: Optional = None, out_bias: bool = True, dropout: float = 0.0)
Parameters
- train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
- train_q_out (bool, defaults to True) — Whether to newly train query matrices corresponding to the latent image features.
- hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
- cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
- out_bias (bool, defaults to True) — Whether to include the bias parameter in train_q_out.
- dropout (float, optional, defaults to 0.0) — The dropout probability to use.
Processor for implementing attention for the Custom Diffusion method.
CustomDiffusionAttnProcessor2_0
class diffusers.models.attention_processor.CustomDiffusionAttnProcessor2_0(train_kv: bool = True, train_q_out: bool = True, hidden_size: Optional = None, cross_attention_dim: Optional = None, out_bias: bool = True, dropout: float = 0.0)
Parameters
- train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
- train_q_out (bool, defaults to True) — Whether to newly train query matrices corresponding to the latent image features.
- hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
- cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
- out_bias (bool, defaults to True) — Whether to include the bias parameter in train_q_out.
- dropout (float, optional, defaults to 0.0) — The dropout probability to use.
Processor for implementing attention for the Custom Diffusion method using PyTorch 2.0’s memory-efficient scaled dot-product attention.
CustomDiffusionXFormersAttnProcessor
class diffusers.models.attention_processor.CustomDiffusionXFormersAttnProcessor(train_kv: bool = True, train_q_out: bool = False, hidden_size: Optional = None, cross_attention_dim: Optional = None, out_bias: bool = True, dropout: float = 0.0, attention_op: Optional = None)
Parameters
- train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
- train_q_out (bool, defaults to False) — Whether to newly train query matrices corresponding to the latent image features.
- hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
- cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
- out_bias (bool, defaults to True) — Whether to include the bias parameter in train_q_out.
- dropout (float, optional, defaults to 0.0) — The dropout probability to use.
- attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. It is recommended to leave this set to None and allow xFormers to choose the best operator.
Processor for implementing memory-efficient attention using xFormers for the Custom Diffusion method.
FusedAttnProcessor2_0
Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0). It uses fused projection layers. For self-attention modules, all projection matrices (i.e., query, key, value) are fused. For cross-attention modules, key and value projection matrices are fused.
This API is currently 🧪 experimental and may change in the future.
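The effect of fusing projection layers can be illustrated without any framework: stacking the query, key, and value weight matrices along the output dimension lets a single matrix multiplication produce all three projections, which are then split apart. The sketch below assumes weights stored as (out_features, in_features), as in `torch.nn.Linear`; the helper names are invented for this example and it is not the Diffusers implementation.

```python
def linear(x, weight):
    """Apply y = x @ weight.T, with weight stored as (out_features, in_features)."""
    return [[sum(w * xi for w, xi in zip(w_row, row)) for w_row in weight] for row in x]

def fuse_weights(*weights):
    """Stack projection matrices along the output dimension."""
    fused = []
    for w in weights:
        fused.extend(w)
    return fused

def split_columns(rows, sizes):
    """Split each output row into consecutive chunks of the given sizes."""
    chunks = [[] for _ in sizes]
    for row in rows:
        start = 0
        for i, size in enumerate(sizes):
            chunks[i].append(row[start:start + size])
            start += size
    return chunks

w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[2.0, 0.0], [0.0, 2.0]]
w_v = [[0.0, 1.0], [1.0, 0.0]]
x = [[1.0, 2.0], [3.0, 4.0]]

# Self-attention: one fused projection yields query, key, and value.
w_qkv = fuse_weights(w_q, w_k, w_v)
q, k, v = split_columns(linear(x, w_qkv), [len(w_q), len(w_k), len(w_v)])

# Cross-attention: only key and value share an input, so only they are fused.
w_kv = fuse_weights(w_k, w_v)
k2, v2 = split_columns(linear(x, w_kv), [len(w_k), len(w_v)])
```

The fused path gives the same numbers as three separate projections; the potential benefit is fewer, larger matrix multiplications.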
LoRAAttnAddedKVProcessor
class diffusers.models.attention_processor.LoRAAttnAddedKVProcessor(hidden_size: int, cross_attention_dim: Optional = None, rank: int = 4, network_alpha: Optional = None)
Parameters
- hidden_size (int) — The hidden size of the attention layer.
- cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
- rank (int, defaults to 4) — The dimension of the LoRA update matrices.
- network_alpha (int, optional) — Equivalent to alpha, but its usage is specific to Kohya (A1111) style LoRAs.
- kwargs (dict) — Additional keyword arguments to pass to the LoRALinearLayer layers.
Processor for implementing the LoRA attention mechanism with extra learnable key and value matrices for the text encoder.
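To make the roles of `rank` and `network_alpha` concrete, here is a minimal sketch of a LoRA-augmented linear projection. It assumes the common convention that the low-rank update `up(down(x))` is scaled by `network_alpha / rank` when `network_alpha` is given; the function and argument names are hypothetical and this is not the library's code.

```python
def linear(x, weight):
    """Apply y = x @ weight.T, with weight stored as (out_features, in_features)."""
    return [[sum(w * xi for w, xi in zip(w_row, row)) for w_row in weight] for row in x]

def lora_linear(x, base_w, down_w, up_w, network_alpha=None):
    """Base projection plus a scaled low-rank update.

    down_w: (rank, in_features)   projects the input down to `rank` dimensions
    up_w:   (out_features, rank)  projects back up to the output size
    """
    rank = len(down_w)
    # Assumption: scale the update by network_alpha / rank when alpha is set.
    scale = 1.0 if network_alpha is None else network_alpha / rank
    base = linear(x, base_w)
    update = linear(linear(x, down_w), up_w)
    return [
        [b + scale * u for b, u in zip(base_row, update_row)]
        for base_row, update_row in zip(base, update)
    ]

x = [[1.0, 2.0]]
base_w = [[1.0, 0.0], [0.0, 1.0]]
down_w = [[1.0, 1.0]]            # rank 1
up_w = [[1.0], [0.0]]
out = lora_linear(x, base_w, down_w, up_w, network_alpha=2)
```

With a zero `up_w`, which is a common LoRA initialization, the update vanishes and the layer behaves exactly like the base projection.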
LoRAXFormersAttnProcessor
class diffusers.models.attention_processor.LoRAXFormersAttnProcessor(hidden_size: int, cross_attention_dim: int, rank: int = 4, attention_op: Optional = None, network_alpha: Optional = None, **kwargs)
Parameters
- hidden_size (int) — The hidden size of the attention layer.
- cross_attention_dim (int) — The number of channels in the encoder_hidden_states.
- rank (int, defaults to 4) — The dimension of the LoRA update matrices.
- attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. It is recommended to leave this set to None and allow xFormers to choose the best operator.
- network_alpha (int, optional) — Equivalent to alpha, but its usage is specific to Kohya (A1111) style LoRAs.
- kwargs (dict) — Additional keyword arguments to pass to the LoRALinearLayer layers.
Processor for implementing the LoRA attention mechanism with memory-efficient attention using xFormers.
SlicedAttnProcessor
class diffusers.models.attention_processor.SlicedAttnProcessor(slice_size: int)
Processor for implementing sliced attention.
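Sliced attention trades speed for lower peak memory by computing attention for only `slice_size` entries of the combined batch-and-heads dimension at a time, so the score matrices for all heads never need to exist at once. The sketch below illustrates the control flow in plain Python, assuming slicing happens along that leading dimension; the function names are invented, and in pure Python the memory benefit is only notional.

```python
import math

def attention(query, key, value):
    """Single-head scaled dot-product attention on lists of vectors."""
    scale = 1.0 / math.sqrt(len(query[0]))
    out = []
    for q in query:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in key]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        out.append([
            sum(e / total * v[j] for e, v in zip(exps, value))
            for j in range(len(value[0]))
        ])
    return out

def sliced_attention(queries, keys, values, slice_size):
    """Process the (batch * heads) dimension in chunks of `slice_size`."""
    outputs = []
    for start in range(0, len(queries), slice_size):
        end = start + slice_size
        # Only the attention scores for this slice are computed in this step.
        for q, k, v in zip(queries[start:end], keys[start:end], values[start:end]):
            outputs.append(attention(q, k, v))
    return outputs

# Three (batch * head) entries, two tokens each.
queries = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
keys = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]], [[1.0, 2.0], [2.0, 1.0]]]
values = [[[1.0], [2.0]], [[3.0], [4.0]], [[5.0], [6.0]]]

sliced = sliced_attention(queries, keys, values, slice_size=2)
```

The result is the same as unsliced attention regardless of `slice_size`; a smaller slice lowers peak memory at the cost of more sequential steps.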
SlicedAttnAddedKVProcessor
class diffusers.models.attention_processor.SlicedAttnAddedKVProcessor(slice_size)
Processor for implementing sliced attention with extra learnable key and value matrices for the text encoder.
XFormersAttnProcessor
class diffusers.models.attention_processor.XFormersAttnProcessor(attention_op: Optional = None)
Parameters
- attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. It is recommended to leave this set to None and allow xFormers to choose the best operator.
Processor for implementing memory-efficient attention using xFormers.