Custom Layers and Utilities
This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.
Most of these are only useful if you are studying the code of the models in the library.
PyTorch custom modules
class transformers.modeling_utils.Conv1D(nf, nx)
1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2).
Basically works like a linear layer but the weights are transposed.
- Parameters
  - nf (int) – The number of output features.
  - nx (int) – The number of input features.
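A minimal usage sketch (the shapes are illustrative; note that the weight is stored as (nx, nf), i.e. transposed relative to torch.nn.Linear):

    import torch
    from transformers.modeling_utils import Conv1D

    # Project 768 input features to 3 * 768 output features, as in the fused
    # query/key/value projection of GPT-2.
    layer = Conv1D(nf=3 * 768, nx=768)

    hidden_states = torch.randn(2, 5, 768)   # (batch_size, seq_len, nx)
    output = layer(hidden_states)            # (batch_size, seq_len, nf)
    print(output.shape)                      # torch.Size([2, 5, 2304])
    print(layer.weight.shape)                # torch.Size([768, 2304]), i.e. (nx, nf)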
class transformers.modeling_utils.PoolerStartLogits(config: transformers.configuration_utils.PretrainedConfig)
Compute SQuAD start logits from sequence hidden states.
- Parameters
  - config (PretrainedConfig) – The config used by the model; it will be used to grab the hidden_size of the model.
forward(hidden_states: torch.FloatTensor, p_mask: Optional[torch.FloatTensor] = None) → torch.FloatTensor
- Parameters
  - hidden_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size)) – The final hidden states of the model.
  - p_mask (torch.FloatTensor of shape (batch_size, seq_len), optional) – Mask for tokens at invalid positions, such as query and special symbols (PAD, SEP, CLS). 1.0 means the token should be masked.
- Returns
  The start logits for SQuAD.
- Return type
  torch.FloatTensor
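A minimal sketch of how the layer might be used on its own; the config is built directly here for illustration (only hidden_size is read), whereas in practice it comes from the model:

    import torch
    from transformers import PretrainedConfig
    from transformers.modeling_utils import PoolerStartLogits

    config = PretrainedConfig(hidden_size=768)
    start_pooler = PoolerStartLogits(config)

    hidden_states = torch.randn(2, 384, 768)   # (batch_size, seq_len, hidden_size)
    p_mask = torch.zeros(2, 384)               # 1.0 marks tokens that cannot be a start
    start_logits = start_pooler(hidden_states, p_mask=p_mask)
    print(start_logits.shape)                  # torch.Size([2, 384])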
class transformers.modeling_utils.PoolerEndLogits(config: transformers.configuration_utils.PretrainedConfig)
Compute SQuAD end logits from sequence hidden states.
- Parameters
  - config (PretrainedConfig) – The config used by the model; it will be used to grab the hidden_size of the model and the layer_norm_eps to use.
forward(hidden_states: torch.FloatTensor, start_states: Optional[torch.FloatTensor] = None, start_positions: Optional[torch.LongTensor] = None, p_mask: Optional[torch.FloatTensor] = None) → torch.FloatTensor
- Parameters
  - hidden_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size)) – The final hidden states of the model.
  - start_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size), optional) – The hidden states of the first tokens for the labeled span.
  - start_positions (torch.LongTensor of shape (batch_size,), optional) – The position of the first token for the labeled span.
  - p_mask (torch.FloatTensor of shape (batch_size, seq_len), optional) – Mask for tokens at invalid positions, such as query and special symbols (PAD, SEP, CLS). 1.0 means the token should be masked.
Note
One of start_states or start_positions should not be None. If both are set, start_positions overrides start_states.
- Returns
  The end logits for SQuAD.
- Return type
  torch.FloatTensor
class transformers.modeling_utils.PoolerAnswerClass(config)
Compute the SQuAD 2.0 answer class from the classification and start token hidden states.
- Parameters
  - config (PretrainedConfig) – The config used by the model; it will be used to grab the hidden_size of the model.
forward(hidden_states: torch.FloatTensor, start_states: Optional[torch.FloatTensor] = None, start_positions: Optional[torch.LongTensor] = None, cls_index: Optional[torch.LongTensor] = None) → torch.FloatTensor
- Parameters
  - hidden_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size)) – The final hidden states of the model.
  - start_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size), optional) – The hidden states of the first tokens for the labeled span.
  - start_positions (torch.LongTensor of shape (batch_size,), optional) – The position of the first token for the labeled span.
  - cls_index (torch.LongTensor of shape (batch_size,), optional) – Position of the CLS token for each sentence in the batch. If None, takes the last token.
Note
One of start_states or start_positions should not be None. If both are set, start_positions overrides start_states.
- Returns
  The SQuAD 2.0 answer class.
- Return type
  torch.FloatTensor
class transformers.modeling_utils.SquadHeadOutput(loss: Optional[torch.FloatTensor] = None, start_top_log_probs: Optional[torch.FloatTensor] = None, start_top_index: Optional[torch.LongTensor] = None, end_top_log_probs: Optional[torch.FloatTensor] = None, end_top_index: Optional[torch.LongTensor] = None, cls_logits: Optional[torch.FloatTensor] = None)
Base class for outputs of question answering models using a SQuADHead.
- Parameters
  - loss (torch.FloatTensor of shape (1,), optional, returned if both start_positions and end_positions are provided) – Classification loss as the sum of start token, end token (and is_impossible if provided) classification losses.
  - start_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) – Log probabilities for the top config.start_n_top start token possibilities (beam-search).
  - start_top_index (torch.LongTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) – Indices for the top config.start_n_top start token possibilities (beam-search).
  - end_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) – Log probabilities for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).
  - end_top_index (torch.LongTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) – Indices for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).
  - cls_logits (torch.FloatTensor of shape (batch_size,), optional, returned if start_positions or end_positions is not provided) – Log probabilities for the is_impossible label of the answers.
class transformers.modeling_utils.SQuADHead(config)
A SQuAD head inspired by XLNet.
- Parameters
  - config (PretrainedConfig) – The config used by the model; it will be used to grab the hidden_size of the model and the layer_norm_eps to use.
forward(hidden_states: torch.FloatTensor, start_positions: Optional[torch.LongTensor] = None, end_positions: Optional[torch.LongTensor] = None, cls_index: Optional[torch.LongTensor] = None, is_impossible: Optional[torch.LongTensor] = None, p_mask: Optional[torch.FloatTensor] = None, return_dict: bool = False) → Union[transformers.modeling_utils.SquadHeadOutput, Tuple[torch.FloatTensor]]
- Parameters
  - hidden_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size)) – Final hidden states of the model on the sequence tokens.
  - start_positions (torch.LongTensor of shape (batch_size,), optional) – Positions of the first token for the labeled span.
  - end_positions (torch.LongTensor of shape (batch_size,), optional) – Positions of the last token for the labeled span.
  - cls_index (torch.LongTensor of shape (batch_size,), optional) – Position of the CLS token for each sentence in the batch. If None, takes the last token.
  - is_impossible (torch.LongTensor of shape (batch_size,), optional) – Whether the question has a possible answer in the paragraph or not.
  - p_mask (torch.FloatTensor of shape (batch_size, seq_len), optional) – Mask for tokens at invalid positions, such as query and special symbols (PAD, SEP, CLS). 1.0 means the token should be masked.
  - return_dict (bool, optional, defaults to False) – Whether or not to return a ModelOutput instead of a plain tuple.
- Returns
  A SquadHeadOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration and inputs.
  - loss (torch.FloatTensor of shape (1,), optional, returned if both start_positions and end_positions are provided) – Classification loss as the sum of start token, end token (and is_impossible if provided) classification losses.
  - start_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) – Log probabilities for the top config.start_n_top start token possibilities (beam-search).
  - start_top_index (torch.LongTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) – Indices for the top config.start_n_top start token possibilities (beam-search).
  - end_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) – Log probabilities for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).
  - end_top_index (torch.LongTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) – Indices for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).
  - cls_logits (torch.FloatTensor of shape (batch_size,), optional, returned if start_positions or end_positions is not provided) – Log probabilities for the is_impossible label of the answers.
- Return type
  SquadHeadOutput or tuple(torch.FloatTensor)
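A minimal sketch of the two paths through the head; the config is built directly here for illustration and only sets the attributes the head reads (hidden_size, layer_norm_eps, start_n_top, end_n_top):

    import torch
    from transformers import PretrainedConfig
    from transformers.modeling_utils import SQuADHead

    config = PretrainedConfig(
        hidden_size=768, layer_norm_eps=1e-12, start_n_top=5, end_n_top=5
    )
    squad_head = SQuADHead(config)

    hidden_states = torch.randn(2, 384, 768)   # (batch_size, seq_len, hidden_size)

    # Training: provide the labeled span and get the loss back.
    outputs = squad_head(
        hidden_states,
        start_positions=torch.tensor([10, 42]),
        end_positions=torch.tensor([12, 45]),
        return_dict=True,
    )
    print(outputs.loss)

    # Inference: no positions, get beam-search candidates instead.
    outputs = squad_head(hidden_states, return_dict=True)
    print(outputs.start_top_index.shape)       # (batch_size, config.start_n_top)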
class transformers.modeling_utils.SequenceSummary(config: transformers.configuration_utils.PretrainedConfig)
Compute a single vector summary of a sequence hidden states.
- Parameters
  - config (PretrainedConfig) – The config used by the model. Relevant arguments in the config class of the model are (refer to the actual config class of your model for the default values it uses):
    - summary_type (str) – The method to use to make this summary. Accepted values are:
      - "last" – Take the last token hidden state (like XLNet)
      - "first" – Take the first token hidden state (like Bert)
      - "mean" – Take the mean of all tokens hidden states
      - "cls_index" – Supply a Tensor of classification token position (GPT/GPT-2)
      - "attn" – Not implemented now, use multi-head attention
    - summary_use_proj (bool) – Add a projection after the vector extraction.
    - summary_proj_to_labels (bool) – If True, the projection outputs to config.num_labels classes (otherwise to config.hidden_size).
    - summary_activation (Optional[str]) – Set to "tanh" to add a tanh activation to the output; any other string or None will add no activation.
    - summary_first_dropout (float) – Optional dropout probability before the projection and activation.
    - summary_last_dropout (float) – Optional dropout probability after the projection and activation.
forward(hidden_states: torch.FloatTensor, cls_index: Optional[torch.LongTensor] = None) → torch.FloatTensor
Compute a single vector summary of a sequence hidden states.
- Parameters
  - hidden_states (torch.FloatTensor of shape [batch_size, seq_len, hidden_size]) – The hidden states of the last layer.
  - cls_index (torch.LongTensor of shape [batch_size] or [batch_size, ...] where ... are optional leading dimensions of hidden_states, optional) – Used if summary_type == "cls_index" and takes the last token of the sequence as classification token.
- Returns
  The summary of the sequence hidden states.
- Return type
  torch.FloatTensor
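A minimal sketch with an illustrative config (in practice the summary-related attributes come from the model's own config class):

    import torch
    from transformers import PretrainedConfig
    from transformers.modeling_utils import SequenceSummary

    config = PretrainedConfig(
        hidden_size=768,
        summary_type="last",
        summary_use_proj=True,
        summary_proj_to_labels=False,
        summary_activation="tanh",
        summary_first_dropout=0.1,
        summary_last_dropout=0.1,
    )
    summary = SequenceSummary(config)

    hidden_states = torch.randn(2, 10, 768)   # (batch_size, seq_len, hidden_size)
    pooled = summary(hidden_states)           # take the last token, project, tanh
    print(pooled.shape)                       # torch.Size([2, 768])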
PyTorch Helper Functions
transformers.apply_chunking_to_forward(forward_fn: Callable[..., torch.Tensor], chunk_size: int, chunk_dim: int, *input_tensors) → torch.Tensor
This function chunks the input_tensors into smaller input tensor parts of size chunk_size over the dimension chunk_dim. It then applies a layer forward_fn to each chunk independently to save memory.
If the forward_fn is independent across the chunk_dim, this function will yield the same result as directly applying forward_fn to input_tensors.
- Parameters
  - forward_fn (Callable[..., torch.Tensor]) – The forward function of the model.
  - chunk_size (int) – The chunk size of a chunked tensor: num_chunks = len(input_tensors[0]) / chunk_size.
  - chunk_dim (int) – The dimension over which the input_tensors should be chunked.
  - input_tensors (Tuple[torch.Tensor]) – The input tensors of forward_fn which will be chunked.
- Returns
  A tensor with the same shape as the forward_fn would have given if applied directly.
- Return type
  torch.Tensor
Examples:

    # rename the usual forward() fn to forward_chunk()
    def forward_chunk(self, hidden_states):
        hidden_states = self.decoder(hidden_states)
        return hidden_states

    # implement a chunked forward function
    def forward(self, hidden_states):
        return apply_chunking_to_forward(self.forward_chunk, self.chunk_size_lm_head, self.seq_len_dim, hidden_states)
transformers.modeling_utils.find_pruneable_heads_and_indices(heads: List[int], n_heads: int, head_size: int, already_pruned_heads: Set[int]) → Tuple[Set[int], torch.LongTensor]
Finds the heads and their indices taking already_pruned_heads into account.
- Parameters
  - heads (List[int]) – List of the indices of heads to prune.
  - n_heads (int) – The number of heads in the model.
  - head_size (int) – The size of each head.
  - already_pruned_heads (Set[int]) – A set of already pruned heads.
- Returns
  A tuple with the remaining heads and their corresponding indices.
- Return type
  Tuple[Set[int], torch.LongTensor]
transformers.modeling_utils.prune_layer(layer: Union[torch.nn.modules.linear.Linear, transformers.modeling_utils.Conv1D], index: torch.LongTensor, dim: Optional[int] = None) → Union[torch.nn.modules.linear.Linear, transformers.modeling_utils.Conv1D]
Prune a Conv1D or linear layer to keep only entries in index.
Used to remove heads.
- Parameters
  - layer (Union[torch.nn.Linear, Conv1D]) – The layer to prune.
  - index (torch.LongTensor) – The indices to keep in the layer.
  - dim (int, optional) – The dimension on which to keep the indices.
- Returns
  The pruned layer as a new layer with requires_grad=True.
- Return type
  torch.nn.Linear or Conv1D
transformers.modeling_utils.prune_conv1d_layer(layer: transformers.modeling_utils.Conv1D, index: torch.LongTensor, dim: int = 1) → transformers.modeling_utils.Conv1D
Prune a Conv1D layer to keep only entries in index. A Conv1D works as a linear layer (see e.g. BERT) but the weights are transposed.
Used to remove heads.
transformers.modeling_utils.prune_linear_layer(layer: torch.nn.modules.linear.Linear, index: torch.LongTensor, dim: int = 0) → torch.nn.modules.linear.Linear
Prune a linear layer to keep only entries in index.
Used to remove heads.
- Parameters
  - layer (torch.nn.Linear) – The layer to prune.
  - index (torch.LongTensor) – The indices to keep in the layer.
  - dim (int, optional, defaults to 0) – The dimension on which to keep the indices.
- Returns
  The pruned layer as a new layer with requires_grad=True.
- Return type
  torch.nn.Linear
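A sketch of how the pruning helpers fit together, using an illustrative attention projection of 12 heads of size 64:

    import torch
    from torch import nn
    from transformers.modeling_utils import (
        find_pruneable_heads_and_indices,
        prune_linear_layer,
    )

    n_heads, head_size = 12, 64
    query = nn.Linear(768, n_heads * head_size)

    # Prune heads 0 and 2; no heads have been pruned before.
    heads, index = find_pruneable_heads_and_indices(
        heads=[0, 2], n_heads=n_heads, head_size=head_size, already_pruned_heads=set()
    )
    query = prune_linear_layer(query, index, dim=0)
    print(query.weight.shape)   # torch.Size([640, 768]) -- 10 heads of size 64 remain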
TensorFlow custom layers
class transformers.modeling_tf_utils.TFConv1D(*args, **kwargs)
1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2).
Basically works like a linear layer but the weights are transposed.
- Parameters
  - nf (int) – The number of output features.
  - nx (int) – The number of input features.
  - initializer_range (float, optional, defaults to 0.02) – The standard deviation to use to initialize the weights.
  - kwargs – Additional keyword arguments passed along to the __init__ of tf.keras.layers.Layer.
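A minimal usage sketch mirroring the PyTorch Conv1D above (shapes are illustrative):

    import tensorflow as tf
    from transformers.modeling_tf_utils import TFConv1D

    layer = TFConv1D(nf=3 * 768, nx=768, initializer_range=0.02)

    hidden_states = tf.random.normal((2, 5, 768))   # (batch_size, seq_len, nx)
    output = layer(hidden_states)                   # (batch_size, seq_len, nf)
    print(output.shape)                             # (2, 5, 2304)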
class transformers.modeling_tf_utils.TFSharedEmbeddings(*args, **kwargs)
Construct shared token embeddings.
The weights of the embedding layer are usually shared with the weights of the linear decoder when doing language modeling.
- Parameters
  - vocab_size (int) – The size of the vocabulary, e.g., the number of unique tokens.
  - hidden_size (int) – The size of the embedding vectors.
  - initializer_range (float, optional) – The standard deviation to use when initializing the weights. If no value is provided, it will default to \(1/\sqrt{hidden\_size}\).
  - kwargs – Additional keyword arguments passed along to the __init__ of tf.keras.layers.Layer.
call(inputs: tf.Tensor, mode: str = "embedding") → tf.Tensor
Get token embeddings of inputs or decode final hidden state.
- Parameters
  - inputs (tf.Tensor) – In embedding mode, should be an int64 tensor with shape [batch_size, length]. In linear mode, should be a float tensor with shape [batch_size, length, hidden_size].
  - mode (str, defaults to "embedding") – A valid value is either "embedding" or "linear"; the first one indicates that the layer should be used as an embedding layer, the second one that the layer should be used as a linear decoder.
- Returns
  In embedding mode, the output is a float32 embedding tensor with shape [batch_size, length, embedding_size]. In linear mode, the output is a float32 tensor with shape [batch_size, length, vocab_size].
- Return type
  tf.Tensor
- Raises
  ValueError – if mode is not valid.
Shared weights logic is adapted from the TensorFlow official Transformer implementation.
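A minimal sketch of the two modes (the vocabulary size is illustrative):

    import tensorflow as tf
    from transformers.modeling_tf_utils import TFSharedEmbeddings

    embeddings = TFSharedEmbeddings(vocab_size=50257, hidden_size=768)

    # Embedding mode: token ids -> hidden states.
    input_ids = tf.constant([[15496, 995]])                  # (batch_size, length)
    hidden_states = embeddings(input_ids, mode="embedding")  # (batch_size, length, hidden_size)

    # Linear mode: hidden states -> vocabulary logits, reusing the same weights.
    logits = embeddings(hidden_states, mode="linear")        # (batch_size, length, vocab_size)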
class transformers.modeling_tf_utils.TFSequenceSummary(*args, **kwargs)
Compute a single vector summary of a sequence hidden states.
- Parameters
  - config (PretrainedConfig) – The config used by the model. Relevant arguments in the config class of the model are (refer to the actual config class of your model for the default values it uses):
    - summary_type (str) – The method to use to make this summary. Accepted values are:
      - "last" – Take the last token hidden state (like XLNet)
      - "first" – Take the first token hidden state (like Bert)
      - "mean" – Take the mean of all tokens hidden states
      - "cls_index" – Supply a Tensor of classification token position (GPT/GPT-2)
      - "attn" – Not implemented now, use multi-head attention
    - summary_use_proj (bool) – Add a projection after the vector extraction.
    - summary_proj_to_labels (bool) – If True, the projection outputs to config.num_labels classes (otherwise to config.hidden_size).
    - summary_activation (Optional[str]) – Set to "tanh" to add a tanh activation to the output; any other string or None will add no activation.
    - summary_first_dropout (float) – Optional dropout probability before the projection and activation.
    - summary_last_dropout (float) – Optional dropout probability after the projection and activation.
  - initializer_range (float, defaults to 0.02) – The standard deviation to use to initialize the weights.
  - kwargs – Additional keyword arguments passed along to the __init__ of tf.keras.layers.Layer.
call(inputs, cls_index=None, training=False)
This is where the layer's logic lives.
Note that the call() method in tf.keras differs a little from the Keras API: in the Keras API you can pass masking support for layers as additional arguments, whereas tf.keras uses a compute_mask() method to support masking.
- Parameters
  - inputs – Input tensor, or list/tuple of input tensors.
  - **kwargs – Additional keyword arguments. Currently unused.
- Returns
  A tensor or list/tuple of tensors.
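A minimal sketch with an illustrative config, analogous to the PyTorch SequenceSummary example above:

    import tensorflow as tf
    from transformers import PretrainedConfig
    from transformers.modeling_tf_utils import TFSequenceSummary

    config = PretrainedConfig(
        hidden_size=768,
        summary_type="last",
        summary_use_proj=True,
        summary_proj_to_labels=False,
        summary_activation="tanh",
    )
    summary = TFSequenceSummary(config, initializer_range=0.02)

    hidden_states = tf.random.normal((2, 10, 768))   # (batch_size, seq_len, hidden_size)
    pooled = summary(hidden_states)                  # (batch_size, hidden_size)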
TensorFlow loss functions
class transformers.modeling_tf_utils.TFCausalLanguageModelingLoss
Loss function suitable for causal language modeling (CLM), that is, the task of guessing the next token.
Note
Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.
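The -100 convention can be illustrated with plain TensorFlow ops (a sketch of the masking idea, not the mixin's own code):

    import tensorflow as tf

    labels = tf.constant([[15496, -100, 995]])     # -100 marks positions to ignore
    logits = tf.random.normal((1, 3, 50257))

    active = tf.reshape(labels, (-1,)) != -100
    reduced_labels = tf.boolean_mask(tf.reshape(labels, (-1,)), active)
    reduced_logits = tf.boolean_mask(tf.reshape(logits, (-1, 50257)), active)

    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE
    )
    per_token_loss = loss_fn(reduced_labels, reduced_logits)   # one value per kept token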
class transformers.modeling_tf_utils.TFMaskedLanguageModelingLoss
Loss function suitable for masked language modeling (MLM), that is, the task of guessing the masked tokens.
Note
Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.
class transformers.modeling_tf_utils.TFMultipleChoiceLoss
Loss function suitable for multiple choice tasks.
class transformers.modeling_tf_utils.TFQuestionAnsweringLoss
Loss function suitable for question answering.
TensorFlow Helper Functions
transformers.modeling_tf_utils.get_initializer(initializer_range: float = 0.02) → tensorflow.python.keras.initializers.initializers_v2.TruncatedNormal
Creates a tf.initializers.TruncatedNormal with the given range.
- Parameters
  - initializer_range (float, defaults to 0.02) – Standard deviation of the initializer range.
- Returns
  The truncated normal initializer.
- Return type
  tf.initializers.TruncatedNormal
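A minimal usage sketch:

    import tensorflow as tf
    from transformers.modeling_tf_utils import get_initializer

    initializer = get_initializer(initializer_range=0.02)
    dense = tf.keras.layers.Dense(768, kernel_initializer=initializer, name="dense")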
transformers.modeling_tf_utils.keras_serializable(cls)
Decorate a Keras Layer class to support Keras serialization.
This is done by:
- Adding a transformers_config dict to the Keras config dictionary in get_config (called by Keras at serialization time).
- Wrapping __init__ to accept that transformers_config dict (passed by Keras at deserialization time) and convert it to a config object for the actual layer initializer.
- Registering the class as a custom object in Keras (if the TensorFlow version supports this), so that it does not need to be supplied in custom_objects in the call to tf.keras.models.load_model.
- Parameters
  - cls (a tf.keras.layers.Layer subclass) – Typically a TF.MainLayer class in this project, in general must accept a config argument to its initializer.
- Returns
  The same class object, with modifications for Keras deserialization.
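A minimal sketch with a hypothetical main layer; the decorated class must expose a config_class attribute and accept a config as the first argument to its initializer:

    import tensorflow as tf
    from transformers import BertConfig
    from transformers.modeling_tf_utils import keras_serializable

    @keras_serializable
    class MyMainLayer(tf.keras.layers.Layer):   # hypothetical layer for illustration
        config_class = BertConfig

        def __init__(self, config, **kwargs):
            super().__init__(**kwargs)
            self.dense = tf.keras.layers.Dense(config.hidden_size)

        def call(self, inputs):
            return self.dense(inputs)

    layer = MyMainLayer(BertConfig())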