Encoder Decoder Models
This class can wrap an encoder model, such as BertModel, and a decoder model with a language modeling head, such as BertForMaskedLM, into a single encoder-decoder model.

The EncoderDecoderModel class lets you instantiate an encoder-decoder model with the from_encoder_decoder_pretrained class method, which takes a pretrained encoder and a pretrained decoder model as input. The resulting EncoderDecoderModel is saved with the standard save_pretrained() method and can be loaded again with the standard from_pretrained() method.

An application of this architecture is summarization with two pretrained BERT models, as shown in the paper Text Summarization with Pretrained Encoders by Yang Liu and Mirella Lapata.
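A minimal end-to-end sketch of that workflow, assuming the publicly available bert-base-uncased checkpoint; the ./bert2bert save directory is a hypothetical path chosen only for illustration:

from transformers import EncoderDecoderModel

# Wrap a pretrained BERT encoder and a pretrained BERT decoder into one model
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')

# Save the combined model with the standard method ...
model.save_pretrained('./bert2bert')  # hypothetical directory

# ... and load it again with the standard method
model = EncoderDecoderModel.from_pretrained('./bert2bert')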
EncoderDecoderConfig
class transformers.EncoderDecoderConfig(**kwargs)

EncoderDecoderConfig is the configuration class to store the configuration of an EncoderDecoderModel. It is used to instantiate an encoder-decoder model according to the specified arguments, defining the encoder and decoder configs. Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. See the documentation for PretrainedConfig for more information.

Parameters:
- kwargs (optional) – Remaining dictionary of keyword arguments. Notably:
  - encoder (PretrainedConfig, optional, defaults to None): An instance of a configuration object that defines the encoder config.
  - decoder (PretrainedConfig, optional, defaults to None): An instance of a configuration object that defines the decoder config.

Example:
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Initializing a BERT bert-base-uncased style configuration
config_encoder = BertConfig()
config_decoder = BertConfig()

config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)

# Initializing a Bert2Bert model from the bert-base-uncased style configurations
model = EncoderDecoderModel(config=config)

# Accessing the model configuration
config_encoder = model.config.encoder
config_decoder = model.config.decoder
classmethod from_encoder_decoder_configs(encoder_config: transformers.configuration_utils.PretrainedConfig, decoder_config: transformers.configuration_utils.PretrainedConfig) → transformers.configuration_utils.PretrainedConfig

Instantiate an EncoderDecoderConfig (or a derived class) from a pretrained encoder model configuration and a pretrained decoder model configuration.

Returns:
    An instance of a configuration object

Return type:
    EncoderDecoderConfig
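A small sketch of combining two configurations that are not identical; it relies only on the constructor and attributes shown in the example above, and the shallower decoder is an arbitrary illustrative choice:

from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# The encoder and decoder configs do not have to match; here the decoder is
# deliberately made shallower than the encoder (both keep hidden_size=768).
config_encoder = BertConfig()                     # 12 hidden layers by default
config_decoder = BertConfig(num_hidden_layers=6)  # a smaller decoder

config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
model = EncoderDecoderModel(config=config)

# The individual configs remain accessible on the combined configuration
print(model.config.encoder.num_hidden_layers, model.config.decoder.num_hidden_layers)  # 12 6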
EncoderDecoderModel
class transformers.EncoderDecoderModel(config: Optional[transformers.configuration_utils.PretrainedConfig] = None, encoder: Optional[transformers.modeling_utils.PreTrainedModel] = None, decoder: Optional[transformers.modeling_utils.PreTrainedModel] = None)

EncoderDecoderModel is a generic model class that is instantiated as a transformer architecture with one of the base model classes of the library as encoder and another one as decoder, when created with the AutoModel.from_pretrained(pretrained_model_name_or_path) class method for the encoder and the AutoModelWithLMHead.from_pretrained(pretrained_model_name_or_path) class method for the decoder.

config_class
    alias of transformers.configuration_encoder_decoder.EncoderDecoderConfig
forward(input_ids=None, inputs_embeds=None, attention_mask=None, head_mask=None, encoder_outputs=None, decoder_input_ids=None, decoder_attention_mask=None, decoder_head_mask=None, decoder_inputs_embeds=None, masked_lm_labels=None, lm_labels=None, **kwargs)

Parameters:
- input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary for the encoder. Indices can be obtained using transformers.PreTrainedTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.convert_tokens_to_ids() for details.
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on padding token indices for the encoder. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional, defaults to None) – Mask to nullify selected heads of the self-attention modules for the encoder. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
- encoder_outputs (tuple(tuple(torch.FloatTensor)), optional, defaults to None) – Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), is a sequence of hidden-states at the output of the last layer of the encoder, used in the cross-attention of the decoder.
- decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional, defaults to None) – Provided to the decoder for sequence-to-sequence training. Indices can be obtained using transformers.PreTrainedTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.convert_tokens_to_ids() for details.
- decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional, defaults to None) – Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default.
- decoder_head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional, defaults to None) – Mask to nullify selected heads of the self-attention modules for the decoder. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
- decoder_inputs_embeds (torch.FloatTensor of shape (batch_size, target_sequence_length, hidden_size), optional, defaults to None) – Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- masked_lm_labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Labels for computing the masked language modeling loss for the decoder. Indices should be in [-100, 0, ..., config.vocab_size] (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
- lm_labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Labels for computing the left-to-right language modeling loss (next word prediction) for the decoder. Indices should be in [-100, 0, ..., config.vocab_size] (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
- kwargs (optional) – Remaining dictionary of keyword arguments. Keyword arguments come in two flavors: those without a prefix, which are passed as **encoder_kwargs to the encoder forward function, and those with a decoder_ prefix, which are passed as **decoder_kwargs to the decoder forward function.
Examples:
from transformers import EncoderDecoderModel, BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')  # initialize Bert2Bert

# forward
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids=input_ids, decoder_input_ids=input_ids)

# training
loss, outputs = model(input_ids=input_ids, decoder_input_ids=input_ids, lm_labels=input_ids)[:2]

# generation
generated = model.generate(input_ids, decoder_start_token_id=model.config.decoder.pad_token_id)
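As a complement to the example above, a hedged sketch of one sequence-to-sequence training step in which the encoder and the decoder receive different texts; the article/summary pair is made up for illustration, and the call uses only the forward arguments documented above:

from transformers import EncoderDecoderModel, BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')

# Hypothetical source/target pair, e.g. for a toy summarization setup
article = "The quick brown fox jumps over the lazy dog in the park."
summary = "A fox jumps over a dog."

input_ids = torch.tensor(tokenizer.encode(article, add_special_tokens=True)).unsqueeze(0)
decoder_input_ids = torch.tensor(tokenizer.encode(summary, add_special_tokens=True)).unsqueeze(0)

# Encoder attention mask is all ones here because this single example has no padding
attention_mask = torch.ones_like(input_ids)

# As in the example above, lm_labels mirrors decoder_input_ids; the loss is the first output
loss = model(input_ids=input_ids,
             attention_mask=attention_mask,
             decoder_input_ids=decoder_input_ids,
             lm_labels=decoder_input_ids)[0]
loss.backward()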
classmethod from_encoder_decoder_pretrained(encoder_pretrained_model_name_or_path: str = None, decoder_pretrained_model_name_or_path: str = None, *model_args, **kwargs) → transformers.modeling_utils.PreTrainedModel

Instantiates an encoder and a decoder from one or two base classes of the library from pretrained model checkpoints.

The model is set in evaluation mode by default using model.eval() (dropout modules are deactivated). To train the model, you need to first set it back in training mode with model.train().

Parameters:
- encoder_pretrained_model_name_or_path (str, optional, defaults to None) – Information necessary to initiate the encoder. Either:
  - a string with the shortcut name of a pretrained model to load from cache or download, e.g. bert-base-uncased,
  - a string with the identifier name of a pretrained model that was user-uploaded to our S3, e.g. dbmdz/bert-base-german-cased,
  - a path to a directory containing model weights saved using save_pretrained(), e.g. ./my_model_directory/encoder,
  - a path or URL to a TensorFlow index checkpoint file (e.g. ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model with the provided conversion scripts and loading the PyTorch model afterwards.
- decoder_pretrained_model_name_or_path (str, optional, defaults to None) – Information necessary to initiate the decoder. Either:
  - a string with the shortcut name of a pretrained model to load from cache or download, e.g. bert-base-uncased,
  - a string with the identifier name of a pretrained model that was user-uploaded to our S3, e.g. dbmdz/bert-base-german-cased,
  - a path to a directory containing model weights saved using save_pretrained(), e.g. ./my_model_directory/decoder,
  - a path or URL to a TensorFlow index checkpoint file (e.g. ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model with the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (optional) – Sequence of positional arguments. All remaining positional arguments will be passed to the underlying model's __init__ method.
- kwargs (optional) – Remaining dictionary of keyword arguments. Can be used to update the configuration object (after it has been loaded) and to initiate the model (e.g. output_attention=True). They behave differently depending on whether a config is provided or automatically loaded.
Examples:
from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')  # initialize Bert2Bert
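The parameter description above also accepts local directories produced by save_pretrained(); a sketch under that assumption, where ./my_model_directory/encoder and ./my_model_directory/decoder are hypothetical paths mirroring the examples in the parameter list:

from transformers import BertModel, BertForMaskedLM, EncoderDecoderModel

# One-time preparation: save an encoder and a decoder checkpoint locally
# (directory names are hypothetical and only mirror the parameter examples)
BertModel.from_pretrained('bert-base-uncased').save_pretrained('./my_model_directory/encoder')
BertForMaskedLM.from_pretrained('bert-base-uncased').save_pretrained('./my_model_directory/decoder')

# Later, build the encoder-decoder model directly from those directories
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    './my_model_directory/encoder',
    './my_model_directory/decoder',
)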
get_input_embeddings()

Returns the model's input embeddings.

Returns:
    A torch module mapping vocabulary to hidden states.

Return type:
    nn.Module
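A brief usage sketch; which module is returned depends on the underlying encoder, and the shape in the comment assumes the bert-base-uncased vocabulary and hidden size:

from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')

embeddings = model.get_input_embeddings()  # an nn.Module (typically nn.Embedding)
print(embeddings.weight.shape)             # e.g. torch.Size([30522, 768]) for bert-base-uncased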