transformers documentation

BERT

BERT

Overview

The BERT model was proposed in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It’s a bidirectional transformer pretrained using a combination of masked language modeling objective and next sentence prediction on a large corpus comprising the Toronto Book Corpus and Wikipedia.

The abstract from the paper is the following:

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).

Tips:

  • BERT is a model with absolute position embeddings so it’s usually advised to pad the inputs on the right rather than the left.
  • BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation.

This model was contributed by thomwolf. The original code can be found here.

BertConfig

class transformers.BertConfig < > expand 

( vocab_size = 30522 hidden_size = 768 num_hidden_layers = 12 num_attention_heads = 12 intermediate_size = 3072 hidden_act = 'gelu' hidden_dropout_prob = 0.1 attention_probs_dropout_prob = 0.1 max_position_embeddings = 512 type_vocab_size = 2 initializer_range = 0.02 layer_norm_eps = 1e-12 pad_token_id = 0 position_embedding_type = 'absolute' use_cache = True classifier_dropout = None **kwargs )

This is the configuration class to store the configuration of a BertModel or a TFBertModel. It is used to instantiate a BERT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the BERT bert-base-uncased architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Examples:

>>> from transformers import BertModel, BertConfig

>>> # Initializing a BERT bert-base-uncased style configuration
>>> configuration = BertConfig()

>>> # Initializing a model from the bert-base-uncased style configuration
>>> model = BertModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config

BertTokenizer

class transformers.BertTokenizer < > expand 

( vocab_file do_lower_case = True do_basic_tokenize = True never_split = None unk_token = '[UNK]' sep_token = '[SEP]' pad_token = '[PAD]' cls_token = '[CLS]' mask_token = '[MASK]' tokenize_chinese_chars = True strip_accents = None **kwargs )

Construct a BERT tokenizer. Based on WordPiece.

This tokenizer inherits from PreTrainedTokenizer which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

build_inputs_with_special_tokens < > expand 

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) ⟶ List[int]

Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and adding special tokens. A BERT sequence has the following format:

  • single sequence: [CLS] X [SEP]
  • pair of sequences: [CLS] A [SEP] B [SEP]
get_special_tokens_mask < > expand 

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None already_has_special_tokens: bool = False ) ⟶ List[int]

Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.

create_token_type_ids_from_sequences < > expand 

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) ⟶ List[int]

Create a mask from the two sequences passed to be used in a sequence-pair classification task. A BERT sequence pair mask has the following format:

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |

If token_ids_1 is None, this method only returns the first portion of the mask (0s).

BertTokenizerFast

class transformers.BertTokenizerFast < > expand 

( vocab_file = None tokenizer_file = None do_lower_case = True unk_token = '[UNK]' sep_token = '[SEP]' pad_token = '[PAD]' cls_token = '[CLS]' mask_token = '[MASK]' tokenize_chinese_chars = True strip_accents = None **kwargs )

Construct a “fast” BERT tokenizer (backed by HuggingFace’s tokenizers library). Based on WordPiece.

This tokenizer inherits from PreTrainedTokenizerFast which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

build_inputs_with_special_tokens < > expand 

( token_ids_0 token_ids_1 = None ) ⟶ List[int]

Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and adding special tokens. A BERT sequence has the following format:

  • single sequence: [CLS] X [SEP]
  • pair of sequences: [CLS] A [SEP] B [SEP]
create_token_type_ids_from_sequences < > expand 

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) ⟶ List[int]

Create a mask from the two sequences passed to be used in a sequence-pair classification task. A BERT sequence pair mask has the following format:

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |

If token_ids_1 is None, this method only returns the first portion of the mask (0s).

Bert specific outputs

class transformers.models.bert.modeling_bert.BertForPreTrainingOutput < > expand 

( loss: typing.Optional[torch.FloatTensor] = None prediction_logits: FloatTensor = None seq_relationship_logits: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor]] = None )

Output type of BertForPreTraining.

class transformers.models.bert.modeling_tf_bert.TFBertForPreTrainingOutput < > expand 

( loss: typing.Optional[tensorflow.python.framework.ops.Tensor] = None prediction_logits: Tensor = None seq_relationship_logits: Tensor = None hidden_states: typing.Union[typing.Tuple[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor, NoneType] = None attentions: typing.Union[typing.Tuple[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor, NoneType] = None )

Output type of TFBertForPreTraining.

class transformers.models.bert.modeling_flax_bert.FlaxBertForPreTrainingOutput < > expand 

( prediction_logits: ndarray = None seq_relationship_logits: ndarray = None hidden_states: typing.Optional[typing.Tuple[jax._src.numpy.lax_numpy.ndarray]] = None attentions: typing.Optional[typing.Tuple[jax._src.numpy.lax_numpy.ndarray]] = None )

Output type of BertForPreTraining.

replace < > expand 

( **updates )

“Returns a new object replacing the specified fields with new values.

BertModel

class transformers.BertModel < > expand 

( config add_pooling_layer = True )

The bare Bert Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of cross-attention is added between the self-attention layers, following the architecture described in Attention is all you need by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

To behave as an decoder the model needs to be initialized with the is_decoder argument of the configuration set to True. To be used in a Seq2Seq model, the model needs to initialized with both is_decoder argument and add_cross_attention set to True; an encoder_hidden_states is then expected as an input to the forward pass.

forward < > expand 

( input_ids = None attention_mask = None token_type_ids = None position_ids = None head_mask = None inputs_embeds = None encoder_hidden_states = None encoder_attention_mask = None past_key_values = None use_cache = None output_attentions = None output_hidden_states = None return_dict = None ) ⟶ BaseModelOutputWithPoolingAndCrossAttentions or tuple(torch.FloatTensor)

The BertModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, BertModel
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertModel.from_pretrained('bert-base-uncased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state

BertForPreTraining

class transformers.BertForPreTraining < > expand 

( config )

Bert Model with two heads on top as done during the pretraining: a masked language modeling head and a next sentence prediction (classification) head.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward < > expand 

( input_ids = None attention_mask = None token_type_ids = None position_ids = None head_mask = None inputs_embeds = None labels = None next_sentence_label = None output_attentions = None output_hidden_states = None return_dict = None ) ⟶ BertForPreTrainingOutput or tuple(torch.FloatTensor)

The BertForPreTraining forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, BertForPreTraining
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertForPreTraining.from_pretrained('bert-base-uncased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> prediction_logits = outputs.prediction_logits
>>> seq_relationship_logits = outputs.seq_relationship_logits

BertLMHeadModel

class transformers.BertLMHeadModel < > expand 

( config )

Bert Model with a language modeling head on top for CLM fine-tuning.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward < > expand 

( input_ids = None attention_mask = None token_type_ids = None position_ids = None head_mask = None inputs_embeds = None encoder_hidden_states = None encoder_attention_mask = None labels = None past_key_values = None use_cache = None output_attentions = None output_hidden_states = None return_dict = None ) ⟶ CausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)

The BertLMHeadModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, BertLMHeadModel, BertConfig
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
>>> config = BertConfig.from_pretrained("bert-base-cased")
>>> config.is_decoder = True
>>> model = BertLMHeadModel.from_pretrained('bert-base-cased', config=config)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> prediction_logits = outputs.logits

BertForMaskedLM

class transformers.BertForMaskedLM < > expand 

( config )

Bert Model with a language modeling head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward < > expand 

( input_ids = None attention_mask = None token_type_ids = None position_ids = None head_mask = None inputs_embeds = None encoder_hidden_states = None encoder_attention_mask = None labels = None output_attentions = None output_hidden_states = None return_dict = None ) ⟶ MaskedLMOutput or tuple(torch.FloatTensor)

The BertForMaskedLM forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, BertForMaskedLM
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertForMaskedLM.from_pretrained('bert-base-uncased')

>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
>>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]

>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits

BertForNextSentencePrediction

class transformers.BertForNextSentencePrediction < > expand 

( config )

Bert Model with a next sentence prediction (classification) head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward < > expand 

( input_ids = None attention_mask = None token_type_ids = None position_ids = None head_mask = None inputs_embeds = None labels = None output_attentions = None output_hidden_states = None return_dict = None **kwargs ) ⟶ NextSentencePredictorOutput or tuple(torch.FloatTensor)

The BertForNextSentencePrediction forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, BertForNextSentencePrediction
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
>>> encoding = tokenizer(prompt, next_sentence, return_tensors='pt')

>>> outputs = model(**encoding, labels=torch.LongTensor([1]))
>>> logits = outputs.logits
>>> assert logits[0, 0] < logits[0, 1] # next sentence was random

BertForSequenceClassification

class transformers.BertForSequenceClassification < > expand 

( config )

Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward < > expand 

( input_ids = None attention_mask = None token_type_ids = None position_ids = None head_mask = None inputs_embeds = None labels = None output_attentions = None output_hidden_states = None return_dict = None ) ⟶ SequenceClassifierOutput or tuple(torch.FloatTensor)

The BertForSequenceClassification forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of single-label classification:

>>> from transformers import BertTokenizer, BertForSequenceClassification
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1
>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits

Example of multi-label classification:

>>> from transformers import BertTokenizer, BertForSequenceClassification
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertForSequenceClassification.from_pretrained('bert-base-uncased', problem_type="multi_label_classification")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor([[1, 1]], dtype=torch.float) # need dtype=float for BCEWithLogitsLoss
>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits

BertForMultipleChoice

class transformers.BertForMultipleChoice < > expand 

( config )

Bert Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward < > expand 

( input_ids = None attention_mask = None token_type_ids = None position_ids = None head_mask = None inputs_embeds = None labels = None output_attentions = None output_hidden_states = None return_dict = None ) ⟶ MultipleChoiceModelOutput or tuple(torch.FloatTensor)

The BertForMultipleChoice forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, BertForMultipleChoice
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertForMultipleChoice.from_pretrained('bert-base-uncased')

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife."
>>> choice1 = "It is eaten while held in the hand."
>>> labels = torch.tensor(0).unsqueeze(0)  # choice0 is correct (according to Wikipedia ;)), batch size 1

>>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors='pt', padding=True)
>>> outputs = model(**{k: v.unsqueeze(0) for k,v in encoding.items()}, labels=labels)  # batch size is 1

>>> # the linear classifier still needs to be trained
>>> loss = outputs.loss
>>> logits = outputs.logits

BertForTokenClassification

class transformers.BertForTokenClassification < > expand 

( config )

Bert Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward < > expand 

( input_ids = None attention_mask = None token_type_ids = None position_ids = None head_mask = None inputs_embeds = None labels = None output_attentions = None output_hidden_states = None return_dict = None ) ⟶ TokenClassifierOutput or tuple(torch.FloatTensor)

The BertForTokenClassification forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, BertForTokenClassification
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertForTokenClassification.from_pretrained('bert-base-uncased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor([1] * inputs["input_ids"].size(1)).unsqueeze(0)  # Batch size 1

>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits

BertForQuestionAnswering

class transformers.BertForQuestionAnswering < > expand 

( config )

Bert Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layers on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward < > expand 

( input_ids = None attention_mask = None token_type_ids = None position_ids = None head_mask = None inputs_embeds = None start_positions = None end_positions = None output_attentions = None output_hidden_states = None return_dict = None ) ⟶ QuestionAnsweringModelOutput or tuple(torch.FloatTensor)

The BertForQuestionAnswering forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, BertForQuestionAnswering
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertForQuestionAnswering.from_pretrained('bert-base-uncased')

>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> inputs = tokenizer(question, text, return_tensors='pt')
>>> start_positions = torch.tensor([1])
>>> end_positions = torch.tensor([3])

>>> outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
>>> loss = outputs.loss
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits

TFBertModel

class transformers.TFBertModel < > expand 

( *args **kwargs )

The bare Bert Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TF 2.0 models accepts two formats as inputs:

  • having all inputs as keyword arguments (like PyTorch models), or
  • having all inputs as a list, tuple or dict in the first positional arguments.

This second option is useful when using tf.keras.Model.fit method which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument :

  • a single Tensor with input_ids only and nothing else: model(inputs_ids)
  • a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
  • a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
call < > expand 

( input_ids: typing.Union[typing.List[tensorflow.python.framework.ops.Tensor], typing.List[numpy.ndarray], typing.List[tensorflow.python.keras.engine.keras_tensor.KerasTensor], typing.Dict[str, tensorflow.python.framework.ops.Tensor], typing.Dict[str, numpy.ndarray], typing.Dict[str, tensorflow.python.keras.engine.keras_tensor.KerasTensor], tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.keras.engine.keras_tensor.KerasTensor, NoneType] = None attention_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None token_type_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None position_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None head_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None inputs_embeds: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None encoder_hidden_states: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None encoder_attention_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None past_key_values: typing.Union[typing.Tuple[typing.Tuple[typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor]]], NoneType] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None training: typing.Optional[bool] = False **kwargs ) ⟶ TFBaseModelOutputWithPoolingAndCrossAttentions or `tuple(tf.Tensor)“

The TFBertModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, TFBertModel
>>> import tensorflow as tf

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
>>> model = TFBertModel.from_pretrained('bert-base-cased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> outputs = model(inputs)

>>> last_hidden_states = outputs.last_hidden_state

TFBertForPreTraining

class transformers.TFBertForPreTraining < > expand 

( *args **kwargs )

Bert Model with two heads on top as done during the pretraining: a masked language modeling head and a next sentence prediction (classification) head.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TF 2.0 models accepts two formats as inputs:

  • having all inputs as keyword arguments (like PyTorch models), or
  • having all inputs as a list, tuple or dict in the first positional arguments.

This second option is useful when using tf.keras.Model.fit method which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument :

  • a single Tensor with input_ids only and nothing else: model(inputs_ids)
  • a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
  • a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
call < > expand 

( input_ids: typing.Union[typing.List[tensorflow.python.framework.ops.Tensor], typing.List[numpy.ndarray], typing.List[tensorflow.python.keras.engine.keras_tensor.KerasTensor], typing.Dict[str, tensorflow.python.framework.ops.Tensor], typing.Dict[str, numpy.ndarray], typing.Dict[str, tensorflow.python.keras.engine.keras_tensor.KerasTensor], tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.keras.engine.keras_tensor.KerasTensor, NoneType] = None attention_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None token_type_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None position_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None head_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None inputs_embeds: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None labels: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None next_sentence_label: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None training: typing.Optional[bool] = False **kwargs ) ⟶ TFBertForPreTrainingOutput or tuple(tf.Tensor)

The TFBertForPreTraining forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Examples:

>>> import tensorflow as tf
>>> from transformers import BertTokenizer, TFBertForPreTraining

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = TFBertForPreTraining.from_pretrained('bert-base-uncased')
>>> input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True))[None, :]  # Batch size 1
>>> outputs = model(input_ids)
>>> prediction_scores, seq_relationship_scores = outputs[:2]

TFBertModelLMHeadModel

call < > expand 

( input_ids: typing.Union[typing.List[tensorflow.python.framework.ops.Tensor], typing.List[numpy.ndarray], typing.List[tensorflow.python.keras.engine.keras_tensor.KerasTensor], typing.Dict[str, tensorflow.python.framework.ops.Tensor], typing.Dict[str, numpy.ndarray], typing.Dict[str, tensorflow.python.keras.engine.keras_tensor.KerasTensor], tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.keras.engine.keras_tensor.KerasTensor, NoneType] = None attention_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None token_type_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None position_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None head_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None inputs_embeds: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None encoder_hidden_states: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None encoder_attention_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None past_key_values: typing.Union[typing.Tuple[typing.Tuple[typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor]]], NoneType] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None labels: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None training: typing.Optional[bool] = False **kwargs ) ⟶ TFCausalLMOutputWithCrossAttentions or tuple(tf.Tensor)

encoderhidden_states (tf.Tensor of shape (batch_size, sequence_length, hidden_size), _optional): Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder. encoderattention_mask (tf.Tensor of shape (batch_size, sequence_length), _optional): Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]:

  • 1 for tokens that are not masked,
  • 0 for tokens that are masked.

pastkey_values (Tuple[Tuple[tf.Tensor]] of length config.n_layers) contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don’t have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length). use_cache (bool, _optional, defaults to True): If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values). Set to False during training, True during generation labels (tf.Tensor or np.ndarray of shape (batch_size, sequence_length), optional): Labels for computing the cross entropy classification loss. Indices should be in [0, ..., config.vocab_size - 1].

Example:

>>> from transformers import BertTokenizer, TFBertLMHeadModel
>>> import tensorflow as tf

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
>>> model = TFBertLMHeadModel.from_pretrained('bert-base-cased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> outputs = model(inputs)
>>> logits = outputs.logits

TFBertForMaskedLM

class transformers.TFBertForMaskedLM < > expand 

( *args **kwargs )

Bert Model with a language modeling head on top.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TF 2.0 models accepts two formats as inputs:

  • having all inputs as keyword arguments (like PyTorch models), or
  • having all inputs as a list, tuple or dict in the first positional arguments.

This second option is useful when using tf.keras.Model.fit method which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument :

  • a single Tensor with input_ids only and nothing else: model(inputs_ids)
  • a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
  • a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
call < > expand 

( input_ids: typing.Union[typing.List[tensorflow.python.framework.ops.Tensor], typing.List[numpy.ndarray], typing.List[tensorflow.python.keras.engine.keras_tensor.KerasTensor], typing.Dict[str, tensorflow.python.framework.ops.Tensor], typing.Dict[str, numpy.ndarray], typing.Dict[str, tensorflow.python.keras.engine.keras_tensor.KerasTensor], tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.keras.engine.keras_tensor.KerasTensor, NoneType] = None attention_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None token_type_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None position_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None head_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None inputs_embeds: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None labels: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None training: typing.Optional[bool] = False **kwargs ) ⟶ TFMaskedLMOutput or `tuple(tf.Tensor)“

The TFBertForMaskedLM forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, TFBertForMaskedLM
>>> import tensorflow as tf

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
>>> model = TFBertForMaskedLM.from_pretrained('bert-base-cased')

>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="tf")
>>> inputs["labels"] = tokenizer("The capital of France is Paris.", return_tensors="tf")["input_ids"]

>>> outputs = model(inputs)
>>> loss = outputs.loss
>>> logits = outputs.logits

TFBertForNextSentencePrediction

class transformers.TFBertForNextSentencePrediction < > expand 

( *args **kwargs )

Bert Model with a next sentence prediction (classification) head on top.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TF 2.0 models accepts two formats as inputs:

  • having all inputs as keyword arguments (like PyTorch models), or
  • having all inputs as a list, tuple or dict in the first positional arguments.

This second option is useful when using tf.keras.Model.fit method which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument :

  • a single Tensor with input_ids only and nothing else: model(inputs_ids)
  • a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
  • a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
call < > expand 

( input_ids: typing.Union[typing.List[tensorflow.python.framework.ops.Tensor], typing.List[numpy.ndarray], typing.List[tensorflow.python.keras.engine.keras_tensor.KerasTensor], typing.Dict[str, tensorflow.python.framework.ops.Tensor], typing.Dict[str, numpy.ndarray], typing.Dict[str, tensorflow.python.keras.engine.keras_tensor.KerasTensor], tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.keras.engine.keras_tensor.KerasTensor, NoneType] = None attention_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None token_type_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None position_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None head_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None inputs_embeds: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None next_sentence_label: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None training: typing.Optional[bool] = False **kwargs ) ⟶ TFNextSentencePredictorOutput or tuple(tf.Tensor)

The TFBertForNextSentencePrediction forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Examples:

>>> import tensorflow as tf
>>> from transformers import BertTokenizer, TFBertForNextSentencePrediction

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = TFBertForNextSentencePrediction.from_pretrained('bert-base-uncased')

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
>>> encoding = tokenizer(prompt, next_sentence, return_tensors='tf')

>>> logits = model(encoding['input_ids'], token_type_ids=encoding['token_type_ids'])[0]
>>> assert logits[0][0] < logits[0][1] # the next sentence was random

TFBertForSequenceClassification

class transformers.TFBertForSequenceClassification < > expand 

( *args **kwargs )

Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TF 2.0 models accepts two formats as inputs:

  • having all inputs as keyword arguments (like PyTorch models), or
  • having all inputs as a list, tuple or dict in the first positional arguments.

This second option is useful when using tf.keras.Model.fit method which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument :

  • a single Tensor with input_ids only and nothing else: model(inputs_ids)
  • a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
  • a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
call < > expand 

( input_ids: typing.Union[typing.List[tensorflow.python.framework.ops.Tensor], typing.List[numpy.ndarray], typing.List[tensorflow.python.keras.engine.keras_tensor.KerasTensor], typing.Dict[str, tensorflow.python.framework.ops.Tensor], typing.Dict[str, numpy.ndarray], typing.Dict[str, tensorflow.python.keras.engine.keras_tensor.KerasTensor], tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.keras.engine.keras_tensor.KerasTensor, NoneType] = None attention_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None token_type_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None position_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None head_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None inputs_embeds: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None labels: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None training: typing.Optional[bool] = False **kwargs ) ⟶ TFSequenceClassifierOutput or `tuple(tf.Tensor)“

The TFBertForSequenceClassification forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, TFBertForSequenceClassification
>>> import tensorflow as tf

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
>>> model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> inputs["labels"] = tf.reshape(tf.constant(1), (-1, 1)) # Batch size 1

>>> outputs = model(inputs)
>>> loss = outputs.loss
>>> logits = outputs.logits

TFBertForMultipleChoice

class transformers.TFBertForMultipleChoice < > expand 

( *args **kwargs )

Bert Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TF 2.0 models accepts two formats as inputs:

  • having all inputs as keyword arguments (like PyTorch models), or
  • having all inputs as a list, tuple or dict in the first positional arguments.

This second option is useful when using tf.keras.Model.fit method which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument :

  • a single Tensor with input_ids only and nothing else: model(inputs_ids)
  • a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
  • a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
call < > expand 

( input_ids: typing.Union[typing.List[tensorflow.python.framework.ops.Tensor], typing.List[numpy.ndarray], typing.List[tensorflow.python.keras.engine.keras_tensor.KerasTensor], typing.Dict[str, tensorflow.python.framework.ops.Tensor], typing.Dict[str, numpy.ndarray], typing.Dict[str, tensorflow.python.keras.engine.keras_tensor.KerasTensor], tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.keras.engine.keras_tensor.KerasTensor, NoneType] = None attention_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None token_type_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None position_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None head_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None inputs_embeds: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None labels: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None training: typing.Optional[bool] = False **kwargs ) ⟶ TFMultipleChoiceModelOutput or `tuple(tf.Tensor)“

The TFBertForMultipleChoice forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, TFBertForMultipleChoice
>>> import tensorflow as tf

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
>>> model = TFBertForMultipleChoice.from_pretrained('bert-base-cased')

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife."
>>> choice1 = "It is eaten while held in the hand."

>>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors='tf', padding=True)
>>> inputs = {k: tf.expand_dims(v, 0) for k, v in encoding.items()}
>>> outputs = model(inputs)  # batch size is 1

>>> # the linear classifier still needs to be trained
>>> logits = outputs.logits

TFBertForTokenClassification

class transformers.TFBertForTokenClassification < > expand 

( *args **kwargs )

Bert Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TF 2.0 models accepts two formats as inputs:

  • having all inputs as keyword arguments (like PyTorch models), or
  • having all inputs as a list, tuple or dict in the first positional arguments.

This second option is useful when using tf.keras.Model.fit method which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument :

  • a single Tensor with input_ids only and nothing else: model(inputs_ids)
  • a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
  • a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
call < > expand 

( input_ids: typing.Union[typing.List[tensorflow.python.framework.ops.Tensor], typing.List[numpy.ndarray], typing.List[tensorflow.python.keras.engine.keras_tensor.KerasTensor], typing.Dict[str, tensorflow.python.framework.ops.Tensor], typing.Dict[str, numpy.ndarray], typing.Dict[str, tensorflow.python.keras.engine.keras_tensor.KerasTensor], tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.keras.engine.keras_tensor.KerasTensor, NoneType] = None attention_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None token_type_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None position_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None head_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None inputs_embeds: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None labels: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None training: typing.Optional[bool] = False **kwargs ) ⟶ TFTokenClassifierOutput or `tuple(tf.Tensor)“

The TFBertForTokenClassification forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, TFBertForTokenClassification
>>> import tensorflow as tf

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
>>> model = TFBertForTokenClassification.from_pretrained('bert-base-cased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> input_ids = inputs["input_ids"]
>>> inputs["labels"] = tf.reshape(tf.constant([1] * tf.size(input_ids).numpy()), (-1, tf.size(input_ids))) # Batch size 1

>>> outputs = model(inputs)
>>> loss = outputs.loss
>>> logits = outputs.logits

TFBertForQuestionAnswering

class transformers.TFBertForQuestionAnswering < > expand 

( *args **kwargs )

Bert Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TF 2.0 models accepts two formats as inputs:

  • having all inputs as keyword arguments (like PyTorch models), or
  • having all inputs as a list, tuple or dict in the first positional arguments.

This second option is useful when using tf.keras.Model.fit method which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument :

  • a single Tensor with input_ids only and nothing else: model(inputs_ids)
  • a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
  • a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
call < > expand 

( input_ids: typing.Union[typing.List[tensorflow.python.framework.ops.Tensor], typing.List[numpy.ndarray], typing.List[tensorflow.python.keras.engine.keras_tensor.KerasTensor], typing.Dict[str, tensorflow.python.framework.ops.Tensor], typing.Dict[str, numpy.ndarray], typing.Dict[str, tensorflow.python.keras.engine.keras_tensor.KerasTensor], tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.keras.engine.keras_tensor.KerasTensor, NoneType] = None attention_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None token_type_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None position_ids: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None head_mask: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None inputs_embeds: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None start_positions: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None end_positions: typing.Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, NoneType] = None training: typing.Optional[bool] = False **kwargs ) ⟶ TFQuestionAnsweringModelOutput or `tuple(tf.Tensor)“

The TFBertForQuestionAnswering forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, TFBertForQuestionAnswering
>>> import tensorflow as tf

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
>>> model = TFBertForQuestionAnswering.from_pretrained('bert-base-cased')

>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> input_dict = tokenizer(question, text, return_tensors='tf')
>>> outputs = model(input_dict)
>>> start_logits = outputs.start_logits
>>> end_logits = outputs.end_logits

>>> all_tokens = tokenizer.convert_ids_to_tokens(input_dict["input_ids"].numpy()[0])
>>> answer = ' '.join(all_tokens[tf.math.argmax(start_logits, 1)[0] : tf.math.argmax(end_logits, 1)[0]+1])

FlaxBertModel

class transformers.FlaxBertModel < > expand 

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax._src.numpy.lax_numpy.float32'> **kwargs )

The bare Bert Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also a Flax Linen flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__ < > expand 

( input_ids attention_mask = None token_type_ids = None position_ids = None params: dict = None dropout_rng: PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) ⟶ FlaxBaseModelOutputWithPooling or tuple(torch.FloatTensor)

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, FlaxBertModel

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = FlaxBertModel.from_pretrained('bert-base-uncased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors='jax')
>>> outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state

FlaxBertForPreTraining

class transformers.FlaxBertForPreTraining < > expand 

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax._src.numpy.lax_numpy.float32'> **kwargs )

Bert Model with two heads on top as done during the pretraining: a masked language modeling head and a next sentence prediction (classification) head.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also a Flax Linen flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__ < > expand 

( input_ids attention_mask = None token_type_ids = None position_ids = None params: dict = None dropout_rng: PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) ⟶ FlaxBertForPreTrainingOutput or tuple(torch.FloatTensor)

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, FlaxBertForPreTraining

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = FlaxBertForPreTraining.from_pretrained('bert-base-uncased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="np")
>>> outputs = model(**inputs)

>>> prediction_logits = outputs.prediction_logits
>>> seq_relationship_logits = outputs.seq_relationship_logits

FlaxBertForMaskedLM

class transformers.FlaxBertForMaskedLM < > expand 

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax._src.numpy.lax_numpy.float32'> **kwargs )

Bert Model with a language modeling head on top.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also a Flax Linen flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__ < > expand 

( input_ids attention_mask = None token_type_ids = None position_ids = None params: dict = None dropout_rng: PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) ⟶ FlaxMaskedLMOutput or tuple(torch.FloatTensor)

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, FlaxBertForMaskedLM

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = FlaxBertForMaskedLM.from_pretrained('bert-base-uncased')

>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors='jax')

>>> outputs = model(**inputs)
>>> logits = outputs.logits

FlaxBertForNextSentencePrediction

class transformers.FlaxBertForNextSentencePrediction < > expand 

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax._src.numpy.lax_numpy.float32'> **kwargs )

Bert Model with a next sentence prediction (classification) head on top.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also a Flax Linen flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__ < > expand 

( input_ids attention_mask = None token_type_ids = None position_ids = None params: dict = None dropout_rng: PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) ⟶ FlaxNextSentencePredictorOutput or tuple(torch.FloatTensor)

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, FlaxBertForNextSentencePrediction

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = FlaxBertForNextSentencePrediction.from_pretrained('bert-base-uncased')

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
>>> encoding = tokenizer(prompt, next_sentence, return_tensors='jax')

>>> outputs = model(**encoding)
>>> logits = outputs.logits
>>> assert logits[0, 0] < logits[0, 1] # next sentence was random

FlaxBertForSequenceClassification

class transformers.FlaxBertForSequenceClassification < > expand 

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax._src.numpy.lax_numpy.float32'> **kwargs )

Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also a Flax Linen flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__ < > expand 

( input_ids attention_mask = None token_type_ids = None position_ids = None params: dict = None dropout_rng: PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) ⟶ FlaxSequenceClassifierOutput or tuple(torch.FloatTensor)

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, FlaxBertForSequenceClassification

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = FlaxBertForSequenceClassification.from_pretrained('bert-base-uncased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors='jax')

>>> outputs = model(**inputs)
>>> logits = outputs.logits

FlaxBertForMultipleChoice

class transformers.FlaxBertForMultipleChoice < > expand 

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax._src.numpy.lax_numpy.float32'> **kwargs )

Bert Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also a Flax Linen flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__ < > expand 

( input_ids attention_mask = None token_type_ids = None position_ids = None params: dict = None dropout_rng: PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) ⟶ FlaxMultipleChoiceModelOutput or tuple(torch.FloatTensor)

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, FlaxBertForMultipleChoice

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = FlaxBertForMultipleChoice.from_pretrained('bert-base-uncased')

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife."
>>> choice1 = "It is eaten while held in the hand."

>>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors='jax', padding=True)
>>> outputs = model(**{k: v[None, :] for k,v in encoding.items()})

>>> logits = outputs.logits

FlaxBertForTokenClassification

class transformers.FlaxBertForTokenClassification < > expand 

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax._src.numpy.lax_numpy.float32'> **kwargs )

Bert Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also a Flax Linen flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__ < > expand 

( input_ids attention_mask = None token_type_ids = None position_ids = None params: dict = None dropout_rng: PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) ⟶ FlaxTokenClassifierOutput or tuple(torch.FloatTensor)

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, FlaxBertForTokenClassification

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = FlaxBertForTokenClassification.from_pretrained('bert-base-uncased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors='jax')

>>> outputs = model(**inputs)
>>> logits = outputs.logits

FlaxBertForQuestionAnswering

class transformers.FlaxBertForQuestionAnswering < > expand 

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax._src.numpy.lax_numpy.float32'> **kwargs )

Bert Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layers on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also a Flax Linen flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__ < > expand 

( input_ids attention_mask = None token_type_ids = None position_ids = None params: dict = None dropout_rng: PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) ⟶ FlaxQuestionAnsweringModelOutput or tuple(torch.FloatTensor)

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

>>> from transformers import BertTokenizer, FlaxBertForQuestionAnswering

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = FlaxBertForQuestionAnswering.from_pretrained('bert-base-uncased')

>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> inputs = tokenizer(question, text, return_tensors='jax')

>>> outputs = model(**inputs)
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits