# Funnel Transformer

## Overview

The Funnel Transformer model was proposed in the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. It is a bidirectional transformer model, like BERT, but with a pooling operation after each block of layers, a bit like in traditional convolutional neural networks (CNN) in computer vision.
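To make the pooling idea concrete, here is a minimal sketch of how the sequence length could shrink from block to block. It assumes a stride of 2 with rounding up, and that the first block sees the full-length sequence; the helper name `block_lengths` is hypothetical and not part of the library.

```python
import math


def block_lengths(seq_len, num_blocks, stride=2):
    """Return the sequence length seen by each block.

    Assumption: the first block sees the full sequence, and every later
    block first pools its input with the given stride (rounding up).
    """
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        seq_len = math.ceil(seq_len / stride)
        lengths.append(seq_len)
    return lengths


print(block_lengths(128, num_blocks=3))  # [128, 64, 32]
```

Under these assumptions, a three-block model only processes a quarter of the original positions in its last block, which is where the FLOP savings described in the abstract would come from.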

The abstract from the paper is the following:

With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the much-overlooked redundancy in maintaining a full-length token-level presentation, especially for tasks that only require a single-vector presentation of the sequence. With this intuition, we propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further improve the model capacity. In addition, to perform token-level predictions as required by common pretraining objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence via a decoder. Empirically, with comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks, including text classification, language understanding, and reading comprehension.

Tips:

The original code can be found here.

## FunnelConfig

class transformers.FunnelConfig(vocab_size=30522, block_sizes=[4, 4, 4], block_repeats=None, num_decoder_layers=2, d_model=768, n_head=12, d_head=64, d_inner=3072, hidden_act='gelu_new', hidden_dropout=0.1, attention_dropout=0.1, activation_dropout=0.0, max_position_embeddings=512, type_vocab_size=3, initializer_range=0.1, initializer_std=None, layer_norm_eps=1e-09, pooling_type='mean', attention_type='relative_shift', separate_cls=True, truncate_seq=True, pool_q_only=True, **kwargs)[source]

This is the configuration class to store the configuration of a FunnelModel. It is used to instantiate a Funnel Transformer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the Funnel Transformer funnel-transformer/small architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Parameters
• vocab_size (int, optional, defaults to 30522) – Vocabulary size of the Funnel transformer. Defines the different tokens that can be represented by the input_ids passed to the forward method of FunnelModel.

• block_sizes (List[int], optional, defaults to [4, 4, 4]) – The sizes of the blocks used in the model.

• block_repeats (List[int], optional) – If passed along, each layer of each block is repeated the number of times indicated.

• num_decoder_layers (int, optional, defaults to 2) – The number of layers in the decoder (when not using the base model).

• d_model (int, optional, defaults to 768) – Dimensionality of the model’s hidden states.

• n_head (int, optional, defaults to 12) – Number of attention heads for each attention layer in the Transformer encoder.

• d_head (int, optional, defaults to 64) – Dimensionality of the model’s heads.

• d_inner (int, optional, defaults to 3072) – Inner dimension in the feed-forward blocks.

• hidden_act (str or callable, optional, defaults to "gelu_new") – The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "swish" and "gelu_new" are supported.

• hidden_dropout (float, optional, defaults to 0.1) – The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

• attention_dropout (float, optional, defaults to 0.1) – The dropout probability for the attention probabilities.

• activation_dropout (float, optional, defaults to 0.0) – The dropout probability used between the two layers of the feed-forward blocks.

• max_position_embeddings (int, optional, defaults to 512) – The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).

• type_vocab_size (int, optional, defaults to 3) – The vocabulary size of the token_type_ids passed into FunnelModel.

• initializer_range (float, optional, defaults to 0.1) – The upper bound of the uniform initializer for initializing all weight matrices in attention layers.

• initializer_std (float, optional) – The standard deviation of the normal initializer for initializing the embedding matrix and the weight of linear layers. Will default to 1 for the embedding matrix and the value given by Xavier initialization for linear layers.

• layer_norm_eps (float, optional, defaults to 1e-9) – The epsilon used by the layer normalization layers.

• pooling_type (str, optional, defaults to "mean") – Possible values are "mean" or "max". The way pooling is performed at the beginning of each block.

• attention_type (str, optional, defaults to "relative_shift") – Possible values are "relative_shift" or "factorized". The former is faster on CPU/GPU while the latter is faster on TPU.

• separate_cls (bool, optional, defaults to True) – Whether or not to separate the cls token when applying pooling.

• truncate_seq (bool, optional, defaults to True) – When using separate_cls, whether or not to truncate the last token when pooling, to avoid getting a sequence length that is not a multiple of 2.

• pool_q_only (bool, optional, defaults to True) – Whether or not to apply the pooling only to the query or to query, key and values for the attention layers.
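The interaction between pooling_type, separate_cls and truncate_seq can be illustrated with a small sketch. This is a simplified illustration that pools scalar values with a fixed stride of 2, not the library's implementation; the function name `pool_sequence` is hypothetical.

```python
def pool_sequence(states, pooling_type="mean", separate_cls=True, truncate_seq=True):
    """Halve a sequence of scalar "hidden states" with stride-2 pooling.

    Simplified illustration only: real hidden states are vectors, and the
    library's implementation may differ in its details.
    """
    if pooling_type == "max":
        combine = max
    else:
        combine = lambda window: sum(window) / len(window)

    if separate_cls:
        # Keep the first (cls) position out of the pooling windows.
        cls, rest = states[:1], states[1:]
        if truncate_seq:
            # Drop the last token so that, e.g., 8 positions pool down to 4.
            rest = rest[:-1]
    else:
        cls, rest = [], states

    pooled = [combine(rest[i:i + 2]) for i in range(0, len(rest), 2)]
    return cls + pooled


states = [9, 1, 3, 5, 7, 2, 4, 6]  # position 0 plays the role of the cls token
print(pool_sequence(states))                      # [9, 2.0, 6.0, 3.0]
print(pool_sequence(states, pooling_type="max"))  # [9, 3, 7, 4]
```

With separate_cls=True and truncate_seq=False, the same 8-position input would pool to 5 positions in this sketch, which is the odd length that truncate_seq is meant to avoid.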

## FunnelTokenizer

class transformers.FunnelTokenizer(vocab_file, do_lower_case=True, do_basic_tokenize=True, never_split=None, unk_token='<unk>', sep_token='<sep>', pad_token='<pad>', cls_token='<cls>', mask_token='<mask>', bos_token='<s>', eos_token='</s>', tokenize_chinese_chars=True, strip_accents=None, **kwargs)[source]

Tokenizer for the Funnel Transformer models.

FunnelTokenizer is identical to BertTokenizer and runs end-to-end tokenization: punctuation splitting + wordpiece.

Refer to superclass BertTokenizer for usage examples and documentation concerning parameters.

build_inputs_with_special_tokens(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) → List[int]

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A BERT sequence has the following format:

• single sequence: [CLS] X [SEP]

• pair of sequences: [CLS] A [SEP] B [SEP]

Parameters
• token_ids_0 (List[int]) – List of IDs to which the special tokens will be added

• token_ids_1 (List[int], optional) – Optional second list of IDs for sequence pairs.

Returns

list of input IDs with the appropriate special tokens.

Return type

List[int]
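The two formats above can be sketched in plain Python. The IDs used for the classification and separator tokens are hypothetical values chosen for this example; a real tokenizer reads them from its vocabulary.

```python
from typing import List, Optional

CLS_ID, SEP_ID = 101, 102  # hypothetical IDs, for illustration only


def build_inputs(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
    """Wrap one or two ID lists in the special-token layout shown above."""
    single = [CLS_ID] + token_ids_0 + [SEP_ID]
    if token_ids_1 is None:
        return single
    return single + token_ids_1 + [SEP_ID]


print(build_inputs([7, 8, 9]))        # [101, 7, 8, 9, 102]
print(build_inputs([7, 8], [5, 6]))   # [101, 7, 8, 102, 5, 6, 102]
```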

create_token_type_ids_from_sequences(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) → List[int][source]

Creates a mask from the two sequences passed to be used in a sequence-pair classification task. Funnel Transformer expects a sequence pair mask that has the following format:

2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |


If token_ids_1 is None, only returns the first portion of the mask (0’s).

Parameters
• token_ids_0 (List[int]) – List of ids.

• token_ids_1 (List[int], optional) – Optional second list of IDs for sequence pairs.

Returns

List of token type IDs according to the given sequence(s).

Return type

List[int]
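A minimal sketch of the mask layout shown above, where the leading classification token gets its own type ID of 2. It assumes each separator is counted with the sequence it closes; the function name is hypothetical.

```python
def sketch_token_type_ids(token_ids_0, token_ids_1=None, cls_type_id=2):
    """Build the token type IDs for the layout: cls A sep [B sep].

    Assumption: each separator takes the type ID of the sequence it ends.
    """
    first = [cls_type_id] + [0] * (len(token_ids_0) + 1)  # +1 for the separator
    if token_ids_1 is None:
        return first
    return first + [1] * (len(token_ids_1) + 1)


print(sketch_token_type_ids([7, 8, 9]))        # [2, 0, 0, 0, 0]
print(sketch_token_type_ids([7, 8], [5, 6]))   # [2, 0, 0, 0, 1, 1, 1]
```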

get_special_tokens_mask(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False) → List[int]

Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.

Parameters
• token_ids_0 (List[int]) – List of ids.

• token_ids_1 (List[int], optional) – Optional second list of IDs for sequence pairs.

• already_has_special_tokens (bool, optional, defaults to False) – Set to True if the token list is already formatted with special tokens for the model

Returns

A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.

Return type

List[int]
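As an illustration of the returned mask, the sketch below marks the special-token positions for inputs that do not yet contain special tokens. It assumes the layout described for build_inputs_with_special_tokens (one leading classification token and one separator after each sequence); the function name is hypothetical.

```python
def sketch_special_tokens_mask(token_ids_0, token_ids_1=None):
    """Return 1 at special-token positions and 0 at sequence-token positions."""
    mask = [1] + [0] * len(token_ids_0) + [1]
    if token_ids_1 is not None:
        mask += [0] * len(token_ids_1) + [1]
    return mask


print(sketch_special_tokens_mask([7, 8, 9]))       # [1, 0, 0, 0, 1]
print(sketch_special_tokens_mask([7, 8], [5, 6]))  # [1, 0, 0, 1, 0, 0, 1]
```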

save_vocabulary(vocab_path)

Save the WordPiece vocabulary (a copy of the original file) and the special tokens file to a directory.

Parameters

vocab_path (str) – The directory in which to save the vocabulary.

Returns

Paths to the files saved.

Return type

Tuple(str)

## FunnelTokenizerFast

class transformers.FunnelTokenizerFast(vocab_file, do_lower_case=True, unk_token='<unk>', sep_token='<sep>', pad_token='<pad>', cls_token='<cls>', mask_token='<mask>', bos_token='<s>', eos_token='</s>', clean_text=True, tokenize_chinese_chars=True, strip_accents=None, wordpieces_prefix='##', **kwargs)[source]

“Fast” tokenizer for the Funnel Transformer models (backed by HuggingFace’s tokenizers library).

FunnelTokenizerFast is identical to BertTokenizerFast and runs end-to-end tokenization: punctuation splitting + wordpiece.

Refer to superclass BertTokenizerFast for usage examples and documentation concerning parameters.

create_token_type_ids_from_sequences(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) → List[int][source]

Creates a mask from the two sequences passed to be used in a sequence-pair classification task. Funnel Transformer expects a sequence pair mask that has the following format:

2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |


If token_ids_1 is None, only returns the first portion of the mask (0’s).

Parameters
• token_ids_0 (List[int]) – List of ids.

• token_ids_1 (List[int], optional) – Optional second list of IDs for sequence pairs.

Returns

List of token type IDs according to the given sequence(s).

Return type

List[int]

## Funnel specific outputs

class transformers.modeling_funnel.FunnelForPreTrainingOutput(loss: Optional[torch.FloatTensor] = None, logits: torch.FloatTensor = None, hidden_states: Optional[Tuple[torch.FloatTensor]] = None, attentions: Optional[Tuple[torch.FloatTensor]] = None)[source]

Output type of FunnelForPreTraining.

Parameters
• loss (optional, returned when labels is provided, torch.FloatTensor of shape (1,)) – Total loss of the ELECTRA-style objective.

• logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Prediction scores of the head (scores for each token before SoftMax).

• hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –

Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –

Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

class transformers.modeling_tf_funnel.TFFunnelForPreTrainingOutput(logits: tensorflow.python.framework.ops.Tensor = None, hidden_states: Optional[Tuple[tensorflow.python.framework.ops.Tensor]] = None, attentions: Optional[Tuple[tensorflow.python.framework.ops.Tensor]] = None)[source]

Output type of TFFunnelForPreTraining.

Parameters
• logits (tf.Tensor of shape (batch_size, sequence_length)) – Prediction scores of the head (scores for each token before SoftMax).

• hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –

Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –

Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

## FunnelBaseModel

class transformers.FunnelBaseModel(config)[source]

The base Funnel Transformer Model transformer outputting raw hidden-states without upsampling head (also called decoder) or any task-specific head on top. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]

The FunnelBaseModel forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Inputs:
input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None):

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None):

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None):

Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

output_attentions (bool, optional, defaults to None):

If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

output_hidden_states (bool, optional, defaults to None):

If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

return_dict (bool, optional, defaults to None):

If set to True, the model will return a ModelOutput instead of a plain tuple.
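The attention_mask convention described above (1 for real tokens, 0 for padding) can be sketched as follows. The padding ID of 0 and the helper name `pad_batch` are assumptions made for this example; in practice the tokenizer produces both tensors.

```python
def pad_batch(sequences, pad_id=0):
    """Pad ID lists to a common length and build the matching attention mask.

    pad_id=0 is an assumed value for this sketch, not the tokenizer's actual
    padding ID.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token to attend to, 0 marks padding to ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


ids, mask = pad_batch([[5, 6, 7], [8]])
print(ids)   # [[5, 6, 7], [8, 0, 0]]
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```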

Returns

A BaseModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.

• hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

BaseModelOutput or tuple(torch.FloatTensor)

Example:

>>> from transformers import FunnelTokenizer, FunnelBaseModel
>>> import torch

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small-base')
>>> model = FunnelBaseModel.from_pretrained('funnel-transformer/small-base', return_dict=True)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state

get_input_embeddings()[source]

Returns the model’s input embeddings.

Returns

A torch module mapping vocabulary to hidden states.

Return type

nn.Module

set_input_embeddings(new_embeddings)[source]

Set the model’s input embeddings.

Parameters

new_embeddings (nn.Module) – A module mapping vocabulary to hidden states.

## FunnelModel

class transformers.FunnelModel(config)[source]

The bare Funnel Transformer Model transformer outputting raw hidden-states without any specific head on top. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]

The FunnelModel forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Inputs:
input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None):

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None):

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None):

Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

output_attentions (bool, optional, defaults to None):

If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

output_hidden_states (bool, optional, defaults to None):

If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

return_dict (bool, optional, defaults to None):

If set to True, the model will return a ModelOutput instead of a plain tuple.

Returns

A BaseModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.

• hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

BaseModelOutput or tuple(torch.FloatTensor)

Example:

>>> from transformers import FunnelTokenizer, FunnelModel
>>> import torch

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = FunnelModel.from_pretrained('funnel-transformer/small', return_dict=True)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state

get_input_embeddings()[source]

Returns the model’s input embeddings.

Returns

A torch module mapping vocabulary to hidden states.

Return type

nn.Module

set_input_embeddings(new_embeddings)[source]

Set the model’s input embeddings.

Parameters

new_embeddings (nn.Module) – A module mapping vocabulary to hidden states.

## FunnelForPreTraining

class transformers.FunnelForPreTraining(config)[source]
forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]

The FunnelForPreTraining forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Inputs:
input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None):

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None):

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None):

Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

output_attentions (bool, optional, defaults to None):

If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

output_hidden_states (bool, optional, defaults to None):

If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

return_dict (bool, optional, defaults to None):

If set to True, the model will return a ModelOutput instead of a plain tuple.

labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None):

Labels for computing the ELECTRA-style loss. Input should be a sequence of tokens (see the input_ids docstring). Indices should be in [0, 1]: 0 indicates the token is an original token, 1 indicates the token was replaced.
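The label convention above can be illustrated with a short sketch that compares an original sequence with a corrupted copy. How the corrupted sequence is produced (for example, by a generator model) is outside the scope of this sketch, and the function name is hypothetical.

```python
def replaced_token_labels(original_ids, corrupted_ids):
    """Return 0 where the token is unchanged and 1 where it was replaced."""
    if len(original_ids) != len(corrupted_ids):
        raise ValueError("both sequences must have the same length")
    return [int(orig != corr) for orig, corr in zip(original_ids, corrupted_ids)]


original = [101, 7, 8, 9, 102]
corrupted = [101, 7, 42, 9, 102]  # position 2 was swapped for another token
print(replaced_token_labels(original, corrupted))  # [0, 0, 1, 0, 0]
```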

Returns

A FunnelForPreTrainingOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• loss (optional, returned when labels is provided, torch.FloatTensor of shape (1,)) – Total loss of the ELECTRA-style objective.

• logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Prediction scores of the head (scores for each token before SoftMax).

• hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Examples:

>>> from transformers import FunnelTokenizer, FunnelForPreTraining
>>> import torch

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = FunnelForPreTraining.from_pretrained('funnel-transformer/small', return_dict=True)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> logits = model(**inputs).logits


Return type

FunnelForPreTrainingOutput or tuple(torch.FloatTensor)

## FunnelForMaskedLM

class transformers.FunnelForMaskedLM(config)[source]

Funnel Transformer Model with a language modeling head on top. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]

The FunnelForMaskedLM forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Inputs:
input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None):

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None):

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None):

Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

output_attentions (bool, optional, defaults to None):

If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

output_hidden_states (bool, optional, defaults to None):

If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

return_dict (bool, optional, defaults to None):

If set to True, the model will return a ModelOutput instead of a plain tuple.

labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None):

Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
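A minimal sketch of the -100 convention: only the positions selected for prediction keep their token ID as a label, and every other position is set to -100 so that it does not contribute to the loss. The helper name `mlm_labels` and the example IDs are hypothetical.

```python
IGNORE_INDEX = -100  # label value that the loss skips, as described above


def mlm_labels(input_ids, masked_positions):
    """Keep the original ID at masked positions and ignore all others."""
    masked = set(masked_positions)
    return [tok if i in masked else IGNORE_INDEX for i, tok in enumerate(input_ids)]


input_ids = [101, 7, 8, 9, 102]
labels = mlm_labels(input_ids, masked_positions=[2])
print(labels)  # [-100, -100, 8, -100, -100]
```

In this sketch the model would be asked to recover only the token at position 2; the input passed to the model would have a mask token at that position instead of the original ID.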

Returns

A MaskedLMOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Masked language modeling (MLM) loss.

• logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).

• hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

MaskedLMOutput or tuple(torch.FloatTensor)

Example:

>>> from transformers import FunnelTokenizer, FunnelForMaskedLM
>>> import torch

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = FunnelForMaskedLM.from_pretrained('funnel-transformer/small', return_dict=True)

>>> input_ids = tokenizer("Hello, my dog is cute", return_tensors="pt")["input_ids"]

>>> outputs = model(input_ids, labels=input_ids)
>>> loss = outputs.loss
>>> prediction_logits = outputs.logits

get_output_embeddings()[source]

Returns the model’s output embeddings.

Returns

A torch module mapping hidden states to vocabulary.

Return type

nn.Module

## FunnelForSequenceClassification¶

class transformers.FunnelForSequenceClassification(config)[source]

Funnel Transformer Model with a sequence classification/regression head on top (two linear layers on top of the first timestep of the last hidden state) e.g. for GLUE tasks. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]

The FunnelForSequenceClassification forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Inputs:
input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None):

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None):

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None):

Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

output_attentions (bool, optional, defaults to None):

If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

output_hidden_states (bool, optional, defaults to None):

If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

return_dict (bool, optional, defaults to None):

If set to True, the model will return a ModelOutput instead of a plain tuple.

labels (torch.LongTensor of shape (batch_size,), optional, defaults to None):

Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (Mean-Square loss); if config.num_labels > 1, a classification loss is computed (Cross-Entropy).
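
The switch between the two losses described for labels can be sketched as follows. This is an illustration in plain Python rather than the model's own code: the function name sequence_loss is hypothetical, and averaging over the batch is an assumption.

```python
import math


def sequence_loss(logits, labels, num_labels):
    """Mean-square loss when num_labels == 1, cross-entropy otherwise.

    logits: list of per-example score lists, each of length num_labels.
    labels: list of floats (regression) or class indices (classification).
    The result is averaged over the batch (an assumption of this sketch).
    """
    if num_labels == 1:
        # Regression: compare the single score of each example with its target.
        errors = [(scores[0] - target) ** 2 for scores, target in zip(logits, labels)]
        return sum(errors) / len(errors)
    # Classification: negative log-probability of the labelled class.
    losses = []
    for scores, label in zip(logits, labels):
        log_norm = math.log(sum(math.exp(s) for s in scores))
        losses.append(log_norm - scores[label])
    return sum(losses) / len(losses)


regression = sequence_loss([[2.0], [0.0]], [1.0, 0.0], num_labels=1)
classification = sequence_loss([[0.0, 0.0]], [0], num_labels=2)
```

In the regression call the squared errors are 1 and 0, giving 0.5; in the classification call two equal scores give a loss of log(2).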

Returns

A SequenceClassifierOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.

• logits (torch.FloatTensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).

• hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

SequenceClassifierOutput or tuple(torch.FloatTensor)

Example:

>>> from transformers import FunnelTokenizer, FunnelForSequenceClassification
>>> import torch

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small-base')
>>> model = FunnelForSequenceClassification.from_pretrained('funnel-transformer/small-base', return_dict=True)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1
>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits


## FunnelForMultipleChoice¶

class transformers.FunnelForMultipleChoice(config)[source]

Funnel Transformer Model with a multiple choice classification head on top (two linear layers on top of the first timestep of the last hidden state, and a softmax) e.g. for RocStories/SWAG tasks. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]

The FunnelForMultipleChoice forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Inputs:
input_ids (torch.LongTensor of shape (batch_size, num_choices, sequence_length)):

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

attention_mask (torch.FloatTensor of shape (batch_size, num_choices, sequence_length), optional, defaults to None):

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

token_type_ids (torch.LongTensor of shape (batch_size, num_choices, sequence_length), optional, defaults to None):

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

inputs_embeds (torch.FloatTensor of shape (batch_size, num_choices, sequence_length, hidden_size), optional, defaults to None):

Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

output_attentions (bool, optional, defaults to None):

If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

output_hidden_states (bool, optional, defaults to None):

If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

return_dict (bool, optional, defaults to None):

If set to True, the model will return a ModelOutput instead of a plain tuple.

labels (torch.LongTensor of shape (batch_size,), optional, defaults to None):

Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices-1] where num_choices is the size of the second dimension of the input tensors. (see input_ids above)
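
The shapes involved can be illustrated with nested lists. The sketch below assumes that a multiple choice head scores every (example, choice) sequence independently and then regroups the scores per example; that flattening step is a common pattern and an assumption here, not something this page states. The helper names and token ids are made up.

```python
def flatten_choices(batch):
    """(batch_size, num_choices, seq_len) -> (batch_size * num_choices, seq_len)."""
    return [choice for example in batch for choice in example]


def regroup_scores(flat_scores, num_choices):
    """(batch_size * num_choices,) -> (batch_size, num_choices)."""
    return [flat_scores[i:i + num_choices] for i in range(0, len(flat_scores), num_choices)]


# Two examples, three choices each, four token ids per choice (made-up values).
input_ids = [
    [[1, 2, 3, 4], [1, 2, 5, 6], [1, 2, 7, 8]],
    [[9, 10, 11, 12], [9, 10, 13, 14], [9, 10, 15, 16]],
]
flat = flatten_choices(input_ids)           # 6 sequences of length 4
scores = [float(sum(seq)) for seq in flat]   # stand-in for one model score per sequence
logits = regroup_scores(scores, num_choices=3)
predictions = [row.index(max(row)) for row in logits]
```

Each entry of predictions lies in [0, ..., num_choices-1], which is the same range the labels are expected to use.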

Returns

A MultipleChoiceModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification loss.

• logits (torch.FloatTensor of shape (batch_size, num_choices)) – Classification scores (before SoftMax). num_choices is the second dimension of the input tensors (see input_ids above).

• hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

MultipleChoiceModelOutput or tuple(torch.FloatTensor)

Example:

>>> from transformers import FunnelTokenizer, FunnelForMultipleChoice
>>> import torch

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small-base')
>>> model = FunnelForMultipleChoice.from_pretrained('funnel-transformer/small-base', return_dict=True)

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife."
>>> choice1 = "It is eaten while held in the hand."
>>> labels = torch.tensor(0).unsqueeze(0)  # choice0 is correct (according to Wikipedia ;)), batch size 1

>>> encoding = tokenizer([[prompt, choice0], [prompt, choice1]], return_tensors='pt', padding=True)  # one (prompt, choice) pair per choice
>>> outputs = model(**{k: v.unsqueeze(0) for k,v in encoding.items()}, labels=labels)  # batch size is 1

>>> # the linear classifier still needs to be trained
>>> loss = outputs.loss
>>> logits = outputs.logits


## FunnelForTokenClassification¶

class transformers.FunnelForTokenClassification(config)[source]

Funnel Transformer Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]

The FunnelForTokenClassification forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Inputs:
input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None):

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None):

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None):

Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

output_attentions (bool, optional, defaults to None):

If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

output_hidden_states (bool, optional, defaults to None):

If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

return_dict (bool, optional, defaults to None):

If set to True, the model will return a ModelOutput instead of a plain tuple.

labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None):

Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].

Returns

A TokenClassifierOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification loss.

• logits (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels)) – Classification scores (before SoftMax).

• hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

TokenClassifierOutput or tuple(torch.FloatTensor)

Example:

>>> from transformers import FunnelTokenizer, FunnelForTokenClassification
>>> import torch

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = FunnelForTokenClassification.from_pretrained('funnel-transformer/small', return_dict=True)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor([1] * inputs["input_ids"].size(1)).unsqueeze(0)  # Batch size 1

>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits
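
Since logits holds one row of config.num_labels scores per token, a per-token prediction can be read off with an argmax. The sketch below uses nested lists in place of tensors; the helper name predict_token_labels, the id2label mapping, and the scores are all made up for illustration.

```python
def predict_token_labels(logits, id2label):
    """Pick the highest-scoring label for every token of every sequence.

    logits: nested lists of shape (batch_size, sequence_length, num_labels).
    id2label: hypothetical mapping from label index to label name.
    """
    return [
        [id2label[max(range(len(scores)), key=scores.__getitem__)] for scores in sequence]
        for sequence in logits
    ]


id2label = {0: "O", 1: "PER"}  # made-up label set for illustration
logits = [[[2.0, 0.1], [0.3, 1.5], [0.9, 0.2]]]  # batch of 1, 3 tokens, 2 labels
predicted = predict_token_labels(logits, id2label)
```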


## FunnelForQuestionAnswering¶

class transformers.FunnelForQuestionAnswering(config)[source]

Funnel Transformer Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits). The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]

The FunnelForQuestionAnswering forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Inputs:
input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None):

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None):

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None):

Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

output_attentions (bool, optional, defaults to None):

If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

output_hidden_states (bool, optional, defaults to None):

If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

return_dict (bool, optional, defaults to None):

If set to True, the model will return a ModelOutput instead of a plain tuple.

start_positions (torch.LongTensor of shape (batch_size,), optional, defaults to None):

Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

end_positions (torch.LongTensor of shape (batch_size,), optional, defaults to None):

Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
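
One way to read the clamping rule for start_positions and end_positions is sketched below in plain Python. It is an interpretation under stated assumptions, not the model's code: the helper name span_position_loss is hypothetical, a position clamped to sequence_length is treated as the "ignore" marker, and negative positions are assumed to clamp to 0.

```python
import math


def span_position_loss(logits, position):
    """Cross-entropy for one example, or None when the position is ignored.

    logits: list of sequence_length scores (before softmax).
    position: labelled index; values beyond the sequence are clamped to
    sequence_length, which this sketch treats as "do not count this example".
    """
    sequence_length = len(logits)
    clamped = min(max(position, 0), sequence_length)  # clamp into [0, sequence_length]
    if clamped == sequence_length:
        return None  # outside the sequence: not taken into account
    log_norm = math.log(sum(math.exp(s) for s in logits))
    return log_norm - logits[clamped]


start_logits = [0.0, 0.0, 0.0, 0.0]
inside = span_position_loss(start_logits, position=1)
outside = span_position_loss(start_logits, position=9)
```

The same helper applies unchanged to end_logits and end_positions.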

Returns

A QuestionAnsweringModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• loss (torch.FloatTensor of shape (1,), optional, returned when start_positions and end_positions are provided) – Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.

• start_logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Span-start scores (before SoftMax).

• end_logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Span-end scores (before SoftMax).

• hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

QuestionAnsweringModelOutput or tuple(torch.FloatTensor)

Example:

>>> from transformers import FunnelTokenizer, FunnelForQuestionAnswering
>>> import torch

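>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = FunnelForQuestionAnswering.from_pretrained('funnel-transformer/small', return_dict=True)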

>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> inputs = tokenizer(question, text, return_tensors='pt')
>>> start_positions = torch.tensor([1])
>>> end_positions = torch.tensor([3])

>>> outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
>>> loss = outputs.loss
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits
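
Scores such as start_logits and end_logits are usually turned into an answer by picking a start and an end index. The sketch below shows one common decoding rule (highest summed score with start <= end) on plain lists; the rule, the helper name best_span, the tokenization, and the scores are illustrative assumptions, not part of the documented API.

```python
def best_span(start_logits, end_logits):
    """Return (start, end) with the highest start + end score, with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for start, start_score in enumerate(start_logits):
        for end in range(start, len(end_logits)):
            score = start_score + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


tokens = ["jim", "henson", "was", "a", "nice", "puppet"]  # made-up tokenization
start_logits = [0.1, 0.2, 0.0, 2.5, 0.3, 0.1]
end_logits = [0.0, 0.1, 0.2, 0.4, 0.5, 3.0]
start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
```

The exhaustive search is quadratic in the sequence length, which is acceptable for a sketch; practical decoders typically restrict the search to the top-scoring candidates.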


## TFFunnelBaseModel¶

class transformers.TFFunnelBaseModel(*args, **kwargs)[source]

The base Funnel Transformer model, outputting raw hidden-states without an upsampling head (also called decoder) or any task-specific head on top. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a tf.keras.Model sub-class. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

Note

TF 2.0 models accept two formats as inputs:

• having all inputs as keyword arguments (like PyTorch models), or

• having all inputs as a list, tuple or dict in the first positional arguments.

This second option is useful when using tf.keras.Model.fit() method which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

• a single Tensor with input_ids only and nothing else: model(input_ids)

• a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])

• a dictionary with one or several input Tensors associated to the input names given in the docstring: model({'input_ids': input_ids, 'token_type_ids': token_type_ids})
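
The three ways of filling the first positional argument can be pictured with a small helper that maps each form to named inputs. This is only a sketch of the convention: gather_inputs and FakeTensor are hypothetical names introduced here, not part of the library, and the sketch runs without TensorFlow.

```python
INPUT_ORDER = ["input_ids", "attention_mask", "token_type_ids"]  # order used in the note above


class FakeTensor:
    """Stand-in for a tf.Tensor so the sketch runs without TensorFlow."""

    def __init__(self, name):
        self.name = name


def gather_inputs(first_arg):
    """Map the first positional argument to a dict of named inputs (hypothetical helper)."""
    if isinstance(first_arg, dict):
        return dict(first_arg)  # names are already explicit
    if isinstance(first_arg, (list, tuple)):
        # Positional form: names are assigned in docstring order.
        return dict(zip(INPUT_ORDER, first_arg))
    return {"input_ids": first_arg}  # a single tensor is taken as input_ids


ids, mask, types = FakeTensor("ids"), FakeTensor("mask"), FakeTensor("types")
single = gather_inputs(ids)
as_list = gather_inputs([ids, mask])
as_dict = gather_inputs({"input_ids": ids, "token_type_ids": types})
```

The list form depends on order, which is why the note insists on following the docstring order; the dictionary form lets an input such as attention_mask be skipped without ambiguity.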

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

call(inputs, **kwargs)[source]

The TFFunnelBaseModel forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Parameters
• input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) –

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

• attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

• token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

• position_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].

What are position IDs?

• head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.

• inputs_embeds (Numpy array or tf.Tensor of shape (batch_size, sequence_length, embedding_dim), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

• output_attentions (bool, optional) – If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

• output_hidden_states (bool, optional) – If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

• return_dict (bool, optional) – If set to True, the model will return a ModelOutput instead of a plain tuple.

• training (boolean, optional, defaults to False) – Whether to activate dropout modules (if set to True) during training or to de-activate them (if set to False) for evaluation.

Returns

A TFBaseModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• last_hidden_state (tf.Tensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.

• hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

TFBaseModelOutput or tuple(tf.Tensor)

Example:

>>> from transformers import FunnelTokenizer, TFFunnelBaseModel
>>> import tensorflow as tf

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small-base')
>>> model = TFFunnelBaseModel.from_pretrained('funnel-transformer/small-base')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> outputs = model(inputs)

>>> last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple


## TFFunnelModel¶

class transformers.TFFunnelModel(*args, **kwargs)[source]

The bare Funnel Transformer model, outputting raw hidden-states without any specific head on top. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a tf.keras.Model sub-class. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

Note

TF 2.0 models accept two formats as inputs:

• having all inputs as keyword arguments (like PyTorch models), or

• having all inputs as a list, tuple or dict in the first positional arguments.

This second option is useful when using tf.keras.Model.fit() method which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

• a single Tensor with input_ids only and nothing else: model(input_ids)

• a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])

• a dictionary with one or several input Tensors associated to the input names given in the docstring: model({'input_ids': input_ids, 'token_type_ids': token_type_ids})

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

call(inputs, **kwargs)[source]

The TFFunnelModel forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Parameters
• input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) –

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

• attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

• token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

• position_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].

What are position IDs?

• head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.

• inputs_embeds (Numpy array or tf.Tensor of shape (batch_size, sequence_length, embedding_dim), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

• output_attentions (bool, optional) – If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

• output_hidden_states (bool, optional) – If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

• return_dict (bool, optional) – If set to True, the model will return a ModelOutput instead of a plain tuple.

• training (boolean, optional, defaults to False) – Whether to activate dropout modules (if set to True) during training or to de-activate them (if set to False) for evaluation.
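As a rough illustration of the attention_mask and token_type_ids conventions described above, the following sketch builds both by hand for plain Python lists. `pad_batch`, `segment_ids` and `PAD_ID` are hypothetical names, the padding id of 0 is an assumption, and special tokens are left out for brevity; in practice the tokenizer produces these tensors.

```python
from typing import List, Tuple

PAD_ID = 0  # assumed padding token id, for illustration only


def pad_batch(sequences: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the longest one and build the matching mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [PAD_ID] * n_pad)
        # 1 = real token (NOT MASKED), 0 = padding (MASKED)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


def segment_ids(len_a: int, len_b: int) -> List[int]:
    """0 for every sentence A token, 1 for every sentence B token."""
    return [0] * len_a + [1] * len_b


batch_ids, batch_mask = pad_batch([[5, 6, 7], [8]])
pair_types = segment_ids(3, 2)
```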

Returns

A TFBaseModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• last_hidden_state (tf.Tensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.

• hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

TFBaseModelOutput or tuple(tf.Tensor)
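The two return forms can be pictured with a small stand-in. This is a sketch under stated assumptions, not the library's output class: `OutputSketch` and `as_plain_tuple` are hypothetical, and the only behavior modeled is that optional fields are absent from the tuple unless they were requested.

```python
from typing import NamedTuple, Optional, Tuple


class OutputSketch(NamedTuple):
    """Hypothetical stand-in for a model output with named fields."""

    last_hidden_state: list
    hidden_states: Optional[Tuple[list, ...]] = None
    attentions: Optional[Tuple[list, ...]] = None


def as_plain_tuple(output: OutputSketch) -> tuple:
    """Drop the fields that were not requested (None), keeping the order."""
    return tuple(value for value in output if value is not None)


out = OutputSketch(last_hidden_state=[[0.1, 0.2]])
first = out[0]                 # positional access, as with a plain tuple
named = out.last_hidden_state  # attribute access, as with a named output
plain = as_plain_tuple(out)    # only the fields that are present
```

Indexing with [0] works for both forms, which is why it is a safe way to read the first output regardless of return_dict.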

Example:

>>> from transformers import FunnelTokenizer, TFFunnelModel
>>> import tensorflow as tf

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = TFFunnelModel.from_pretrained('funnel-transformer/small')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> outputs = model(inputs)

>>> last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple


## TFFunnelForPreTraining¶

class transformers.TFFunnelForPreTraining(*args, **kwargs)[source]

Funnel model with a binary classification head on top as used during pre-training for identifying generated tokens. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a tf.keras.Model sub-class. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

Note

TF 2.0 models accept two formats as inputs:

• having all inputs as keyword arguments (like PyTorch models), or

• having all inputs as a list, tuple or dict in the first positional argument.

This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

• a single Tensor with input_ids only and nothing else: model(input_ids)

• a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])

• a dictionary with one or several input Tensors associated to the input names given in the docstring: model({'input_ids': input_ids, 'token_type_ids': token_type_ids})

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

call(input_ids, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None, training=False)[source]

The TFFunnelForPreTraining forward method, overrides the __call__() special method.

Note

Although the recipe for the forward pass needs to be defined within this function, you should call the model instance rather than this method directly: calling the instance runs the pre- and post-processing steps, whereas calling this method silently skips them.

Parameters
• input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) –

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

• attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

• token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

• position_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].

What are position IDs?

• head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.

• inputs_embeds (Numpy array or tf.Tensor of shape (batch_size, sequence_length, embedding_dim), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

• output_attentions (bool, optional) – If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

• output_hidden_states (bool, optional) – If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

• return_dict (bool, optional) – If set to True, the model will return a ModelOutput instead of a plain tuple.

• training (boolean, optional, defaults to False) – Whether to activate dropout modules (if set to True) during training or to de-activate them (if set to False) for evaluation.

Returns

A TFFunnelForPreTrainingOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• logits (tf.Tensor of shape (batch_size, sequence_length)) – Prediction scores of the head (scores for each token before SoftMax).

• hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

TFFunnelForPreTrainingOutput or tuple(tf.Tensor)

Example:

>>> from transformers import FunnelTokenizer, TFFunnelForPreTraining
>>> import tensorflow as tf

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = TFFunnelForPreTraining.from_pretrained('funnel-transformer/small')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> logits = model(inputs)[0]  # The logits are the first element of the output
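Because the pre-training head produces one score per token, a per-token decision can be sketched as follows. This is a minimal illustration on plain Python lists, assuming that a sigmoid is applied to each score and that a positive decision means "generated"; both the polarity and the 0.5 threshold are assumptions, and `flag_generated` is a hypothetical helper.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def flag_generated(logits: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Return 1 where the per-token probability exceeds the threshold, else 0."""
    return [[1 if sigmoid(score) > threshold else 0 for score in row] for row in logits]


# Shape (batch_size, sequence_length) = (1, 4); the values are made up.
example_logits = [[-2.0, 0.3, 4.1, -0.7]]
flags = flag_generated(example_logits)
```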

## TFFunnelForMaskedLM¶

class transformers.TFFunnelForMaskedLM(*args, **kwargs)[source]

Funnel Model with a language modeling head on top. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a tf.keras.Model sub-class. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

Note

TF 2.0 models accept two formats as inputs:

• having all inputs as keyword arguments (like PyTorch models), or

• having all inputs as a list, tuple or dict in the first positional argument.

This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

• a single Tensor with input_ids only and nothing else: model(input_ids)

• a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])

• a dictionary with one or several input Tensors associated to the input names given in the docstring: model({'input_ids': input_ids, 'token_type_ids': token_type_ids})

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

call(inputs=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None, labels=None, training=False)[source]

The TFFunnelForMaskedLM forward method, overrides the __call__() special method.

Note

Although the recipe for the forward pass needs to be defined within this function, you should call the model instance rather than this method directly: calling the instance runs the pre- and post-processing steps, whereas calling this method silently skips them.

Parameters
• input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) –

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

• attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

• token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

• position_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].

What are position IDs?

• head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.

• inputs_embeds (Numpy array or tf.Tensor of shape (batch_size, sequence_length, embedding_dim), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

• output_attentions (bool, optional) – If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

• output_hidden_states (bool, optional) – If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

• return_dict (bool, optional) – If set to True, the model will return a ModelOutput instead of a plain tuple.

• training (boolean, optional, defaults to False) – Whether to activate dropout modules (if set to True) during training or to de-activate them (if set to False) for evaluation.

• labels (tf.Tensor of shape (batch_size, sequence_length), optional) – Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size - 1] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size - 1].
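The effect of the -100 label can be sketched in plain Python. This is an illustration of the idea for a single sequence, not the library's loss: `masked_lm_loss` is a hypothetical helper, and averaging over the non-ignored positions is an assumption about the reduction.

```python
import math
from typing import List

IGNORE_INDEX = -100  # label value that excludes a position from the loss


def masked_lm_loss(logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over positions whose label is not IGNORE_INDEX.

    logits has shape [sequence_length][vocab_size]; labels has shape [sequence_length].
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # ignored position: contributes nothing
        top = max(scores)  # subtract the max for a stable log-sum-exp
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_z - scores[label]
        count += 1
    return total / count if count else 0.0


toy_logits = [[0.0, 0.0], [10.0, 0.0], [0.0, 0.0]]  # vocabulary of size 2
toy_labels = [IGNORE_INDEX, 0, 1]                   # first position is ignored
toy_loss = masked_lm_loss(toy_logits, toy_labels)
```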

Returns

A TFMaskedLMOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• loss (tf.Tensor of shape (1,), optional, returned when labels is provided) – Masked language modeling (MLM) loss.

• logits (tf.Tensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).

• hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

TFMaskedLMOutput or tuple(tf.Tensor)

Example:

>>> from transformers import FunnelTokenizer, TFFunnelForMaskedLM
>>> import tensorflow as tf

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = TFFunnelForMaskedLM.from_pretrained('funnel-transformer/small')

>>> input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True))[None, :]  # Batch size 1

>>> outputs = model(input_ids)
>>> prediction_scores = outputs[0]


## TFFunnelForSequenceClassification¶

class transformers.TFFunnelForSequenceClassification(*args, **kwargs)[source]

Funnel Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a tf.keras.Model sub-class. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

Note

TF 2.0 models accept two formats as inputs:

• having all inputs as keyword arguments (like PyTorch models), or

• having all inputs as a list, tuple or dict in the first positional argument.

This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

• a single Tensor with input_ids only and nothing else: model(input_ids)

• a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])

• a dictionary with one or several input Tensors associated to the input names given in the docstring: model({'input_ids': input_ids, 'token_type_ids': token_type_ids})

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

call(inputs=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None, labels=None, training=False)[source]

The TFFunnelForSequenceClassification forward method, overrides the __call__() special method.

Note

Although the recipe for the forward pass needs to be defined within this function, you should call the model instance rather than this method directly: calling the instance runs the pre- and post-processing steps, whereas calling this method silently skips them.

Parameters
• input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) –

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

• attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

• token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

• position_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].

What are position IDs?

• head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.

• inputs_embeds (Numpy array or tf.Tensor of shape (batch_size, sequence_length, embedding_dim), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

• output_attentions (bool, optional) – If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

• output_hidden_states (bool, optional) – If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

• return_dict (bool, optional) – If set to True, the model will return a ModelOutput instead of a plain tuple.

• training (boolean, optional, defaults to False) – Whether to activate dropout modules (if set to True) during training or to de-activate them (if set to False) for evaluation.

• labels (tf.Tensor of shape (batch_size,), optional) – Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (mean-square loss); if config.num_labels > 1, a classification loss is computed (cross-entropy).
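The switch between regression and classification described for labels can be sketched as below. This is a plain-Python illustration of the rule, not the library's code: `sequence_loss` is a hypothetical helper, and averaging over the batch is an assumption.

```python
import math
from typing import List, Sequence


def sequence_loss(logits: List[List[float]], labels: Sequence[float], num_labels: int) -> float:
    """Mean-square loss when num_labels == 1, cross-entropy otherwise.

    logits has shape [batch_size][num_labels]; labels has shape [batch_size].
    """
    if num_labels == 1:
        # Regression: the single logit is the predicted value.
        return sum((row[0] - y) ** 2 for row, y in zip(logits, labels)) / len(labels)
    total = 0.0
    for row, y in zip(logits, labels):
        top = max(row)  # subtract the max for a stable log-sum-exp
        log_z = top + math.log(sum(math.exp(v - top) for v in row))
        total += log_z - row[int(y)]  # negative log-probability of the true class
    return total / len(labels)


regression = sequence_loss([[2.0], [0.0]], [1.0, 1.0], num_labels=1)
classification = sequence_loss([[0.0, 0.0]], [1], num_labels=2)
```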

Returns

A TFSequenceClassifierOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• loss (tf.Tensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.

• logits (tf.Tensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).

• hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

TFSequenceClassifierOutput or tuple(tf.Tensor)

Example:

>>> from transformers import FunnelTokenizer, TFFunnelForSequenceClassification
>>> import tensorflow as tf

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small-base')
>>> model = TFFunnelForSequenceClassification.from_pretrained('funnel-transformer/small-base')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> inputs["labels"] = tf.reshape(tf.constant(1), (-1, 1)) # Batch size 1

>>> outputs = model(inputs)
>>> loss, logits = outputs[:2]


## TFFunnelForMultipleChoice¶

class transformers.TFFunnelForMultipleChoice(*args, **kwargs)[source]

Funnel Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a tf.keras.Model sub-class. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

Note

TF 2.0 models accept two formats as inputs:

• having all inputs as keyword arguments (like PyTorch models), or

• having all inputs as a list, tuple or dict in the first positional argument.

This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

• a single Tensor with input_ids only and nothing else: model(input_ids)

• a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])

• a dictionary with one or several input Tensors associated to the input names given in the docstring: model({'input_ids': input_ids, 'token_type_ids': token_type_ids})

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

call(inputs, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None, labels=None, training=False)[source]

The TFFunnelForMultipleChoice forward method, overrides the __call__() special method.

Note

Although the recipe for the forward pass needs to be defined within this function, you should call the model instance rather than this method directly: calling the instance runs the pre- and post-processing steps, whereas calling this method silently skips them.

Parameters
• input_ids (Numpy array or tf.Tensor of shape (batch_size, num_choices, sequence_length)) –

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

• attention_mask (Numpy array or tf.Tensor of shape (batch_size, num_choices, sequence_length), optional) –

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

• token_type_ids (Numpy array or tf.Tensor of shape (batch_size, num_choices, sequence_length), optional) –

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

What are token type IDs?

• position_ids (Numpy array or tf.Tensor of shape (batch_size, num_choices, sequence_length), optional) –

Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].

What are position IDs?

• head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.

• inputs_embeds (Numpy array or tf.Tensor of shape (batch_size, sequence_length, embedding_dim), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

• output_attentions (bool, optional) – If set to True, the attentions tensors of all attention layers are returned. See attentions under returned tensors for more detail.

• output_hidden_states (bool, optional) – If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

• return_dict (bool, optional) – If set to True, the model will return a ModelOutput instead of a plain tuple.

• training (boolean, optional, defaults to False) – Whether to activate dropout modules (if set to True) during training or to de-activate them (if set to False) for evaluation.

• labels (tf.Tensor of shape (batch_size,), optional) – Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices - 1], where num_choices is the size of the second dimension of the input tensors (see input_ids above).
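The extra num_choices dimension can be illustrated with nested Python lists. A common way to handle such inputs is to fold the choices into the batch, score every sequence, and regroup the scores per example; whether this model does exactly that internally is an assumption, and `flatten_choices`, `regroup_scores` and `predict_choice` are hypothetical helpers.

```python
from typing import List


def flatten_choices(input_ids: List[List[List[int]]]) -> List[List[int]]:
    """[batch][num_choices][seq_len] -> [batch * num_choices][seq_len]."""
    return [sequence for example in input_ids for sequence in example]


def regroup_scores(scores: List[float], num_choices: int) -> List[List[float]]:
    """[batch * num_choices] -> [batch][num_choices]."""
    return [scores[i:i + num_choices] for i in range(0, len(scores), num_choices)]


def predict_choice(logits: List[List[float]]) -> List[int]:
    """Index of the highest-scoring choice per example, in [0, num_choices - 1]."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # batch_size=2, num_choices=2, seq_len=2
flat = flatten_choices(batch)
grouped = regroup_scores([0.1, 0.9, 0.7, 0.2], num_choices=2)  # made-up scores
choices = predict_choice(grouped)
```

The predicted indices live in the same range as the labels, which is why a label can never equal num_choices itself.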

Returns

A TFMultipleChoiceModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• loss (tf.Tensor of shape (1,), optional, returned when labels is provided) – Classification loss.

• logits (tf.Tensor of shape (batch_size, num_choices)) – Classification scores (before SoftMax). num_choices is the second dimension of the input tensors (see input_ids above).

• hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

TFMultipleChoiceModelOutput or tuple(tf.Tensor)

Example:

>>> from transformers import FunnelTokenizer, TFFunnelForMultipleChoice
>>> import tensorflow as tf

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small-base')
>>> model = TFFunnelForMultipleChoice.from_pretrained('funnel-transformer/small-base')

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife."
>>> choice1 = "It is eaten while held in the hand."

>>> encoding = tokenizer([[prompt, prompt], [choice0, choice1]], return_tensors='tf', padding=True)
>>> inputs = {k: tf.expand_dims(v, 0) for k, v in encoding.items()}
>>> outputs = model(inputs)  # batch size is 1

>>> # the linear classifier still needs to be trained
>>> logits = outputs[0]

property dummy_inputs

Dummy inputs to build the network.

Returns

tf.Tensor with dummy inputs

## TFFunnelForTokenClassification¶

class transformers.TFFunnelForTokenClassification(*args, **kwargs)[source]

Funnel Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a tf.keras.Model sub-class. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

Note

TF 2.0 models accept two formats as inputs:

• having all inputs as keyword arguments (like PyTorch models), or

• having all inputs as a list, tuple or dict in the first positional argument.

This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

• a single Tensor with input_ids only and nothing else: model(input_ids)

• a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])

• a dictionary with one or several input Tensors associated to the input names given in the docstring: model({'input_ids': input_ids, 'token_type_ids': token_type_ids})

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

call(inputs=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None, labels=None, training=False)[source]

The TFFunnelForTokenClassification forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Parameters
• input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) –

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

• attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

• token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.

What are token type IDs?

• position_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Indices of the position of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].

What are position IDs?

• head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.

• inputs_embeds (Numpy array or tf.Tensor of shape (batch_size, sequence_length, embedding_dim), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

• output_attentions (bool, optional) – If set to True, the attention tensors of all attention layers are returned. See attentions under returned tensors for more detail.

• output_hidden_states (bool, optional) – If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

• return_dict (bool, optional) – If set to True, the model will return a ModelOutput instead of a plain tuple.

• training (boolean, optional, defaults to False) – Whether to activate dropout modules (if set to True) during training or to de-activate them (if set to False) for evaluation.

• labels (tf.Tensor of shape (batch_size, sequence_length), optional) – Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].
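
As a rough illustration of how input_ids and attention_mask relate, the sketch below pads two token sequences of different lengths to a common length and builds the matching mask (1 for real tokens, 0 for padding). The helper pad_batch, the padding id 0, and the token ids are assumptions made for this example; in practice the tokenizer produces these arrays.

```python
import numpy as np

def pad_batch(sequences, pad_id=0):
    """Pad variable-length id lists and build the matching attention mask.

    Hypothetical helper for illustration; pad_id=0 is an assumption.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids = np.full((len(sequences), max_len), pad_id, dtype=np.int32)
    attention_mask = np.zeros((len(sequences), max_len), dtype=np.int32)
    for row, seq in enumerate(sequences):
        input_ids[row, :len(seq)] = seq       # real tokens
        attention_mask[row, :len(seq)] = 1    # 1 = attend, 0 = padding
    return input_ids, attention_mask

# Made-up token ids for two sentences of different lengths.
input_ids, attention_mask = pad_batch([[101, 7592, 102], [101, 102]])
```

Both arrays have shape (batch_size, sequence_length), which is the shape the corresponding parameters expect.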

Returns

A TFTokenClassifierOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• loss (tf.Tensor of shape (1,), optional, returned when labels is provided) – Classification loss.

• logits (tf.Tensor of shape (batch_size, sequence_length, config.num_labels)) – Classification scores (before SoftMax).

• hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
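
Since logits holds scores before SoftMax, turning them into per-token predictions takes one extra step. The following is a minimal NumPy sketch of that step, assuming made-up logits for one sentence of two tokens and a hypothetical id2label mapping; it does not call the model itself.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Made-up logits of shape (batch_size=1, sequence_length=2, num_labels=3).
logits = np.array([[[2.0, 0.5, -1.0],
                    [0.1, 0.2, 3.0]]])

probs = softmax(logits)                # per-token label probabilities
predictions = probs.argmax(axis=-1)    # shape (batch_size, sequence_length)

# Hypothetical label names, for illustration only.
id2label = {0: "O", 1: "B-PER", 2: "I-PER"}
tags = [[id2label[int(i)] for i in row] for row in predictions]
```

Because softmax is monotonic, taking the argmax of the raw logits gives the same predictions; the probabilities are only needed when a confidence value is wanted.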

Return type

TFTokenClassifierOutput or tuple(tf.Tensor)

Example:

>>> from transformers import FunnelTokenizer, TFFunnelForTokenClassification
>>> import tensorflow as tf

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = TFFunnelForTokenClassification.from_pretrained('funnel-transformer/small')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> input_ids = inputs["input_ids"]
>>> inputs["labels"] = tf.reshape(tf.constant([1] * tf.size(input_ids).numpy()), (-1, tf.size(input_ids))) # Batch size 1

>>> outputs = model(inputs)
>>> loss, scores = outputs[:2]


class transformers.TFFunnelForQuestionAnswering(*args, **kwargs)[source]

Funnel Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits). The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

This model is a tf.keras.Model sub-class. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

Note

TF 2.0 models accept two formats as inputs:

• having all inputs as keyword arguments (like PyTorch models), or

• having all inputs as a list, tuple or dict in the first positional arguments.

This second option is useful when using tf.keras.Model.fit() method which currently requires having all the tensors in the first argument of the model call function: model(inputs).

If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

• a single Tensor with input_ids only and nothing else: model(input_ids)

• a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])

• a dictionary with one or several input Tensors associated to the input names given in the docstring: model({'input_ids': input_ids, 'token_type_ids': token_type_ids})

Parameters

config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

call(inputs=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None, start_positions=None, end_positions=None, training=False)[source]

The TFFunnelForQuestionAnswering forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Parameters
• input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) –

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using transformers.FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?

• attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

• token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.

What are token type IDs?

• position_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) –

Indices of the position of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].

What are position IDs?

• head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.

• inputs_embeds (Numpy array or tf.Tensor of shape (batch_size, sequence_length, embedding_dim), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

• output_attentions (bool, optional) – If set to True, the attention tensors of all attention layers are returned. See attentions under returned tensors for more detail.

• output_hidden_states (bool, optional) – If set to True, the hidden states of all layers are returned. See hidden_states under returned tensors for more detail.

• return_dict (bool, optional) – If set to True, the model will return a ModelOutput instead of a plain tuple.

• training (boolean, optional, defaults to False) – Whether to activate dropout modules (if set to True) during training or to de-activate them (if set to False) for evaluation.

• start_positions (tf.Tensor of shape (batch_size,), optional) – Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

• end_positions (tf.Tensor of shape (batch_size,), optional) – Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
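
The clamping rule described for start_positions and end_positions can be sketched in NumPy as follows. This is an illustration of the documented behavior under stated assumptions, not the library's code: the function name span_loss is hypothetical, each side is averaged over the valid examples, and the two sides are summed.

```python
import numpy as np

def span_loss(start_logits, end_logits, start_positions, end_positions):
    """Sum of start and end cross-entropy, skipping out-of-sequence labels.

    Hypothetical sketch; the averaging over valid examples is an assumption.
    """
    seq_len = start_logits.shape[1]

    def one_side(logits, positions):
        # Clamp to [0, seq_len]; the value seq_len marks "outside the sequence".
        positions = np.clip(positions, 0, seq_len)
        valid = positions < seq_len
        if not valid.any():
            return 0.0
        # Log-softmax over the sequence dimension.
        shifted = logits - logits.max(axis=-1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
        # Use index 0 as a placeholder for ignored rows; they are masked out below.
        safe_positions = np.where(valid, positions, 0)
        nll = -log_probs[np.arange(len(positions)), safe_positions]
        return float(nll[valid].mean())

    return one_side(start_logits, start_positions) + one_side(end_logits, end_positions)

# Uniform logits over 4 positions: every valid label costs log(4).
start_logits = np.zeros((2, 4))
end_logits = np.zeros((2, 4))
loss = span_loss(start_logits, end_logits,
                 start_positions=np.array([1, 10]),  # 10 is outside the sequence
                 end_positions=np.array([2, 3]))
```

The second example's start position (10) is clamped to sequence_length and then ignored, so only one example contributes to the start-side term.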

Returns

A TFQuestionAnsweringModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.

• loss (tf.Tensor of shape (1,), optional, returned when start_positions and end_positions are provided) – Total span extraction loss, computed as the sum of the cross-entropy losses for the start and end positions.

• start_logits (tf.Tensor of shape (batch_size, sequence_length)) – Span-start scores (before SoftMax).

• end_logits (tf.Tensor of shape (batch_size, sequence_length)) – Span-end scores (before SoftMax).

• hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

• attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
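
One common way to turn start_logits and end_logits into an answer span is to pick the pair (start, end) with the highest combined score, subject to start <= end. The sketch below shows this for a single example in NumPy; the function best_span and the max_answer_len limit are assumptions introduced for illustration, and the logits are made up.

```python
import numpy as np

def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) token indices with the highest summed score.

    Hypothetical helper; only spans with start <= end and at most
    max_answer_len tokens are considered.
    """
    seq_len = len(start_logits)
    # scores[s, e] = start_logits[s] + end_logits[e]
    scores = start_logits[:, None] + end_logits[None, :]
    idx = np.arange(seq_len)
    span_len = idx[None, :] - idx[:, None]          # e - s
    valid = (span_len >= 0) & (span_len < max_answer_len)
    scores = np.where(valid, scores, -np.inf)       # rule out invalid spans
    start, end = np.unravel_index(np.argmax(scores), scores.shape)
    return int(start), int(end)

# Made-up logits for a 4-token sequence.
start_logits = np.array([0.1, 2.0, 0.3, 0.0])
end_logits = np.array([0.0, 0.5, 3.0, 0.2])
span = best_span(start_logits, end_logits)
```

The returned indices refer to token positions in input_ids, so the answer text can be recovered by decoding the tokens between start and end inclusive.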

Return type

TFQuestionAnsweringModelOutput or tuple(tf.Tensor)

Example:

>>> from transformers import FunnelTokenizer, TFFunnelForQuestionAnswering
>>> import tensorflow as tf

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')