Funnel Transformer
Overview
The Funnel Transformer model was proposed in the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. It is a bidirectional transformer model, like BERT, but with a pooling operation after each block of layers, a bit like in traditional convolutional neural networks (CNN) in computer vision.
The abstract from the paper is the following:
With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the much-overlooked redundancy in maintaining a full-length token-level presentation, especially for tasks that only require a single-vector presentation of the sequence. With this intuition, we propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further improve the model capacity. In addition, to perform token-level predictions as required by common pretraining objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence via a decoder. Empirically, with comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks, including text classification, language understanding, and reading comprehension.
Tips:
Since Funnel Transformer uses pooling, the sequence length of the hidden states changes after each block of layers. The base model therefore has a final sequence length that is a quarter of the original one. This model can be used directly for tasks that just require a sentence summary (like sequence classification or multiple choice). For other tasks, the full model is used; this full model has a decoder that upsamples the final hidden states to the same sequence length as the input.
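The length reduction described in this tip can be sketched with a few lines of plain Python. This is a simplified illustration, not the library's implementation: it assumes every block after the first pools with stride 2 and rounds up, and it ignores the special handling of the cls token (`separate_cls`, `truncate_seq`). The helper name `block_sequence_lengths` is hypothetical.

```python
import math


def block_sequence_lengths(seq_len, num_blocks=3, stride=2):
    """Return the hidden-state length seen by each block of layers.

    Assumption: pooling is applied at the start of every block except the
    first one, with the given stride, rounding up.
    """
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        lengths.append(math.ceil(lengths[-1] / stride))
    return lengths


lengths = block_sequence_lengths(128)
print(lengths)  # [128, 64, 32]
```

With three blocks the input is pooled twice, which is where the "quarter of the original length" figure for the base model comes from.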
The Funnel Transformer checkpoints are all available in a full version and a base version. The full versions should be used for FunnelModel, FunnelForPreTraining, FunnelForMaskedLM, FunnelForTokenClassification and FunnelForQuestionAnswering. The base versions should be used for FunnelBaseModel, FunnelForSequenceClassification and FunnelForMultipleChoice.
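The pairing between model classes and checkpoint variants can be written down as a small lookup. The sketch below is only an illustration: the helper `checkpoint_for` is hypothetical, and the naming pattern is inferred from the two checkpoint names used in the examples on this page (`funnel-transformer/small` and `funnel-transformer/small-base`); whether other sizes follow the same pattern is an assumption.

```python
# Classes that need the full model (encoder + upsampling decoder).
FULL_MODEL_CLASSES = {
    "FunnelModel",
    "FunnelForPreTraining",
    "FunnelForMaskedLM",
    "FunnelForTokenClassification",
    "FunnelForQuestionAnswering",
}

# Classes that only need the base model (encoder without decoder).
BASE_MODEL_CLASSES = {
    "FunnelBaseModel",
    "FunnelForSequenceClassification",
    "FunnelForMultipleChoice",
}


def checkpoint_for(class_name, size="small"):
    """Return the checkpoint name matching a model class (hypothetical helper)."""
    if class_name in FULL_MODEL_CLASSES:
        return f"funnel-transformer/{size}"
    if class_name in BASE_MODEL_CLASSES:
        return f"funnel-transformer/{size}-base"
    raise ValueError(f"Unknown Funnel model class: {class_name}")


print(checkpoint_for("FunnelForMaskedLM"))
print(checkpoint_for("FunnelForSequenceClassification"))
```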
The original code can be found here.
FunnelConfig
class transformers.FunnelConfig(vocab_size=30522, block_sizes=[4, 4, 4], block_repeats=None, num_decoder_layers=2, d_model=768, n_head=12, d_head=64, d_inner=3072, hidden_act='gelu_new', hidden_dropout=0.1, attention_dropout=0.1, activation_dropout=0.0, max_position_embeddings=512, type_vocab_size=3, initializer_range=0.1, initializer_std=None, layer_norm_eps=1e-09, pooling_type='mean', attention_type='relative_shift', separate_cls=True, truncate_seq=True, pool_q_only=True, **kwargs)
This is the configuration class to store the configuration of a FunnelModel or a TFFunnelModel. It is used to instantiate a Funnel Transformer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the Funnel Transformer funnel-transformer/small architecture.
Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.
Parameters
vocab_size (int, optional, defaults to 30522) – Vocabulary size of the Funnel Transformer. Defines the number of different tokens that can be represented by the input_ids passed when calling FunnelModel or TFFunnelModel.
block_sizes (List[int], optional, defaults to [4, 4, 4]) – The sizes of the blocks used in the model.
block_repeats (List[int], optional) – If passed along, each layer of each block is repeated the number of times indicated.
num_decoder_layers (int, optional, defaults to 2) – The number of layers in the decoder (when not using the base model).
d_model (int, optional, defaults to 768) – Dimensionality of the model’s hidden states.
n_head (int, optional, defaults to 12) – Number of attention heads for each attention layer in the Transformer encoder.
d_head (int, optional, defaults to 64) – Dimensionality of the model’s heads.
d_inner (int, optional, defaults to 3072) – Inner dimension in the feed-forward blocks.
hidden_act (str or callable, optional, defaults to "gelu_new") – The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.
hidden_dropout (float, optional, defaults to 0.1) – The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (float, optional, defaults to 0.1) – The dropout probability for the attention probabilities.
activation_dropout (float, optional, defaults to 0.0) – The dropout probability used between the two layers of the feed-forward blocks.
max_position_embeddings (int, optional, defaults to 512) – The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
type_vocab_size (int, optional, defaults to 3) – The vocabulary size of the token_type_ids passed when calling FunnelModel or TFFunnelModel.
initializer_range (float, optional, defaults to 0.1) – The upper bound of the uniform initializer for initializing all weight matrices in attention layers.
initializer_std (float, optional) – The standard deviation of the normal initializer for initializing the embedding matrix and the weight of linear layers. Will default to 1 for the embedding matrix and the value given by Xavier initialization for linear layers.
layer_norm_eps (float, optional, defaults to 1e-9) – The epsilon used by the layer normalization layers.
pooling_type (str, optional, defaults to "mean") – Possible values are "mean" or "max". The way pooling is performed at the beginning of each block.
attention_type (str, optional, defaults to "relative_shift") – Possible values are "relative_shift" or "factorized". The former is faster on CPU/GPU while the latter is faster on TPU.
separate_cls (bool, optional, defaults to True) – Whether or not to separate the cls token when applying pooling.
truncate_seq (bool, optional, defaults to True) – When using separate_cls, whether or not to truncate the last token when pooling, to avoid getting a sequence length that is not a multiple of 2.
pool_q_only (bool, optional, defaults to True) – Whether or not to apply the pooling only to the query or to query, key and values for the attention layers.
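To make the interaction between `block_sizes` and `block_repeats` concrete, here is a minimal sketch in plain Python. It assumes that a missing `block_repeats` means "repeat each layer once" and that a repeat multiplies how many times a layer is applied; the helper `encoder_layer_applications` is hypothetical and is not part of the library.

```python
def encoder_layer_applications(block_sizes, block_repeats=None):
    """Count how many times an encoder layer is applied in one forward pass.

    Assumption: block_repeats=None is equivalent to one repeat per block.
    """
    if block_repeats is None:
        block_repeats = [1] * len(block_sizes)
    if len(block_repeats) != len(block_sizes):
        raise ValueError("block_repeats must have one entry per block")
    return sum(size * repeat for size, repeat in zip(block_sizes, block_repeats))


# Default configuration: three blocks of four layers each.
print(encoder_layer_applications([4, 4, 4]))             # 12
# Repeating the layers of the last two blocks twice.
print(encoder_layer_applications([4, 4, 4], [1, 2, 2]))  # 20
```

With the default `num_decoder_layers=2`, the full model would add two decoder layers on top of the encoder count shown here.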
FunnelTokenizer
class transformers.FunnelTokenizer(vocab_file, do_lower_case=True, do_basic_tokenize=True, never_split=None, unk_token='<unk>', sep_token='<sep>', pad_token='<pad>', cls_token='<cls>', mask_token='<mask>', bos_token='<s>', eos_token='</s>', tokenize_chinese_chars=True, strip_accents=None, **kwargs)
Construct a Funnel Transformer tokenizer.
FunnelTokenizer is identical to BertTokenizer and runs end-to-end tokenization: punctuation splitting and wordpiece.
Refer to superclass BertTokenizer for usage examples and documentation concerning parameters.
build_inputs_with_special_tokens(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) → List[int]
Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A BERT sequence has the following format:
single sequence: [CLS] X [SEP]
pair of sequences: [CLS] A [SEP] B [SEP]
Parameters
token_ids_0 (List[int]) – List of IDs to which the special tokens will be added.
token_ids_1 (List[int], optional) – Optional second list of IDs for sequence pairs.
Returns
List of input IDs with the appropriate special tokens.
Return type
List[int]
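The layout above can be illustrated without loading a tokenizer. The following is a minimal sketch of the concatenation only; the function name `add_special_tokens_sketch` and the IDs used for the cls and sep tokens are placeholders chosen for the example, not values taken from a real vocabulary.

```python
from typing import List, Optional

# Placeholder IDs for illustration only; a real tokenizer provides its own.
CLS_ID = 1
SEP_ID = 2


def add_special_tokens_sketch(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """Mimic the [CLS] A [SEP] (B [SEP]) layout described above."""
    ids = [CLS_ID] + token_ids_0 + [SEP_ID]
    if token_ids_1 is not None:
        ids += token_ids_1 + [SEP_ID]
    return ids


print(add_special_tokens_sketch([10, 11, 12]))
print(add_special_tokens_sketch([10, 11, 12], [20, 21]))
```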
create_token_type_ids_from_sequences(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) → List[int]
Create a mask from the two sequences passed to be used in a sequence-pair classification task. A Funnel Transformer sequence pair mask has the following format:
2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |
If token_ids_1 is None, this method only returns the first portion of the mask (0s).
Parameters
token_ids_0 (List[int]) – List of IDs.
token_ids_1 (List[int], optional) – Optional second list of IDs for sequence pairs.
Returns
List of token type IDs according to the given sequence(s).
Return type
List[int]
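The mask layout shown for this method can be reproduced with plain list arithmetic. This sketch assumes, based on that layout, that the cls token gets type ID 2, that the first sequence and its separator get 0, and that the second sequence and its separator get 1. The function name `token_type_ids_sketch` is hypothetical.

```python
from typing import List, Optional

CLS_TOKEN_TYPE_ID = 2  # type ID of the leading cls token, as in the layout above


def token_type_ids_sketch(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """Build token type IDs for `cls A sep` or `cls A sep B sep`."""
    # One cls position, then the first sequence plus its separator.
    type_ids = [CLS_TOKEN_TYPE_ID] + [0] * (len(token_ids_0) + 1)
    if token_ids_1 is not None:
        # The second sequence plus its separator.
        type_ids += [1] * (len(token_ids_1) + 1)
    return type_ids


print(token_type_ids_sketch([10, 11, 12], [20, 21]))
# [2, 0, 0, 0, 0, 1, 1, 1]
```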
get_special_tokens_mask(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False) → List[int]
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.
Parameters
token_ids_0 (List[int]) – List of IDs.
token_ids_1 (List[int], optional) – Optional second list of IDs for sequence pairs.
already_has_special_tokens (bool, optional, defaults to False) – Whether or not the token list is already formatted with special tokens for the model.
Returns
A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
Return type
List[int]
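For inputs that do not yet contain special tokens, the returned mask follows directly from the `cls A sep B sep` layout. Below is a minimal sketch of that case only (it does not cover `already_has_special_tokens=True`); the function name `special_tokens_mask_sketch` is hypothetical.

```python
from typing import List, Optional


def special_tokens_mask_sketch(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """Return 1 where a special token would be inserted, 0 for sequence tokens."""
    mask = [1] + [0] * len(token_ids_0) + [1]  # cls, sequence A, sep
    if token_ids_1 is not None:
        mask += [0] * len(token_ids_1) + [1]   # sequence B, sep
    return mask


print(special_tokens_mask_sketch([10, 11, 12], [20, 21]))
# [1, 0, 0, 0, 1, 0, 0, 1]
```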
save_vocabulary(save_directory: str, filename_prefix: Optional[str] = None) → Tuple[str]
Save only the vocabulary of the tokenizer (vocabulary + added tokens).
This method won’t save the configuration and special token mappings of the tokenizer. Use _save_pretrained() to save the whole state of the tokenizer.
Parameters
save_directory (str) – The directory in which to save the vocabulary.
filename_prefix (str, optional) – An optional prefix to add to the names of the saved files.
Returns
Paths to the files saved.
Return type
Tuple(str)
FunnelTokenizerFast
class transformers.FunnelTokenizerFast(vocab_file, tokenizer_file=None, do_lower_case=True, unk_token='<unk>', sep_token='<sep>', pad_token='<pad>', cls_token='<cls>', mask_token='<mask>', bos_token='<s>', eos_token='</s>', clean_text=True, tokenize_chinese_chars=True, strip_accents=None, wordpieces_prefix='##', **kwargs)
Construct a “fast” Funnel Transformer tokenizer (backed by HuggingFace’s tokenizers library).
FunnelTokenizerFast is identical to BertTokenizerFast and runs end-to-end tokenization: punctuation splitting and wordpiece.
Refer to superclass BertTokenizerFast for usage examples and documentation concerning parameters.
create_token_type_ids_from_sequences(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) → List[int]
Create a mask from the two sequences passed to be used in a sequence-pair classification task. A Funnel Transformer sequence pair mask has the following format:
2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |
If token_ids_1 is None, this method only returns the first portion of the mask (0s).
Parameters
token_ids_0 (List[int]) – List of IDs.
token_ids_1 (List[int], optional) – Optional second list of IDs for sequence pairs.
Returns
List of token type IDs according to the given sequence(s).
Return type
List[int]
slow_tokenizer_class
alias of transformers.tokenization_funnel.FunnelTokenizer
Funnel specific outputs
class transformers.modeling_funnel.FunnelForPreTrainingOutput(loss: Optional[torch.FloatTensor] = None, logits: torch.FloatTensor = None, hidden_states: Optional[Tuple[torch.FloatTensor]] = None, attentions: Optional[Tuple[torch.FloatTensor]] = None)
Output type of FunnelForPreTraining.
Parameters
loss (optional, returned when labels is provided, torch.FloatTensor of shape (1,)) – Total loss of the ELECTRA-style objective.
logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Prediction scores of the head (scores for each token before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
class transformers.modeling_tf_funnel.TFFunnelForPreTrainingOutput(logits: tensorflow.python.framework.ops.Tensor = None, hidden_states: Optional[Tuple[tensorflow.python.framework.ops.Tensor]] = None, attentions: Optional[Tuple[tensorflow.python.framework.ops.Tensor]] = None)
Output type of TFFunnelForPreTraining.
Parameters
logits (tf.Tensor of shape (batch_size, sequence_length)) – Prediction scores of the head (scores for each token before SoftMax).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
FunnelBaseModel
class transformers.FunnelBaseModel(config)
The base Funnel Transformer model outputting raw hidden-states without the upsampling head (also called decoder) or any task-specific head on top.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.)
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None)
The FunnelBaseModel forward method overrides the __call__() special method.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
Returns
A BaseModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Return type
BaseModelOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import FunnelTokenizer, FunnelBaseModel
>>> import torch
>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small-base')
>>> model = FunnelBaseModel.from_pretrained('funnel-transformer/small-base', return_dict=True)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
>>> last_hidden_states = outputs.last_hidden_state
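The `attention_mask` convention described for `forward` (1 for real tokens, 0 for padding) can be illustrated without any model. The sketch below pads a batch of token ID lists to a common length and builds the matching mask. The helper `pad_batch` and the padding ID `0` are assumptions made for the example; in practice the tokenizer produces both tensors.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad token ID lists and build the matching attention mask.

    pad_id is a placeholder value; a real tokenizer supplies its own.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks tokens to attend to, 0 marks padding positions.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask


input_ids, attention_mask = pad_batch([[5, 6, 7], [8]])
print(input_ids)       # [[5, 6, 7], [8, 0, 0]]
print(attention_mask)  # [[1, 1, 1], [1, 0, 0]]
```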
FunnelModel
class transformers.FunnelModel(config)
The bare Funnel Transformer model outputting raw hidden-states without any specific head on top.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.)
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None)
The FunnelModel forward method overrides the __call__() special method.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
Returns
A BaseModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Return type
BaseModelOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import FunnelTokenizer, FunnelModel
>>> import torch
>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = FunnelModel.from_pretrained('funnel-transformer/small', return_dict=True)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
>>> last_hidden_states = outputs.last_hidden_state
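Unlike the base model, the full model returns hidden states with the same sequence length as the input because its decoder upsamples the pooled sequence. The sketch below shows one simple way such an upsampling step could work: repeat each pooled position and trim to the target length. Nearest-neighbour repetition, the stride of 4 and the helper name `upsample` are assumptions for illustration; this page only states that the decoder upsamples, not how.

```python
def upsample(pooled, target_len, stride=4):
    """Repeat each pooled position `stride` times, then trim to target_len.

    Illustrative only: the actual decoder may treat the cls position
    differently and adds further layers after upsampling.
    """
    repeated = [state for state in pooled for _ in range(stride)]
    return repeated[:target_len]


# Two pooled positions stand in for a sequence of seven input tokens.
print(upsample(["h0", "h1"], target_len=7))
# ['h0', 'h0', 'h0', 'h0', 'h1', 'h1', 'h1']
```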
FunnelForPreTraining
class transformers.FunnelForPreTraining(config)
forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)
The FunnelForPreTraining forward method overrides the __call__() special method.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) – Labels for computing the ELECTRA-style loss. Input should be a sequence of tokens (see input_ids docstring). Indices should be in [0, 1]: 0 indicates the token is an original token, 1 indicates the token was replaced.
Returns
A FunnelForPreTrainingOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
loss (optional, returned when labels is provided, torch.FloatTensor of shape (1,)) – Total loss of the ELECTRA-style objective.
logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Prediction scores of the head (scores for each token before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Examples:
>>> from transformers import FunnelTokenizer, FunnelForPreTraining
>>> import torch
>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = FunnelForPreTraining.from_pretrained('funnel-transformer/small', return_dict=True)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> logits = model(**inputs).logits
Return type
FunnelForPreTrainingOutput or tuple(torch.FloatTensor)
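The `labels` expected by the ELECTRA-style objective mark, for each position, whether the token was replaced. A minimal sketch of how such labels could be derived from an original and a corrupted sequence is shown below. The helper `replaced_token_labels` and the token IDs are hypothetical; how the corrupted sequence is produced (for example by a generator model) is outside the scope of this page.

```python
def replaced_token_labels(original_ids, corrupted_ids):
    """Return 0 where the token is unchanged and 1 where it was replaced."""
    if len(original_ids) != len(corrupted_ids):
        raise ValueError("Both sequences must have the same length")
    return [int(orig != corr) for orig, corr in zip(original_ids, corrupted_ids)]


original = [5, 6, 7, 8]
corrupted = [5, 9, 7, 3]  # positions 1 and 3 were replaced
print(replaced_token_labels(original, corrupted))  # [0, 1, 0, 1]
```

The resulting list has one entry per input position, matching the (batch_size, sequence_length) shape of the logits once batched.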
FunnelForMaskedLM
class transformers.FunnelForMaskedLM(config)
Funnel Transformer Model with a language modeling head on top.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.)
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)
The FunnelForMaskedLM forward method overrides the __call__() special method.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) – Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size] (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
Returns
A MaskedLMOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Masked language modeling (MLM) loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Return type
MaskedLMOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import FunnelTokenizer, FunnelForMaskedLM
>>> import torch

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = FunnelForMaskedLM.from_pretrained('funnel-transformer/small', return_dict=True)

>>> inputs = tokenizer("The capital of France is <mask>.", return_tensors="pt")
>>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]

>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits
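The effect of the -100 label value described for labels can be illustrated without the library. The following is a minimal pure-Python sketch, not the model's actual loss code: the function name masked_lm_loss is hypothetical, and averaging over the kept positions is an assumption made for the illustration.

```python
import math

IGNORE_INDEX = -100  # label value that the labels description says is skipped


def masked_lm_loss(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label is not ignore_index.

    logits: list of per-position score lists (one score per vocabulary entry)
    labels: list of token ids, with ignore_index marking positions to skip
    """
    total, counted = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            continue  # this position does not contribute to the loss
        # Stable log-sum-exp: subtract the largest score before exponentiating.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[label]
        counted += 1
    # Assumption of this sketch: return 0.0 when every position is ignored.
    return total / counted if counted else 0.0
```

Only the positions that carry a real token id influence the result, so changing the scores at an ignored position leaves the loss unchanged.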
FunnelForSequenceClassification
-
class transformers.FunnelForSequenceClassification(config)
Funnel Transformer Model with a sequence classification/regression head on top (two linear layers on top of the first timestep of the last hidden state), e.g. for GLUE tasks.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
-
forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)
The FunnelForSequenceClassification forward method, overrides the __call__() special method.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
Indices can be obtained using FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size,), optional) – Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1 a regression loss is computed (Mean-Square loss); if config.num_labels > 1 a classification loss is computed (Cross-Entropy).
- Returns
A SequenceClassifierOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
SequenceClassifierOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import FunnelTokenizer, FunnelForSequenceClassification
>>> import torch

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small-base')
>>> model = FunnelForSequenceClassification.from_pretrained('funnel-transformer/small-base', return_dict=True)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1
>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits
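The labels description says that config.num_labels decides between a regression loss and a classification loss. A minimal pure-Python sketch of that switch follows. It is an illustration only, not the library's code: the name sequence_loss is hypothetical, and taking the mean over the batch is an assumption.

```python
import math


def sequence_loss(logits, labels, num_labels):
    """Pick the loss from num_labels: mean squared error or mean cross-entropy.

    logits: list of rows, each with num_labels scores
    labels: floats for regression (num_labels == 1), class ids otherwise
    """
    if num_labels == 1:
        # Regression: each row holds a single predicted value.
        return sum((row[0] - y) ** 2 for row, y in zip(logits, labels)) / len(labels)
    # Classification: cross-entropy of the softmax over each row.
    total = 0.0
    for row, y in zip(logits, labels):
        top = max(row)
        log_norm = top + math.log(sum(math.exp(s - top) for s in row))
        total += log_norm - row[y]
    return total / len(labels)
```

With num_labels == 1 the labels are real-valued targets; with more labels they are class indices in [0, ..., num_labels - 1].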
FunnelForMultipleChoice
-
class transformers.FunnelForMultipleChoice(config)
Funnel Transformer Model with a multiple choice classification head on top (two linear layers on top of the first timestep of the last hidden state, and a softmax), e.g. for RocStories/SWAG tasks.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
-
forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)
The FunnelForMultipleChoice forward method, overrides the __call__() special method.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, num_choices, sequence_length)) – Indices of input sequence tokens in the vocabulary.
Indices can be obtained using FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.FloatTensor of shape (batch_size, num_choices, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (torch.LongTensor of shape (batch_size, num_choices, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
inputs_embeds (torch.FloatTensor of shape (batch_size, num_choices, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size,), optional) – Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices-1], where num_choices is the size of the second dimension of the input tensors (see input_ids above).
- Returns
A MultipleChoiceModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification loss.
logits (torch.FloatTensor of shape (batch_size, num_choices)) – Classification scores (before SoftMax). num_choices is the second dimension of the input tensors (see input_ids above).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
MultipleChoiceModelOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import FunnelTokenizer, FunnelForMultipleChoice
>>> import torch

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small-base')
>>> model = FunnelForMultipleChoice.from_pretrained('funnel-transformer/small-base', return_dict=True)

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife."
>>> choice1 = "It is eaten while held in the hand."
>>> labels = torch.tensor(0).unsqueeze(0)  # choice0 is correct (according to Wikipedia ;)), batch size 1

>>> encoding = tokenizer([[prompt, prompt], [choice0, choice1]], return_tensors='pt', padding=True)
>>> outputs = model(**{k: v.unsqueeze(0) for k,v in encoding.items()}, labels=labels)  # batch size is 1

>>> # the linear classifier still needs to be trained
>>> loss = outputs.loss
>>> logits = outputs.logits
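The inputs here carry an extra num_choices axis, while the logits come back with shape (batch_size, num_choices). One common way to bridge the two, shown below as a pure-Python sketch with nested lists standing in for tensors, is to score every choice as its own sequence and regroup the scores afterwards. Whether the model does exactly this is an assumption of the sketch, and both helper names are hypothetical.

```python
def flatten_choices(input_ids):
    """(batch_size, num_choices, sequence_length) -> (batch_size * num_choices, sequence_length)."""
    return [sequence for example in input_ids for sequence in example]


def regroup_scores(flat_scores, num_choices):
    """One score per flattened sequence -> rows of num_choices scores per example."""
    return [
        flat_scores[start:start + num_choices]
        for start in range(0, len(flat_scores), num_choices)
    ]


# Two examples, two choices each, two tokens per sequence.
batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
flat = flatten_choices(batch)
scores = regroup_scores([0.1, 0.9, 0.7, 0.3], num_choices=2)
```

Each row of scores lines up with one entry of labels, whose value is the index of the correct choice.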
FunnelForTokenClassification
-
class transformers.FunnelForTokenClassification(config)
Funnel Transformer Model with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
-
forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)
The FunnelForTokenClassification forward method, overrides the __call__() special method.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
Indices can be obtained using FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) – Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].
- Returns
A TokenClassifierOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels)) – Classification scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
TokenClassifierOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import FunnelTokenizer, FunnelForTokenClassification
>>> import torch

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = FunnelForTokenClassification.from_pretrained('funnel-transformer/small', return_dict=True)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor([1] * inputs["input_ids"].size(1)).unsqueeze(0)  # Batch size 1

>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits
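The attention_mask convention described in the parameters (1 for real tokens, 0 for padding) is easiest to see on a small padded batch. The sketch below builds such a mask by hand in pure Python. In practice the tokenizer produces it; the helper name pad_batch and the padding id 0 are assumptions made for the illustration.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad token id lists to a common length and build the attention mask.

    pad_id=0 is a placeholder value, not necessarily the tokenizer's real padding id.
    """
    longest = max(len(sequence) for sequence in sequences)
    input_ids, attention_mask = [], []
    for sequence in sequences:
        padding = longest - len(sequence)
        input_ids.append(list(sequence) + [pad_id] * padding)
        # 1 marks a real token, 0 marks a padding position.
        attention_mask.append([1] * len(sequence) + [0] * padding)
    return input_ids, attention_mask


ids, mask = pad_batch([[5, 6, 7], [8]])
```

Both returned lists have shape (batch_size, sequence_length), matching the shapes documented for input_ids and attention_mask.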
FunnelForQuestionAnswering
-
class transformers.FunnelForQuestionAnswering(config)
Funnel Transformer Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
-
forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None, return_dict=None)
The FunnelForQuestionAnswering forward method, overrides the __call__() special method.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
Indices can be obtained using FunnelTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
start_positions (torch.LongTensor of shape (batch_size,), optional) – Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
end_positions (torch.LongTensor of shape (batch_size,), optional) – Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
- Returns
A QuestionAnsweringModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when start_positions and end_positions are provided) – Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
start_logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Span-start scores (before SoftMax).
end_logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Span-end scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
QuestionAnsweringModelOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import FunnelTokenizer, FunnelForQuestionAnswering
>>> import torch

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = FunnelForQuestionAnswering.from_pretrained('funnel-transformer/small', return_dict=True)

>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> inputs = tokenizer(question, text, return_tensors='pt')
>>> start_positions = torch.tensor([1])
>>> end_positions = torch.tensor([3])

>>> outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
>>> loss = outputs.loss
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits
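The descriptions of start_positions and end_positions say that positions are clamped to sequence_length and that positions outside the sequence do not count toward the loss. One way to read that rule is sketched below in pure Python for a single example. The helper names are hypothetical, and treating the clamped value sequence_length as the "skip" marker is an assumption of the sketch rather than a statement about the library's code.

```python
import math


def cross_entropy(scores, target):
    """Negative log-softmax of scores at index target."""
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_norm - scores[target]


def span_losses(start_logits, end_logits, start_position, end_position):
    """Return (start_loss, end_loss) for one example; None marks a skipped side."""
    sequence_length = len(start_logits)
    losses = []
    for logits, position in ((start_logits, start_position), (end_logits, end_position)):
        # Clamp into [0, sequence_length]; the upper bound is not a valid index.
        clamped = min(max(position, 0), sequence_length)
        if clamped == sequence_length:
            losses.append(None)  # outside the sequence: not counted
        else:
            losses.append(cross_entropy(logits, clamped))
    return tuple(losses)
```

How the two sides are combined into the single reported loss is left out here on purpose; the sketch only shows which positions take part.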
TFFunnelBaseModel
-
class transformers.TFFunnelBaseModel(*args, **kwargs)
The base Funnel Transformer Model outputting raw hidden-states without upsampling head (also called decoder) or any task-specific head on top.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
Note
TF 2.0 models accept two formats as inputs:
having all inputs as keyword arguments (like PyTorch models), or
having all inputs as a list, tuple or dict in the first positional argument.
This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).
If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument:
a single Tensor with input_ids only and nothing else: model(input_ids)
a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
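The three first-argument formats listed in this note can all be reduced to one mapping from input name to tensor. The pure-Python sketch below shows that reduction. It is an illustration of the convention, not the library's input handling: the function name normalize_inputs is hypothetical, INPUT_ORDER is the order shown in the list examples of the note, and plain strings stand in for tensors.

```python
# Order used when inputs arrive as a list or tuple, as in the note's list examples.
INPUT_ORDER = ["input_ids", "attention_mask", "token_type_ids"]


def normalize_inputs(inputs):
    """Map a single tensor, a list/tuple, or a dict to a name -> tensor dict."""
    if isinstance(inputs, dict):
        return dict(inputs)  # names are already explicit
    if isinstance(inputs, (list, tuple)):
        # Positional: the i-th element takes the i-th name.
        return dict(zip(INPUT_ORDER, inputs))
    return {"input_ids": inputs}  # a lone tensor is taken to be input_ids


single = normalize_inputs("ids")
as_list = normalize_inputs(["ids", "mask"])
as_dict = normalize_inputs({"input_ids": "ids", "token_type_ids": "types"})
```

The list form depends on position, which is why the note stresses the order; the dict form lets an input such as attention_mask be skipped without shifting the others.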
- Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
-
call(inputs, **kwargs)
The TFFunnelBaseModel forward method, overrides the __call__() special method.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
- Parameters
input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
Indices can be obtained using FunnelTokenizer. See transformers.PreTrainedTokenizer.__call__() and transformers.PreTrainedTokenizer.encode() for details.
attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
training (bool, optional, defaults to False) – Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).
- Returns
A TFBaseModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
last_hidden_state (tf.Tensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
TFBaseModelOutput or tuple(tf.Tensor)
Example:
>>> from transformers import FunnelTokenizer, TFFunnelBaseModel
>>> import tensorflow as tf

>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small-base')
>>> model = TFFunnelBaseModel.from_pretrained('funnel-transformer/small-base', return_dict=True)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> outputs = model(inputs)
>>> last_hidden_state = outputs.last_hidden_state
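Since the base model has no upsampling head, the sequence axis of its last hidden state is shorter than the input; the Tips at the top of this page put it at a quarter of the original length. The arithmetic can be sketched as below. This is a rough model only: the function name pooled_length is hypothetical, and the three blocks, the stride of 2 and the round-up on odd lengths are assumptions chosen to reproduce that quarter, so the length the model actually returns may differ.

```python
import math


def pooled_length(sequence_length, num_blocks=3, stride=2):
    """Estimate the sequence length left after pooling between consecutive blocks.

    Assumes one pooling step between each pair of blocks and rounding up
    when the current length is not divisible by the stride.
    """
    length = sequence_length
    for _ in range(num_blocks - 1):
        length = math.ceil(length / stride)
    return length
```

Under these assumptions an input of 128 tokens leaves 32 positions, which is why the base model suits tasks that need only a summary of the sequence.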
TFFunnelModel
-
class transformers.TFFunnelModel(*args, **kwargs)
The bare Funnel Transformer Model outputting raw hidden-states without any specific head on top.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
Note
TF 2.0 models accept two formats as inputs:
having all inputs as keyword arguments (like PyTorch models), or
having all inputs as a list, tuple or dict in the first positional argument.
This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).
If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument:
a single Tensor with input_ids only and nothing else: model(input_ids)
a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
- Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
-
call(inputs, **kwargs)
The TFFunnelModel forward method, overrides the __call__() special method.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
- Parameters
input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
Indices can be obtained using FunnelTokenizer. See transformers.PreTrainedTokenizer.__call__() and transformers.PreTrainedTokenizer.encode() for details.
attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
training (bool, optional, defaults to False) – Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).
- Returns
A TFBaseModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
last_hidden_state (tf.Tensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
TFBaseModelOutput or tuple(tf.Tensor)
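The attention_mask and token_type_ids conventions described in the parameters above can be made concrete with a short sketch. The helper build_masks is hypothetical, special tokens are left out, and giving padding positions segment 0 and a pad id of 0 are assumptions made for illustration only; in practice FunnelTokenizer produces these tensors for you.

```python
# Hypothetical helper for illustration; FunnelTokenizer normally builds these.
def build_masks(sentence_a, sentence_b, max_length, pad_id=0):
    """Pad a (sentence A, sentence B) pair of token ids and build both masks.

    Returns (input_ids, attention_mask, token_type_ids), each of length
    max_length. Special tokens are deliberately omitted.
    """
    tokens = list(sentence_a) + list(sentence_b)
    if len(tokens) > max_length:
        raise ValueError("pair is longer than max_length")
    n_pad = max_length - len(tokens)

    input_ids = tokens + [pad_id] * n_pad
    # 1 for real tokens (attended to), 0 for padding (ignored by attention).
    attention_mask = [1] * len(tokens) + [0] * n_pad
    # 0 for sentence A tokens, 1 for sentence B tokens; padding gets 0 (assumption).
    token_type_ids = [0] * len(sentence_a) + [1] * len(sentence_b) + [0] * n_pad
    return input_ids, attention_mask, token_type_ids


ids, mask, types = build_masks([11, 12, 13], [21, 22], max_length=8)
print(ids)
print(mask)
print(types)
```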
Example:
>>> from transformers import FunnelTokenizer, TFFunnelModel
>>> import tensorflow as tf
>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = TFFunnelModel.from_pretrained('funnel-transformer/small', return_dict=True)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> outputs = model(inputs)
>>> last_hidden_states = outputs.last_hidden_state
TFFunnelForPreTraining¶
-
class
transformers.
TFFunnelForPreTraining
(*args, **kwargs)[source]¶ Funnel model with a binary classification head on top as used during pre-training for identifying generated tokens.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
Note
TF 2.0 models accept two formats as inputs:
having all inputs as keyword arguments (like PyTorch models), or
having all inputs as a list, tuple or dict in the first positional argument.
This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).
If you choose this second option, there are three ways to gather all the input tensors in the first positional argument:
a single tensor with input_ids only and nothing else: model(input_ids)
a list of varying length with one or several input tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
a dictionary with one or several input tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
- Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
-
call
(inputs, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None, training=False, **kwargs)[source]¶
The TFFunnelForPreTraining forward method, overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
- Parameters
input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using FunnelTokenizer. See transformers.PreTrainedTokenizer.__call__() and transformers.PreTrainedTokenizer.encode() for details.
attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
training (bool, optional, defaults to False) – Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).
- Returns
A TFFunnelForPreTrainingOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
logits (tf.Tensor of shape (batch_size, sequence_length)) – Prediction scores of the head (scores for each token before SoftMax).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Examples:
>>> from transformers import FunnelTokenizer, TFFunnelForPreTraining
>>> import tensorflow as tf
>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = TFFunnelForPreTraining.from_pretrained('funnel-transformer/small', return_dict=True)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> logits = model(inputs).logits
- Return type
TFFunnelForPreTrainingOutput or tuple(tf.Tensor)
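Because the pre-training head is a binary classifier with one score per token, the logits of shape (batch_size, sequence_length) can be turned into per-token decisions. The sketch below assumes a sigmoid followed by a 0.5 threshold, with 1 meaning "predicted to be a generated token"; that convention and the helper name generated_token_flags are assumptions for illustration, not something the reference above specifies.

```python
import math


# Hypothetical helper; the sigmoid and the 0.5 threshold are assumptions.
def generated_token_flags(logits, threshold=0.5):
    """Turn per-token scores of shape (batch_size, sequence_length) into 0/1 flags."""
    flags = []
    for row in logits:
        # Sigmoid maps each raw score to a probability in (0, 1).
        probs = [1.0 / (1.0 + math.exp(-score)) for score in row]
        flags.append([1 if p > threshold else 0 for p in probs])
    return flags


# One sequence of four tokens; plain lists stand in for a tf.Tensor.
print(generated_token_flags([[-3.2, 0.4, 2.5, -0.1]]))
```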
TFFunnelForMaskedLM¶
-
class
transformers.
TFFunnelForMaskedLM
(*args, **kwargs)[source]¶ Funnel Model with a language modeling head on top.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
Note
TF 2.0 models accept two formats as inputs:
having all inputs as keyword arguments (like PyTorch models), or
having all inputs as a list, tuple or dict in the first positional argument.
This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).
If you choose this second option, there are three ways to gather all the input tensors in the first positional argument:
a single tensor with input_ids only and nothing else: model(input_ids)
a list of varying length with one or several input tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
a dictionary with one or several input tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
- Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
-
call
(inputs=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None, labels=None, training=False)[source]¶
The TFFunnelForMaskedLM forward method, overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
- Parameters
input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using FunnelTokenizer. See transformers.PreTrainedTokenizer.__call__() and transformers.PreTrainedTokenizer.encode() for details.
attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
training (bool, optional, defaults to False) – Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).
labels (tf.Tensor of shape (batch_size, sequence_length), optional) – Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size - 1] (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size - 1].
- Returns
A TFMaskedLMOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
loss (tf.Tensor of shape (1,), optional, returned when labels is provided) – Masked language modeling (MLM) loss.
logits (tf.Tensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
TFMaskedLMOutput or tuple(tf.Tensor)
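The role of the -100 label can be shown with a small pure-Python sketch of a masked language modeling loss for one sequence. The function masked_lm_loss is hypothetical, and averaging over the kept positions is an assumption; the library's actual reduction may differ. The point is only that positions labeled -100 contribute nothing to the loss.

```python
import math

IGNORE_INDEX = -100  # label value that excludes a position from the loss


# Hypothetical sketch; the mean reduction over kept positions is an assumption.
def masked_lm_loss(logits, labels):
    """Cross-entropy averaged over positions whose label is not -100.

    logits: list of length sequence_length, each a list of vocab_size scores.
    labels: list of length sequence_length with token ids or -100.
    """
    losses = []
    for scores, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # ignored (masked) position
        # Numerically stable log-sum-exp for the softmax normalizer.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        losses.append(log_z - scores[label])
    return sum(losses) / len(losses) if losses else 0.0


# Three positions over a toy vocabulary of size 4; only the last one is scored.
toy_logits = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 3.0, 0.0]]
toy_labels = [-100, -100, 2]
print(masked_lm_loss(toy_logits, toy_labels))
```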
Example:
>>> from transformers import FunnelTokenizer, TFFunnelForMaskedLM
>>> import tensorflow as tf
>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = TFFunnelForMaskedLM.from_pretrained('funnel-transformer/small', return_dict=True)
>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="tf")
>>> inputs["labels"] = tokenizer("The capital of France is Paris.", return_tensors="tf")["input_ids"]
>>> outputs = model(inputs)
>>> loss = outputs.loss
>>> logits = outputs.logits
TFFunnelForSequenceClassification¶
-
class
transformers.
TFFunnelForSequenceClassification
(*args, **kwargs)[source]¶ Funnel Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
Note
TF 2.0 models accept two formats as inputs:
having all inputs as keyword arguments (like PyTorch models), or
having all inputs as a list, tuple or dict in the first positional argument.
This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).
If you choose this second option, there are three ways to gather all the input tensors in the first positional argument:
a single tensor with input_ids only and nothing else: model(input_ids)
a list of varying length with one or several input tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
a dictionary with one or several input tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
- Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
-
call
(inputs=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None, labels=None, training=False)[source]¶
The TFFunnelForSequenceClassification forward method, overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
- Parameters
input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using FunnelTokenizer. See transformers.PreTrainedTokenizer.__call__() and transformers.PreTrainedTokenizer.encode() for details.
attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
training (bool, optional, defaults to False) – Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).
labels (tf.Tensor of shape (batch_size,), optional) – Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1 a regression loss is computed (Mean-Square loss); if config.num_labels > 1 a classification loss is computed (Cross-Entropy).
- Returns
A TFSequenceClassifierOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
loss (tf.Tensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.
logits (tf.Tensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
TFSequenceClassifierOutput or tuple(tf.Tensor)
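The rule that config.num_labels selects between a regression loss and a classification loss can be sketched in plain Python. The function sequence_loss is hypothetical and averages over the batch, which is an assumption; it only mirrors the Mean-Square versus Cross-Entropy choice described in the labels parameter.

```python
import math


# Hypothetical sketch; batch-mean reduction is an assumption.
def sequence_loss(logits, labels, num_labels):
    """Pick the loss from num_labels, as described for the labels parameter.

    logits: list of length batch_size, each a list of num_labels scores.
    labels: list of length batch_size (floats for regression, ints otherwise).
    """
    per_example = []
    for scores, label in zip(logits, labels):
        if num_labels == 1:
            # Regression: squared error between the single score and the target.
            per_example.append((scores[0] - label) ** 2)
        else:
            # Classification: cross-entropy of the softmax over the scores.
            top = max(scores)
            log_z = top + math.log(sum(math.exp(s - top) for s in scores))
            per_example.append(log_z - scores[int(label)])
    return sum(per_example) / len(per_example)


print(sequence_loss([[0.7], [0.2]], [1.0, 0.0], num_labels=1))      # regression
print(sequence_loss([[2.0, -1.0], [0.5, 0.5]], [0, 1], num_labels=2))  # classification
```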
Example:
>>> from transformers import FunnelTokenizer, TFFunnelForSequenceClassification
>>> import tensorflow as tf
>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small-base')
>>> model = TFFunnelForSequenceClassification.from_pretrained('funnel-transformer/small-base', return_dict=True)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> inputs["labels"] = tf.reshape(tf.constant(1), (-1, 1)) # Batch size 1
>>> outputs = model(inputs)
>>> loss = outputs.loss
>>> logits = outputs.logits
TFFunnelForMultipleChoice¶
-
class
transformers.
TFFunnelForMultipleChoice
(*args, **kwargs)[source]¶ Funnel Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
Note
TF 2.0 models accept two formats as inputs:
having all inputs as keyword arguments (like PyTorch models), or
having all inputs as a list, tuple or dict in the first positional argument.
This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).
If you choose this second option, there are three ways to gather all the input tensors in the first positional argument:
a single tensor with input_ids only and nothing else: model(input_ids)
a list of varying length with one or several input tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
a dictionary with one or several input tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
- Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
-
call
(inputs, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None, labels=None, training=False)[source]¶
The TFFunnelForMultipleChoice forward method, overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
- Parameters
input_ids (Numpy array or tf.Tensor of shape (batch_size, num_choices, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using FunnelTokenizer. See transformers.PreTrainedTokenizer.__call__() and transformers.PreTrainedTokenizer.encode() for details.
attention_mask (Numpy array or tf.Tensor of shape (batch_size, num_choices, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, num_choices, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
inputs_embeds (tf.Tensor of shape (batch_size, num_choices, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
training (bool, optional, defaults to False) – Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).
labels (tf.Tensor of shape (batch_size,), optional) – Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices - 1], where num_choices is the size of the second dimension of the input tensors (see input_ids above).
- Returns
A TFMultipleChoiceModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
loss (tf.Tensor of shape (1,), optional, returned when labels is provided) – Classification loss.
logits (tf.Tensor of shape (batch_size, num_choices)) – Classification scores (before SoftMax). num_choices is the second dimension of the input tensors (see input_ids above).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
TFMultipleChoiceModelOutput or tuple(tf.Tensor)
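The shapes used by the multiple choice head, (batch_size, num_choices, sequence_length) for the inputs and (batch_size, num_choices) for the logits, can be traced with nested lists. Flattening the choices into the batch dimension before encoding is a common way to implement such heads, but it is an assumption here, and the helper names below are hypothetical.

```python
# Hypothetical helpers; flattening choices into the batch is an assumption
# about how such a head is typically wired, not a statement about this model.
def flatten_choices(input_ids):
    """(batch_size, num_choices, sequence_length) -> (batch_size * num_choices, sequence_length)."""
    return [choice for example in input_ids for choice in example]


def unflatten_scores(scores, num_choices):
    """(batch_size * num_choices,) -> (batch_size, num_choices)."""
    return [scores[i:i + num_choices] for i in range(0, len(scores), num_choices)]


def predict_choice(logits):
    """Index of the highest-scoring choice for each example."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Batch of 2 examples, 2 choices each, sequence length 3.
input_ids = [[[1, 2, 3], [1, 2, 4]], [[5, 6, 7], [5, 6, 8]]]
flat = flatten_choices(input_ids)            # 4 sequences of length 3
scores = [0.1, 0.9, 0.7, 0.2]                # one made-up score per flattened sequence
logits = unflatten_scores(scores, num_choices=2)
print(len(flat), logits, predict_choice(logits))
```

The predicted indices fall in the range 0 to num_choices - 1, which is the same range the labels must use.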
Example:
>>> from transformers import FunnelTokenizer, TFFunnelForMultipleChoice
>>> import tensorflow as tf
>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small-base')
>>> model = TFFunnelForMultipleChoice.from_pretrained('funnel-transformer/small-base', return_dict=True)
>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife."
>>> choice1 = "It is eaten while held in the hand."
>>> # pair the prompt with each choice: first list is the text, second list is the text pair
>>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors='tf', padding=True)
>>> inputs = {k: tf.expand_dims(v, 0) for k, v in encoding.items()}
>>> outputs = model(inputs) # batch size is 1
>>> # the linear classifier still needs to be trained
>>> logits = outputs.logits
TFFunnelForTokenClassification¶
-
class
transformers.
TFFunnelForTokenClassification
(*args, **kwargs)[source]¶ Funnel Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
Note
TF 2.0 models accept two formats as inputs:
having all inputs as keyword arguments (like PyTorch models), or
having all inputs as a list, tuple or dict in the first positional argument.
This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).
If you choose this second option, there are three ways to gather all the input tensors in the first positional argument:
a single tensor with input_ids only and nothing else: model(input_ids)
a list of varying length with one or several input tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
a dictionary with one or several input tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
- Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
-
call
(inputs=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None, labels=None, training=False)[source]¶ The
TFFunnelForTokenClassification
forward method, overrides the__call__()
special method.Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.- Parameters
input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
Indices can be obtained using FunnelTokenizer. See transformers.PreTrainedTokenizer.__call__() and transformers.PreTrainedTokenizer.encode() for details.
attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
training (bool, optional, defaults to False) – Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).
labels (tf.Tensor of shape (batch_size, sequence_length), optional) – Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].
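The attention_mask and token_type_ids conventions described above can be sketched with plain NumPy. This is a minimal illustration, not the tokenizer's implementation: the helper name build_masks and the padding id 0 are assumptions, and the real FunnelTokenizer also inserts special tokens, which may receive their own segment ids.

```python
import numpy as np

PAD_ID = 0  # assumed padding id, for illustration only


def build_masks(pairs, pad_id=PAD_ID):
    """Pad (sentence_a_ids, sentence_b_ids) pairs to a common length.

    Returns input_ids, attention_mask and token_type_ids, each of shape
    (batch_size, sequence_length). Hypothetical helper, not a library API.
    """
    max_len = max(len(a) + len(b) for a, b in pairs)
    shape = (len(pairs), max_len)
    input_ids = np.full(shape, pad_id, dtype=np.int32)
    attention_mask = np.zeros(shape, dtype=np.int32)
    token_type_ids = np.zeros(shape, dtype=np.int32)
    for row, (a, b) in enumerate(pairs):
        ids = list(a) + list(b)
        input_ids[row, : len(ids)] = ids
        attention_mask[row, : len(ids)] = 1        # 1 = not masked (real token)
        token_type_ids[row, len(a) : len(ids)] = 1  # 1 = sentence B token
    return input_ids, attention_mask, token_type_ids


# Two examples of different lengths; the second one is padded.
input_ids, attention_mask, token_type_ids = build_masks(
    [([5, 6, 7], [8, 9]), ([3], [4])]
)
```

Padding positions get 0 in attention_mask, so they are the "masked" tokens the parameter description refers to.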
- Returns
A TFTokenClassifierOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
loss (tf.Tensor of shape (1,), optional, returned when labels is provided) – Classification loss.
logits (tf.Tensor of shape (batch_size, sequence_length, config.num_labels)) – Classification scores (before SoftMax).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
TFTokenClassifierOutput or tuple(tf.Tensor)
Example:
>>> from transformers import FunnelTokenizer, TFFunnelForTokenClassification
>>> import tensorflow as tf
>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = TFFunnelForTokenClassification.from_pretrained('funnel-transformer/small', return_dict=True)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> input_ids = inputs["input_ids"]
>>> inputs["labels"] = tf.reshape(tf.constant([1] * tf.size(input_ids).numpy()), (-1, tf.size(input_ids)))  # Batch size 1
>>> outputs = model(inputs)
>>> loss = outputs.loss
>>> logits = outputs.logits
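Since logits holds classification scores before SoftMax, a per-token prediction still has to be derived from it. The following is a minimal NumPy sketch of that post-processing step; the helper name predict_token_labels and the ignore_value of -1 for padded positions are assumptions made for illustration, not part of the library.

```python
import numpy as np


def predict_token_labels(logits, attention_mask=None, ignore_value=-1):
    """Turn (batch_size, sequence_length, num_labels) scores into label ids.

    Hypothetical helper: applies a numerically stable softmax over the last
    axis, takes the argmax, and marks padded positions with ignore_value.
    """
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoid overflow
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    labels = probs.argmax(axis=-1)
    if attention_mask is not None:
        labels = np.where(np.asarray(attention_mask) == 1, labels, ignore_value)
    return labels, probs


# With the model output above, one would pass logits.numpy() and
# inputs["attention_mask"].numpy(); here a tiny hand-written batch is used.
labels, probs = predict_token_labels(
    [[[2.0, 0.5], [0.1, 3.0], [0.0, 0.0]]], attention_mask=[[1, 1, 0]]
)
```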
TFFunnelForQuestionAnswering¶
-
class transformers.TFFunnelForQuestionAnswering(*args, **kwargs)[source]¶
Funnel Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
The Funnel Transformer model was proposed in Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.).
This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
Note
TF 2.0 models accept two formats as inputs:
having all inputs as keyword arguments (like PyTorch models), or
having all inputs as a list, tuple or dict in the first positional argument.
This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).
If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument:
a single Tensor with input_ids only and nothing else: model(input_ids)
a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
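To make the three first-argument conventions concrete, here is a small, self-contained sketch of how such an argument could be mapped onto named inputs. It is only an illustration of the convention described above: normalize_inputs and ARG_ORDER are hypothetical names, and this is not the library's actual input-processing code.

```python
import numpy as np

# Assumed positional order, following the first entries of the docstring.
ARG_ORDER = ("input_ids", "attention_mask", "token_type_ids")


def normalize_inputs(inputs):
    """Map a single tensor, a list/tuple, or a dict onto named inputs."""
    if isinstance(inputs, dict):
        unknown = set(inputs) - set(ARG_ORDER)
        if unknown:
            raise ValueError(f"unexpected input names: {sorted(unknown)}")
        return dict(inputs)
    if isinstance(inputs, (list, tuple)):
        if len(inputs) > len(ARG_ORDER):
            raise ValueError("too many positional tensors")
        # Tensors are matched to names strictly by position.
        return dict(zip(ARG_ORDER, inputs))
    # A single tensor is interpreted as input_ids only.
    return {"input_ids": inputs}


input_ids = np.array([[101, 7592, 102]])
attention_mask = np.ones_like(input_ids)

as_tensor = normalize_inputs(input_ids)
as_list = normalize_inputs([input_ids, attention_mask])
as_dict = normalize_inputs({"input_ids": input_ids, "attention_mask": attention_mask})
```

The list form depends entirely on ordering, which is why the dictionary form is usually the safer choice when only some inputs are supplied.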
- Parameters
config (FunnelConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
-
call
(inputs=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None, start_positions=None, end_positions=None, training=False)[source]¶
The TFFunnelForQuestionAnswering forward method, overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
- Parameters
input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
Indices can be obtained using FunnelTokenizer. See transformers.PreTrainedTokenizer.__call__() and transformers.PreTrainedTokenizer.encode() for details.
attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
training (bool, optional, defaults to False) – Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).
start_positions (tf.Tensor of shape (batch_size,), optional) – Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
end_positions (tf.Tensor of shape (batch_size,), optional) – Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
- Returns
A TFQuestionAnsweringModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of tf.Tensor comprising various elements depending on the configuration (FunnelConfig) and inputs.
loss (tf.Tensor of shape (1,), optional, returned when start_positions and end_positions are provided) – Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
start_logits (tf.Tensor of shape (batch_size, sequence_length)) – Span-start scores (before SoftMax).
end_logits (tf.Tensor of shape (batch_size, sequence_length)) – Span-end scores (before SoftMax).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
TFQuestionAnsweringModelOutput or tuple(tf.Tensor)
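The span extraction loss described above (a cross-entropy for the start position plus one for the end position) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's implementation: span_loss and log_softmax are hypothetical helpers, the batch reduction is assumed to be a mean, and out-of-range positions are not handled.

```python
import numpy as np


def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def span_loss(start_logits, end_logits, start_positions, end_positions):
    """Sum of start and end cross-entropies, averaged over the batch."""
    start_logits = np.asarray(start_logits, dtype=np.float64)
    end_logits = np.asarray(end_logits, dtype=np.float64)
    rows = np.arange(start_logits.shape[0])
    # Cross-entropy = negative log-probability of the labelled position.
    start_ce = -log_softmax(start_logits)[rows, np.asarray(start_positions)]
    end_ce = -log_softmax(end_logits)[rows, np.asarray(end_positions)]
    return float((start_ce + end_ce).mean())


# Uniform scores over 4 positions: each cross-entropy equals log(4).
loss = span_loss(np.zeros((1, 4)), np.zeros((1, 4)), [1], [2])
```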
Example:
>>> from transformers import FunnelTokenizer, TFFunnelForQuestionAnswering
>>> import tensorflow as tf
>>> tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/small')
>>> model = TFFunnelForQuestionAnswering.from_pretrained('funnel-transformer/small', return_dict=True)
>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> input_dict = tokenizer(question, text, return_tensors='tf')
>>> outputs = model(input_dict)
>>> start_logits = outputs.start_logits
>>> end_logits = outputs.end_logits
>>> all_tokens = tokenizer.convert_ids_to_tokens(input_dict["input_ids"].numpy()[0])
>>> answer = ' '.join(all_tokens[tf.math.argmax(start_logits, 1)[0] : tf.math.argmax(end_logits, 1)[0]+1])
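The example above takes the argmax of the start and end scores independently, which can produce an end index that precedes the start index and therefore an empty answer. A common alternative is to score every (start, end) pair jointly and keep only valid spans. The sketch below shows one way to do that with NumPy; best_span and its max_answer_len parameter are hypothetical names introduced for illustration, not library functions.

```python
import numpy as np


def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logits[s] + end_logits[e]
    subject to s <= e and a span length of at most max_answer_len tokens.

    Expects 1-D score vectors for a single example.
    """
    start_logits = np.asarray(start_logits, dtype=np.float64)
    end_logits = np.asarray(end_logits, dtype=np.float64)
    # scores[s, e] is the joint score of the span starting at s, ending at e.
    scores = start_logits[:, None] + end_logits[None, :]
    ones = np.ones_like(scores, dtype=bool)
    valid = np.triu(ones)                        # keep e >= s
    valid &= ~np.triu(ones, k=max_answer_len)    # keep e - s < max_answer_len
    scores = np.where(valid, scores, -np.inf)
    start, end = np.unravel_index(scores.argmax(), scores.shape)
    return int(start), int(end)


# Independent argmax would give start=2, end=0 here, which is not a valid span.
start, end = best_span([0.1, 0.2, 5.0, 0.3], [4.0, 0.1, 0.2, 3.0])
```

With the model outputs above, one would pass start_logits.numpy()[0] and end_logits.numpy()[0], then join all_tokens[start : end + 1] as in the original example.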