Funnel Transformer

class transformers.FunnelConfig

( vocab_size = 30522, block_sizes = [4, 4, 4], block_repeats = None, num_decoder_layers = 2, d_model = 768, n_head = 12, d_head = 64, d_inner = 3072, hidden_act = 'gelu_new', hidden_dropout = 0.1, attention_dropout = 0.1, activation_dropout = 0.0, initializer_range = 0.1, initializer_std = None, layer_norm_eps = 1e-09, pooling_type = 'mean', attention_type = 'relative_shift', separate_cls = True, truncate_seq = True, pool_q_only = True, **kwargs )

Parameters

vocab_size (int, optional, defaults to 30522) — Vocabulary size of the Funnel transformer. Defines the number of different tokens that can be represented by the input_ids passed when calling FunnelModel or TFFunnelModel.
block_sizes (List[int], optional, defaults to [4, 4, 4]) — The sizes of the blocks used in the model.
block_repeats (List[int], optional) — If passed along, each layer of each block is repeated the number of times indicated.
num_decoder_layers (int, optional, defaults to 2) — The number of layers in the decoder (when not using the base model).
d_model (int, optional, defaults to 768) — Dimensionality of the model’s hidden states.
n_head (int, optional, defaults to 12) — Number of attention heads for each attention layer in the Transformer encoder.
d_head (int, optional, defaults to 64) — Dimensionality of the model’s heads.
d_inner (int, optional, defaults to 3072) — Inner dimension in the feed-forward blocks.
hidden_act (str or callable, optional, defaults to "gelu_new") — The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.
hidden_dropout (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (float, optional, defaults to 0.1) — The dropout probability for the attention probabilities.
activation_dropout (float, optional, defaults to 0.0) — The dropout probability used between the two layers of the feed-forward blocks.
initializer_range (float, optional, defaults to 0.1) — The upper bound of the uniform initializer for initializing all weight matrices in attention layers.
initializer_std (float, optional) — The standard deviation of the normal initializer for initializing the embedding matrix and the weight of linear layers. Will default to 1 for the embedding matrix and the value given by Xavier initialization for linear layers.
layer_norm_eps (float, optional, defaults to 1e-9) — The epsilon used by the layer normalization layers.
pooling_type (str, optional, defaults to "mean") — Possible values are "mean" or "max". The way pooling is performed at the beginning of each block.
attention_type (str, optional, defaults to "relative_shift") — Possible values are "relative_shift" or "factorized". The former is faster on CPU/GPU while the latter is faster on TPU.
separate_cls (bool, optional, defaults to True) — Whether or not to separate the cls token when applying pooling.
truncate_seq (bool, optional, defaults to True) — When using separate_cls, whether or not to truncate the last token when pooling, to avoid getting a sequence length that is not a multiple of 2.
pool_q_only (bool, optional, defaults to True) — Whether to apply the pooling only to the query, or to the query, key and values, in the attention layers.
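
The interaction between pooling_type, separate_cls and truncate_seq can be illustrated with a small pure-Python sketch. It is not the library's implementation: it assumes a pooling stride of 2 with partial windows kept, assumes the first block runs at full length, and models "separating" the cls token as prepending a copy of it so that the first window pools cls with itself. The helper names pool_sequence and block_lengths are hypothetical.

```python
def pool_sequence(tokens, pooling_type="mean", separate_cls=True, truncate_seq=True):
    """Pool a list of floats with stride 2 (assumed); tokens[0] plays the cls role."""
    if separate_cls:
        # Optionally drop the last token so the padded length stays even.
        body = tokens[:-1] if truncate_seq else tokens
        # Prepend a copy of cls: the first window then pools cls with itself.
        tokens = [tokens[0]] + list(body)
    if pooling_type == "max":
        reduce = max
    else:
        reduce = lambda window: sum(window) / len(window)
    return [reduce(tokens[i:i + 2]) for i in range(0, len(tokens), 2)]


def block_lengths(seq_len, num_blocks, **pool_kwargs):
    """Sequence length seen by each block, pooling before every block but the first."""
    tokens = [0.0] * seq_len
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        tokens = pool_sequence(tokens, **pool_kwargs)
        lengths.append(len(tokens))
    return lengths


print(pool_sequence([9.0, 1.0, 3.0, 5.0, 7.0, 11.0]))   # [9.0, 2.0, 6.0]
print(block_lengths(128, 3))                            # [128, 64, 32]
print(block_lengths(128, 3, truncate_seq=False))        # [128, 65, 33]
```

Under these assumptions, truncate_seq=True keeps the length a power-of-two fraction of the input, while truncate_seq=False yields odd lengths after the first pooling step.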

This is the configuration class to store the configuration of a FunnelModel or a TFFunnelModel. It is used to instantiate a Funnel Transformer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the funnel-transformer/small architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.
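
As a minimal usage sketch, assuming the transformers package is installed, a configuration can be created with the defaults or with overridden arguments. The reduced sizes in the second configuration are arbitrary values chosen for illustration and do not correspond to any published checkpoint.

```python
from transformers import FunnelConfig

# Default configuration (described above as similar to funnel-transformer/small).
config = FunnelConfig()
print(config.block_sizes, config.d_model, config.n_head)

# A smaller, hypothetical variant: two blocks of two layers each.
small = FunnelConfig(
    block_sizes=[2, 2],
    d_model=256,
    n_head=4,
    d_head=64,
    d_inner=1024,
    pooling_type="max",
)
print(small.block_sizes, small.pooling_type)
```

The resulting object can then be passed to a model class such as FunnelModel to build a randomly initialized model with that architecture.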
