DistilBERT

class transformers.DistilBertConfig

( vocab_size = 30522, max_position_embeddings = 512, sinusoidal_pos_embds = False, n_layers = 6, n_heads = 12, dim = 768, hidden_dim = 3072, dropout = 0.1, attention_dropout = 0.1, activation = 'gelu', initializer_range = 0.02, qa_dropout = 0.1, seq_classif_dropout = 0.2, pad_token_id = 0, **kwargs )

Parameters

vocab_size (int, optional, defaults to 30522) — Vocabulary size of the DistilBERT model. Defines the number of different tokens that can be represented by the input_ids passed when calling DistilBertModel or TFDistilBertModel.
max_position_embeddings (int, optional, defaults to 512) — The maximum sequence length that this model might ever be used with. Typically, set this to a large value as a safeguard (e.g., 512, 1024, or 2048).
sinusoidal_pos_embds (bool, optional, defaults to False) — Whether to use sinusoidal positional embeddings.
n_layers (int, optional, defaults to 6) — Number of hidden layers in the Transformer encoder.
n_heads (int, optional, defaults to 12) — Number of attention heads for each attention layer in the Transformer encoder.
dim (int, optional, defaults to 768) — Dimensionality of the encoder layers and the pooler layer.
hidden_dim (int, optional, defaults to 3072) — The size of the “intermediate” (often named feed-forward) layer in the Transformer encoder.
dropout (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (float, optional, defaults to 0.1) — The dropout ratio for the attention probabilities.
activation (str or Callable, optional, defaults to "gelu") — The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.
initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
qa_dropout (float, optional, defaults to 0.1) — The dropout probability used in the question answering model DistilBertForQuestionAnswering.
seq_classif_dropout (float, optional, defaults to 0.2) — The dropout probability used in the sequence classification and multiple choice models, such as DistilBertForSequenceClassification.
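
The parameters above can be overridden individually; anything left unspecified keeps its documented default. The following is a minimal sketch, assuming a hypothetical "small" variant whose sizes were chosen only for illustration. It also computes the per-head width as dim // n_heads, on the assumption (standard for multi-head attention) that dim should be divisible by n_heads.

```python
from transformers import DistilBertConfig

# Hypothetical "small" settings, chosen only for illustration.
small_config = DistilBertConfig(
    n_layers=3,                   # fewer encoder layers than the default 6
    n_heads=8,                    # attention heads per layer
    dim=512,                      # encoder width
    hidden_dim=2048,              # feed-forward ("intermediate") width
    max_position_embeddings=256,  # shorter maximum sequence length
)

# Arguments that were not passed keep their documented defaults.
default_vocab_size = small_config.vocab_size
default_dropout = small_config.dropout

# Assumption: each attention head works on an equal slice of `dim`,
# so `dim` should be divisible by `n_heads`.
assert small_config.dim % small_config.n_heads == 0
head_dim = small_config.dim // small_config.n_heads

print(default_vocab_size, default_dropout, head_dim)
```

Building only the configuration does not allocate any weights, so this is a cheap way to check a set of sizes before instantiating a model from it.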

This is the configuration class that stores the configuration of a DistilBertModel or a TFDistilBertModel. It is used to instantiate a DistilBERT model according to the specified arguments, which define the model architecture. Instantiating a configuration with the defaults yields a configuration similar to that of the distilbert-base-uncased architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.
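
As a rough illustration of what inheriting from PretrainedConfig provides, the sketch below passes a generic output-control flag through **kwargs, serializes the configuration to a temporary directory, and reloads it. The specific values (seq_classif_dropout=0.3, output_hidden_states=True) are arbitrary choices for the example, not recommended settings.

```python
import tempfile

from transformers import DistilBertConfig

# Arbitrary example values; output_hidden_states is a generic
# PretrainedConfig option forwarded through **kwargs.
config = DistilBertConfig(seq_classif_dropout=0.3, output_hidden_states=True)

# to_dict() exposes the configuration as a plain Python dictionary.
config_dict = config.to_dict()

# save_pretrained() writes a config.json file into the directory, and
# from_pretrained() rebuilds the configuration from that file.
with tempfile.TemporaryDirectory() as tmp_dir:
    config.save_pretrained(tmp_dir)
    reloaded = DistilBertConfig.from_pretrained(tmp_dir)

print(reloaded.seq_classif_dropout, reloaded.output_hidden_states)
```

A model built from the reloaded configuration would use the same settings as one built from the original object.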

Examples:

>>> from transformers import DistilBertConfig, DistilBertModel

>>> # Initializing a DistilBERT configuration
>>> configuration = DistilBertConfig()

>>> # Initializing a model (with random weights) from the configuration
>>> model = DistilBertModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
