XLM-RoBERTa

class transformers.XLMRobertaConfig

( vocab_size = 30522, hidden_size = 768, num_hidden_layers = 12, num_attention_heads = 12, intermediate_size = 3072, hidden_act = 'gelu', hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, max_position_embeddings = 512, type_vocab_size = 2, initializer_range = 0.02, layer_norm_eps = 1e-12, pad_token_id = 1, bos_token_id = 0, eos_token_id = 2, position_embedding_type = 'absolute', use_cache = True, classifier_dropout = None, **kwargs )

Parameters

vocab_size (int, optional, defaults to 30522) — Vocabulary size of the XLM-RoBERTa model. Defines the number of different tokens that can be represented by the input_ids passed when calling XLMRobertaModel or TFXLMRobertaModel.
hidden_size (int, optional, defaults to 768) — Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (int, optional, defaults to 12) — Number of hidden layers in the Transformer encoder.
num_attention_heads (int, optional, defaults to 12) — Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (int, optional, defaults to 3072) — Dimensionality of the “intermediate” (often named feed-forward) layer in the Transformer encoder.
hidden_act (str or Callable, optional, defaults to "gelu") — The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.
hidden_dropout_prob (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (float, optional, defaults to 0.1) — The dropout ratio for the attention probabilities.
max_position_embeddings (int, optional, defaults to 512) — The maximum sequence length the model can be used with. This is typically set to a large value as a safeguard (e.g., 512, 1024, or 2048).
type_vocab_size (int, optional, defaults to 2) — The vocabulary size of the token_type_ids passed when calling XLMRobertaModel or TFXLMRobertaModel.
initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (float, optional, defaults to 1e-12) — The epsilon used by the layer normalization layers.
position_embedding_type (str, optional, defaults to "absolute") — Type of position embedding. Choose one of "absolute", "relative_key", "relative_key_query". For positional embeddings use "absolute". For more information on "relative_key", please refer to Self-Attention with Relative Position Representations (Shaw et al.). For more information on "relative_key_query", please refer to Method 4 in Improve Transformer Models with Better Relative Position Embeddings (Huang et al.).
use_cache (bool, optional, defaults to True) — Whether the model should return the last key/value attention states (not used by all models). Only relevant if config.is_decoder=True.
classifier_dropout (float, optional) — The dropout ratio for the classification head.

This is the configuration class to store the configuration of an XLMRobertaModel or a TFXLMRobertaModel. It is used to instantiate an XLM-RoBERTa model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the XLM-RoBERTa xlm-roberta-base architecture.
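
As a minimal sketch of instantiating a configuration with non-default arguments, the snippet below describes a smaller encoder. The specific sizes are illustrative assumptions and do not correspond to any released checkpoint. The self-attention layers split hidden_size evenly across heads, so the sketch keeps hidden_size divisible by num_attention_heads.

```python
from transformers import XLMRobertaConfig

# Illustrative (assumed) sizes for a small encoder; not an official checkpoint.
small_config = XLMRobertaConfig(
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=1024,
)

# Dimensionality handled by each attention head.
head_dim = small_config.hidden_size // small_config.num_attention_heads
print(head_dim)
```

Arguments that are not passed explicitly, such as hidden_act or max_position_embeddings, keep the defaults listed in the parameter descriptions.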

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Examples:

>>> from transformers import XLMRobertaConfig, XLMRobertaModel

>>> # Initializing an XLM-RoBERTa xlm-roberta-base style configuration
>>> configuration = XLMRobertaConfig()

>>> # Initializing a model (with random weights) from the xlm-roberta-base style configuration
>>> model = XLMRobertaModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
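
Because the configuration inherits from PretrainedConfig, it can also be saved to disk, reloaded, and inspected as a dictionary. The following is a hedged sketch using save_pretrained, from_pretrained, and to_dict; the temporary directory, the classifier_dropout value of 0.2, and the choice to enable output_hidden_states are arbitrary assumptions made for illustration.

```python
import os
import tempfile

from transformers import XLMRobertaConfig

# Arbitrary example values: a classification-head dropout and a flag that
# asks models built from this configuration to return all hidden states.
config = XLMRobertaConfig(classifier_dropout=0.2, output_hidden_states=True)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Writes the configuration as a JSON file inside tmp_dir.
    config.save_pretrained(tmp_dir)
    saved_files = os.listdir(tmp_dir)

    # Rebuilds an equivalent configuration from the saved file.
    reloaded = XLMRobertaConfig.from_pretrained(tmp_dir)

# Plain-dict view of the reloaded configuration.
config_dict = reloaded.to_dict()
print(config_dict["classifier_dropout"])
```

Round-tripping through a directory like this is one way to keep an architecture definition alongside experiment artifacts without storing any model weights.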
