MT5
Overview
The mT5 model was presented in mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
The abstract from the paper is the following:
The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. All of the code and model checkpoints used in this work are publicly available.
The original code can be found here.
MT5Config
class transformers.MT5Config(vocab_size=250112, d_model=512, d_kv=64, d_ff=1024, num_layers=8, num_decoder_layers=None, num_heads=6, relative_attention_num_buckets=32, dropout_rate=0.1, layer_norm_epsilon=1e-06, initializer_factor=1.0, feed_forward_proj='gated-gelu', is_encoder_decoder=True, use_cache=True, tokenizer_class='T5Tokenizer', tie_word_embeddings=False, pad_token_id=0, eos_token_id=1, decoder_start_token_id=0, **kwargs)

This is the configuration class to store the configuration of a MT5Model or a TFMT5Model. It is used to instantiate an mT5 model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the mT5 google/mt5-small architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Parameters
vocab_size (int, optional, defaults to 250112) – Vocabulary size of the mT5 model. Defines the number of different tokens that can be represented by the input_ids passed when calling MT5Model or TFMT5Model.
d_model (int, optional, defaults to 512) – Size of the encoder layers and the pooler layer.
d_kv (int, optional, defaults to 64) – Size of the key, query, value projections per attention head. The projections have a total width of num_heads * d_kv, which does not have to match d_model (with the defaults, 6 * 64 = 384 while d_model is 512).
d_ff (int, optional, defaults to 1024) – Size of the intermediate feed forward layer in each T5Block.
num_layers (int, optional, defaults to 8) – Number of hidden layers in the Transformer encoder.
num_decoder_layers (int, optional) – Number of hidden layers in the Transformer decoder. Will use the same value as num_layers if not set.
num_heads (int, optional, defaults to 6) – Number of attention heads for each attention layer in the Transformer encoder.
relative_attention_num_buckets (int, optional, defaults to 32) – The number of buckets to use for each attention layer.
dropout_rate (float, optional, defaults to 0.1) – The ratio for all dropout layers.
layer_norm_epsilon (float, optional, defaults to 1e-6) – The epsilon used by the layer normalization layers.
initializer_factor (float, optional, defaults to 1) – A factor for initializing all weight matrices (should be kept to 1, used internally for initialization testing).
feed_forward_proj (string, optional, defaults to "gated-gelu") – Type of feed forward layer to be used. Should be one of "relu" or "gated-gelu".
use_cache (bool, optional, defaults to True) – Whether or not the model should return the last key/values attentions (not used by all models).
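As a rough illustration of how a few of these parameters interact, the sketch below uses a hypothetical stand-in dataclass, MT5ConfigSketch, which is not part of transformers and only copies some defaults from the signature above. It assumes, following the T5 implementation that mT5 reuses, that the attention projections have width num_heads * d_kv and that the token embedding matrix has vocab_size * d_model entries. The num_decoder_layers fallback is the one described in the parameter list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MT5ConfigSketch:
    """Hypothetical stand-in for illustration only; not the transformers class."""

    vocab_size: int = 250112
    d_model: int = 512
    d_kv: int = 64
    num_layers: int = 8
    num_decoder_layers: Optional[int] = None
    num_heads: int = 6

    def __post_init__(self) -> None:
        # Documented fallback: the decoder depth follows num_layers when unset.
        if self.num_decoder_layers is None:
            self.num_decoder_layers = self.num_layers

    @property
    def attention_inner_dim(self) -> int:
        # Assumption: query/key/value projections are num_heads * d_kv wide.
        return self.num_heads * self.d_kv

    @property
    def embedding_parameters(self) -> int:
        # Assumption: one embedding row of size d_model per vocabulary entry.
        return self.vocab_size * self.d_model


config = MT5ConfigSketch()
print("decoder layers:", config.num_decoder_layers)
print("attention inner dim:", config.attention_inner_dim)
print("embedding parameters:", config.embedding_parameters)
```

With these defaults the attention inner dimension (384) is smaller than d_model (512), so the two values are configured independently in this sketch.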
MT5Model
class transformers.MT5Model(config: transformers.models.t5.configuration_t5.T5Config)

This class overrides T5Model. Please check the superclass for the appropriate documentation alongside usage examples.

Examples::

    >>> from transformers import MT5Model, T5Tokenizer
    >>> model = MT5Model.from_pretrained("google/mt5-small")
    >>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
    >>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
    >>> summary = "Weiter Verhandlung in Syrien."
    >>> # Tokenize the source and target texts into one batch of PyTorch tensors.
    >>> batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], tgt_texts=[summary], return_tensors="pt")
    >>> # The base model returns hidden states, so the target ids are passed as decoder inputs.
    >>> outputs = model(input_ids=batch.input_ids, decoder_input_ids=batch.labels)
    >>> hidden_states = outputs.last_hidden_state

config_class
    alias of transformers.models.mt5.configuration_mt5.MT5Config
MT5ForConditionalGeneration
class transformers.MT5ForConditionalGeneration(config)

This class overrides T5ForConditionalGeneration. Please check the superclass for the appropriate documentation alongside usage examples.

Examples::

    >>> from transformers import MT5ForConditionalGeneration, T5Tokenizer
    >>> model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
    >>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
    >>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
    >>> summary = "Weiter Verhandlung in Syrien."
    >>> # Tokenize the source and target texts into one batch of PyTorch tensors.
    >>> batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], tgt_texts=[summary], return_tensors="pt")
    >>> # The batch includes labels, so the output carries a loss.
    >>> outputs = model(**batch)
    >>> loss = outputs.loss

config_class
    alias of transformers.models.mt5.configuration_mt5.MT5Config
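When labels are passed without decoder_input_ids, as in the example above, the T5 superclass is understood to build the decoder inputs by shifting the labels one position to the right. The sketch below illustrates that idea on plain Python lists. The function shift_right and the token ids are made up for illustration and are not the library's code; the defaults of 0 for decoder_start_token_id and pad_token_id come from the MT5Config signature, and -100 is assumed to be the value that marks label positions ignored by the loss.

```python
from typing import List

IGNORE_INDEX = -100  # assumed marker for label positions ignored by the loss


def shift_right(
    labels: List[int],
    decoder_start_token_id: int = 0,
    pad_token_id: int = 0,
) -> List[int]:
    """Illustrative helper: turn target ids into teacher-forcing decoder inputs."""
    if not labels:
        return []
    # Prepend the start token and drop the last target id.
    shifted = [decoder_start_token_id] + labels[:-1]
    # Ignored label positions cannot be fed to an embedding layer; use padding.
    return [pad_token_id if token == IGNORE_INDEX else token for token in shifted]


labels = [259, 1234, 567, 1]  # made-up ids; 1 is eos_token_id in the config
decoder_input_ids = shift_right(labels)
print(decoder_input_ids)
```

At each step the decoder sees the previous target token and is trained to predict the current one, which is why the final label (here the end-of-sequence id) never appears as a decoder input.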
TFMT5Model
class transformers.TFMT5Model(*args, **kwargs)

This class overrides TFT5Model. Please check the superclass for the appropriate documentation alongside usage examples.

Examples::

    >>> from transformers import TFMT5Model, T5Tokenizer
    >>> model = TFMT5Model.from_pretrained("google/mt5-small")
    >>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
    >>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
    >>> summary = "Weiter Verhandlung in Syrien."
    >>> # Tokenize the source and target texts into one batch of TensorFlow tensors.
    >>> batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], tgt_texts=[summary], return_tensors="tf")
    >>> # The base model takes decoder_input_ids rather than labels, so rename the key.
    >>> batch["decoder_input_ids"] = batch["labels"]
    >>> del batch["labels"]
    >>> outputs = model(batch)
    >>> hidden_states = outputs.last_hidden_state

config_class
    alias of transformers.models.mt5.configuration_mt5.MT5Config
TFMT5ForConditionalGeneration
class transformers.TFMT5ForConditionalGeneration(*args, **kwargs)

This class overrides TFT5ForConditionalGeneration. Please check the superclass for the appropriate documentation alongside usage examples.

Examples::

    >>> from transformers import TFMT5ForConditionalGeneration, T5Tokenizer
    >>> model = TFMT5ForConditionalGeneration.from_pretrained("google/mt5-small")
    >>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
    >>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
    >>> summary = "Weiter Verhandlung in Syrien."
    >>> # Tokenize the source and target texts into one batch of TensorFlow tensors.
    >>> batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], tgt_texts=[summary], return_tensors="tf")
    >>> # The batch includes labels, so the output carries a loss.
    >>> outputs = model(batch)
    >>> loss = outputs.loss

config_class
    alias of transformers.models.mt5.configuration_mt5.MT5Config