MT5

Overview

The mT5 model was presented in "mT5: A massively multilingual pre-trained text-to-text transformer" by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel.

The abstract from the paper is the following:

The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. All of the code and model checkpoints used in this work are publicly available.

The original code can be found here.

MT5Config

class transformers.MT5Config(vocab_size=250112, d_model=512, d_kv=64, d_ff=1024, num_layers=8, num_decoder_layers=None, num_heads=6, relative_attention_num_buckets=32, dropout_rate=0.1, layer_norm_epsilon=1e-06, initializer_factor=1.0, feed_forward_proj='gated-gelu', is_encoder_decoder=True, use_cache=True, tokenizer_class='T5Tokenizer', tie_word_embeddings=False, pad_token_id=0, eos_token_id=1, decoder_start_token_id=0, **kwargs)

This is the configuration class to store the configuration of an MT5Model or a TFMT5Model. It is used to instantiate an mT5 model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the google/mt5-small architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Parameters
  • vocab_size (int, optional, defaults to 250112) – Vocabulary size of the mT5 model. Defines the number of different tokens that can be represented by the input_ids passed when calling MT5Model or TFMT5Model.

  • d_model (int, optional, defaults to 512) – Size of the encoder layers and the pooler layer.

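  • d_kv (int, optional, defaults to 64) – Size of the key, query, value projections per attention head. The inner dimension of the attention projections is num_heads * d_kv, which need not equal d_model (with the defaults, 6 * 64 = 384 while d_model is 512).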

  • d_ff (int, optional, defaults to 1024) – Size of the intermediate feed forward layer in each T5Block.

  • num_layers (int, optional, defaults to 8) – Number of hidden layers in the Transformer encoder.

  • num_decoder_layers (int, optional) – Number of hidden layers in the Transformer decoder. Will use the same value as num_layers if not set.

  • num_heads (int, optional, defaults to 6) – Number of attention heads for each attention layer in the Transformer encoder.

  • relative_attention_num_buckets (int, optional, defaults to 32) – The number of buckets to use for each attention layer.

  • dropout_rate (float, optional, defaults to 0.1) – The ratio for all dropout layers.

  • layer_norm_epsilon (float, optional, defaults to 1e-6) – The epsilon used by the layer normalization layers.

  • initializer_factor (float, optional, defaults to 1) – A factor for initializing all weight matrices (should be kept to 1, used internally for initialization testing).

  • feed_forward_proj (string, optional, defaults to "gated-gelu") – Type of feed forward layer to be used. Should be one of "relu" or "gated-gelu".

  • use_cache (bool, optional, defaults to True) – Whether or not the model should return the last key/values attentions (not used by all models).
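As a rough illustration of how a few of the parameters above relate to each other, the following sketch uses a small, hypothetical dataclass (SmallConfigSketch is not part of transformers and is not MT5Config) that mirrors the documented defaults. It assumes, as in T5-style attention, that the per-head projections are concatenated to a width of num_heads * d_kv, and it reproduces the documented fallback of num_decoder_layers to num_layers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmallConfigSketch:
    """Hypothetical stand-in mirroring a few documented defaults.

    This is NOT transformers.MT5Config; it only copies the default values
    listed in the parameter documentation so the arithmetic can be checked.
    """

    d_model: int = 512
    d_kv: int = 64
    num_layers: int = 8
    num_decoder_layers: Optional[int] = None
    num_heads: int = 6

    def attention_inner_dim(self) -> int:
        # Assumed T5-style attention: the per-head projections are
        # concatenated, giving a width of num_heads * d_kv.
        return self.num_heads * self.d_kv

    def resolved_decoder_layers(self) -> int:
        # Documented fallback: reuse num_layers when the decoder depth is unset.
        if self.num_decoder_layers is None:
            return self.num_layers
        return self.num_decoder_layers


cfg = SmallConfigSketch()
print(cfg.attention_inner_dim())      # 384, smaller than d_model (512)
print(cfg.resolved_decoder_layers())  # 8, inherited from num_layers
```

With the real class, the same values would be passed as keyword arguments to MT5Config, for example MT5Config(num_layers=8, num_heads=6).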

MT5Model

class transformers.MT5Model(config: transformers.models.t5.configuration_t5.T5Config)

This class overrides T5Model. Please check the superclass for the appropriate documentation alongside usage examples.

Examples:
>>> from transformers import MT5Model, T5Tokenizer
>>> model = MT5Model.from_pretrained("google/mt5-small")
>>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
>>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
>>> summary = "Weiter Verhandlung in Syrien."
>>> batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], tgt_texts=[summary], return_tensors="pt")
>>> outputs = model(input_ids=batch.input_ids, decoder_input_ids=batch.labels)
>>> hidden_states = outputs.last_hidden_state

config_class

alias of transformers.models.mt5.configuration_mt5.MT5Config

MT5ForConditionalGeneration

class transformers.MT5ForConditionalGeneration(config)

This class overrides T5ForConditionalGeneration. Please check the superclass for the appropriate documentation alongside usage examples.

Examples:
>>> from transformers import MT5ForConditionalGeneration, T5Tokenizer
>>> model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
>>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
>>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
>>> summary = "Weiter Verhandlung in Syrien."
>>> batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], tgt_texts=[summary], return_tensors="pt")
>>> outputs = model(**batch)
>>> loss = outputs.loss

config_class

alias of transformers.models.mt5.configuration_mt5.MT5Config

TFMT5Model

class transformers.TFMT5Model(*args, **kwargs)

This class overrides TFT5Model. Please check the superclass for the appropriate documentation alongside usage examples.

Examples:
>>> from transformers import TFMT5Model, T5Tokenizer
>>> model = TFMT5Model.from_pretrained("google/mt5-small")
>>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
>>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
>>> summary = "Weiter Verhandlung in Syrien."
>>> batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], tgt_texts=[summary], return_tensors="tf")
>>> batch["decoder_input_ids"] = batch["labels"]
>>> del batch["labels"]
>>> outputs = model(batch)
>>> hidden_states = outputs.last_hidden_state

config_class

alias of transformers.models.mt5.configuration_mt5.MT5Config

TFMT5ForConditionalGeneration

class transformers.TFMT5ForConditionalGeneration(*args, **kwargs)

This class overrides TFT5ForConditionalGeneration. Please check the superclass for the appropriate documentation alongside usage examples.

Examples:
>>> from transformers import TFMT5ForConditionalGeneration, T5Tokenizer
>>> model = TFMT5ForConditionalGeneration.from_pretrained("google/mt5-small")
>>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
>>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
>>> summary = "Weiter Verhandlung in Syrien."
>>> batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], tgt_texts=[summary], return_tensors="tf")
>>> outputs = model(batch)
>>> loss = outputs.loss

config_class

alias of transformers.models.mt5.configuration_mt5.MT5Config