mT5¶

Overview¶

The mT5 model was presented in “mT5: A massively multilingual pre-trained text-to-text transformer” by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel.

The abstract from the paper is the following:

The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.

Note: mT5 was pre-trained only on mC4, without any supervised training. Therefore, unlike the original T5 model, it has to be fine-tuned before it is usable on a downstream task. Since mT5 was pre-trained without supervision, there is no real advantage to using a task prefix during single-task fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
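The prefix advice in the note can be sketched in plain Python, with no library calls. The helper `add_task_prefix` and the task names below are hypothetical and chosen only for illustration; the resulting strings would then be passed to the tokenizer as usual.

```python
from typing import Optional


def add_task_prefix(text: str, task: Optional[str] = None) -> str:
    """Prepend a task prefix such as "summarize: " when a task name is given.

    Hypothetical helper: pass task=None for single-task fine-tuning (the text
    is returned unchanged) and a task name for multi-task fine-tuning.
    """
    if task is None:
        return text
    return f"{task}: {text}"


# Multi-task setup: each example carries the prefix of its own task.
multi_task_inputs = [
    add_task_prefix("UN Offizier sagt, dass weiter verhandelt werden muss in Syrien.", "summarize"),
    add_task_prefix("Weiter Verhandlung in Syrien.", "translate German to English"),
]

# Single-task setup: no prefix is needed.
single_task_input = add_task_prefix("Weiter Verhandlung in Syrien.")
```

The exact wording of a prefix is a free choice; what matters in the multi-task case is that each task uses its own prefix consistently during fine-tuning and inference.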

Google has released the following variants:

  • google/mt5-small

  • google/mt5-base

  • google/mt5-large

  • google/mt5-xl

  • google/mt5-xxl

This model was contributed by patrickvonplaten. The original code can be found here.

MT5Config¶

class transformers.MT5Config(vocab_size=250112, d_model=512, d_kv=64, d_ff=1024, num_layers=8, num_decoder_layers=None, num_heads=6, relative_attention_num_buckets=32, dropout_rate=0.1, layer_norm_epsilon=1e-06, initializer_factor=1.0, feed_forward_proj='gated-gelu', is_encoder_decoder=True, use_cache=True, tokenizer_class='T5Tokenizer', tie_word_embeddings=False, pad_token_id=0, eos_token_id=1, decoder_start_token_id=0, **kwargs)[source]¶

This is the configuration class to store the configuration of an MT5Model or a TFMT5Model. It is used to instantiate an mT5 model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the google/mt5-small architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Parameters
  • vocab_size (int, optional, defaults to 250112) – Vocabulary size of the mT5 model. Defines the number of different tokens that can be represented by the input_ids passed when calling MT5Model or TFMT5Model.

  • d_model (int, optional, defaults to 512) – Size of the encoder layers and the pooler layer.

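  • d_kv (int, optional, defaults to 64) – Size of the key, query, value projections per attention head. Conventionally d_kv equals d_model // num_heads, but this is not required: with the defaults here (d_model=512, num_heads=6, d_kv=64) it does not hold, and the inner dimension of the attention projections is num_heads * d_kv.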

  • d_ff (int, optional, defaults to 1024) – Size of the intermediate feed forward layer in each T5Block.

  • num_layers (int, optional, defaults to 8) – Number of hidden layers in the Transformer encoder.

  • num_decoder_layers (int, optional) – Number of hidden layers in the Transformer decoder. Will use the same value as num_layers if not set.

  • num_heads (int, optional, defaults to 6) – Number of attention heads for each attention layer in the Transformer encoder.

  • relative_attention_num_buckets (int, optional, defaults to 32) – The number of buckets to use for each attention layer.

  • dropout_rate (float, optional, defaults to 0.1) – The ratio for all dropout layers.

  • layer_norm_epsilon (float, optional, defaults to 1e-6) – The epsilon used by the layer normalization layers.

  • initializer_factor (float, optional, defaults to 1) – A factor for initializing all weight matrices (should be kept to 1, used internally for initialization testing).

  • feed_forward_proj (string, optional, defaults to "gated-gelu") – Type of feed forward layer to be used. Should be one of "relu" or "gated-gelu".

  • use_cache (bool, optional, defaults to True) – Whether or not the model should return the last key/values attentions (not used by all models).
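How these sizes interact can be checked with simple arithmetic. The dataclass below is a stand-in that only mirrors the defaults from the signature above; it is not the library's MT5Config, and the names `MT5SizesSketch`, `inner_dim` and `effective_decoder_layers` are made up for this sketch. It assumes, as in the T5 attention implementation, that the query/key/value projections map d_model to num_heads * d_kv.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MT5SizesSketch:
    """Illustrative stand-in holding a few size defaults from the signature."""

    d_model: int = 512
    d_kv: int = 64
    d_ff: int = 1024
    num_layers: int = 8
    num_decoder_layers: Optional[int] = None
    num_heads: int = 6

    @property
    def inner_dim(self) -> int:
        # Width of the concatenated attention heads (assumed: num_heads * d_kv).
        return self.num_heads * self.d_kv

    @property
    def effective_decoder_layers(self) -> int:
        # The decoder falls back to num_layers when num_decoder_layers is unset.
        if self.num_decoder_layers is None:
            return self.num_layers
        return self.num_decoder_layers


sizes = MT5SizesSketch()
print(sizes.inner_dim)                   # 384
print(sizes.d_model // sizes.num_heads)  # 85
print(sizes.effective_decoder_layers)    # 8
```

With the defaults, the attention inner dimension (384) is smaller than d_model (512), so the per-head size is set by d_kv rather than derived from d_model.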

MT5Tokenizer¶

transformers.MT5Tokenizer¶

alias of transformers.models.t5.tokenization_t5.T5Tokenizer

See T5Tokenizer for all details.

MT5TokenizerFast¶

transformers.MT5TokenizerFast¶

alias of transformers.models.t5.tokenization_t5_fast.T5TokenizerFast

See T5TokenizerFast for all details.

MT5Model¶

class transformers.MT5Model(config: transformers.models.t5.configuration_t5.T5Config)[source]¶

This class overrides T5Model. Please check the superclass for the appropriate documentation alongside usage examples.

Examples:

>>> from transformers import MT5Model, T5Tokenizer
>>> model = MT5Model.from_pretrained("google/mt5-small")
>>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
>>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
>>> summary = "Weiter Verhandlung in Syrien."
>>> inputs = tokenizer(article, return_tensors="pt")
>>> with tokenizer.as_target_tokenizer():
...     labels = tokenizer(summary, return_tensors="pt")

>>> outputs = model(input_ids=inputs["input_ids"], decoder_input_ids=labels["input_ids"])
>>> hidden_states = outputs.last_hidden_state
config_class¶

alias of transformers.models.mt5.configuration_mt5.MT5Config

MT5ForConditionalGeneration¶

class transformers.MT5ForConditionalGeneration(config)[source]¶

This class overrides T5ForConditionalGeneration. Please check the superclass for the appropriate documentation alongside usage examples.

Examples:

>>> from transformers import MT5ForConditionalGeneration, T5Tokenizer
>>> model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
>>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
>>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
>>> summary = "Weiter Verhandlung in Syrien."
>>> inputs = tokenizer(article, return_tensors="pt")
>>> with tokenizer.as_target_tokenizer():
...     labels = tokenizer(summary, return_tensors="pt")

>>> outputs = model(**inputs, labels=labels["input_ids"])
>>> loss = outputs.loss
config_class¶

alias of transformers.models.mt5.configuration_mt5.MT5Config
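When only labels are passed, as in the MT5ForConditionalGeneration example above, the T5 family builds the decoder inputs by shifting the labels one position to the right. The function below is a plain-Python sketch of that step, not the library code: `shift_right` is an illustrative name, the token ids in the usage lines are made up, and the defaults for `decoder_start_token_id` and `pad_token_id` are taken from the MT5Config signature. It also assumes the common convention of marking ignored label positions with -100.

```python
from typing import List


def shift_right(
    labels: List[int],
    decoder_start_token_id: int = 0,
    pad_token_id: int = 0,
) -> List[int]:
    """Build decoder inputs from labels (illustrative sketch)."""
    # Prepend the start token and drop the last label, so that the decoder
    # input at position i is the label from position i - 1.
    shifted = [decoder_start_token_id] + labels[:-1]
    # -100 marks label positions ignored by the loss; it is not a valid
    # token id, so it is replaced by the padding id in the decoder inputs.
    return [pad_token_id if token == -100 else token for token in shifted]


labels = [259, 1234, 567, 1]  # made-up ids; 1 is the eos_token_id default
decoder_input_ids = shift_right(labels)
print(decoder_input_ids)  # [0, 259, 1234, 567]
```

This is why the conditional-generation example passes only labels, while the MT5Model example, which has no loss head, needs decoder_input_ids supplied explicitly.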

MT5EncoderModel¶

class transformers.MT5EncoderModel(config: transformers.models.t5.configuration_t5.T5Config)[source]¶

This class overrides T5EncoderModel. Please check the superclass for the appropriate documentation alongside usage examples.

Examples:

>>> from transformers import MT5EncoderModel, T5Tokenizer
>>> model = MT5EncoderModel.from_pretrained("google/mt5-small")
>>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
>>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
>>> input_ids = tokenizer(article, return_tensors="pt").input_ids
>>> outputs = model(input_ids)
>>> hidden_state = outputs.last_hidden_state
config_class¶

alias of transformers.models.mt5.configuration_mt5.MT5Config

TFMT5Model¶

class transformers.TFMT5Model(*args, **kwargs)[source]¶

This class overrides TFT5Model. Please check the superclass for the appropriate documentation alongside usage examples.

Examples:

>>> from transformers import TFMT5Model, T5Tokenizer
>>> model = TFMT5Model.from_pretrained("google/mt5-small")
>>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
>>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
>>> summary = "Weiter Verhandlung in Syrien."
>>> inputs = tokenizer(article, return_tensors="tf")
>>> with tokenizer.as_target_tokenizer():
...     labels = tokenizer(summary, return_tensors="tf")

>>> outputs = model(input_ids=inputs["input_ids"], decoder_input_ids=labels["input_ids"])
>>> hidden_states = outputs.last_hidden_state
config_class¶

alias of transformers.models.mt5.configuration_mt5.MT5Config

TFMT5ForConditionalGeneration¶

class transformers.TFMT5ForConditionalGeneration(*args, **kwargs)[source]¶

This class overrides TFT5ForConditionalGeneration. Please check the superclass for the appropriate documentation alongside usage examples.

Examples:

>>> from transformers import TFMT5ForConditionalGeneration, T5Tokenizer
>>> model = TFMT5ForConditionalGeneration.from_pretrained("google/mt5-small")
>>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
>>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
>>> summary = "Weiter Verhandlung in Syrien."
>>> inputs = tokenizer(article, return_tensors="tf")
>>> with tokenizer.as_target_tokenizer():
...     labels = tokenizer(summary, return_tensors="tf")

>>> outputs = model(**inputs, labels=labels["input_ids"])
>>> loss = outputs.loss
config_class¶

alias of transformers.models.mt5.configuration_mt5.MT5Config

TFMT5EncoderModel¶

class transformers.TFMT5EncoderModel(*args, **kwargs)[source]¶

This class overrides TFT5EncoderModel. Please check the superclass for the appropriate documentation alongside usage examples.

Examples:

>>> from transformers import TFMT5EncoderModel, T5Tokenizer
>>> model = TFMT5EncoderModel.from_pretrained("google/mt5-small")
>>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
>>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
>>> input_ids = tokenizer(article, return_tensors="tf").input_ids
>>> outputs = model(input_ids)
>>> hidden_state = outputs.last_hidden_state
config_class¶

alias of transformers.models.mt5.configuration_mt5.MT5Config

FlaxMT5Model¶

class transformers.FlaxMT5Model(config: transformers.models.t5.configuration_t5.T5Config, input_shape: Tuple[int] = (1, 1), seed: int = 0, dtype: numpy.dtype = <class 'jax._src.numpy.lax_numpy.float32'>, **kwargs)[source]¶

This class overrides FlaxT5Model. Please check the superclass for the appropriate documentation alongside usage examples.

Examples:

>>> from transformers import FlaxMT5Model, T5Tokenizer

>>> model = FlaxMT5Model.from_pretrained("google/mt5-small")
>>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")

>>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
>>> summary = "Weiter Verhandlung in Syrien."
>>> inputs = tokenizer(article, return_tensors="np")

>>> with tokenizer.as_target_tokenizer():
...     decoder_input_ids = tokenizer(summary, return_tensors="np").input_ids

>>> outputs = model(input_ids=inputs["input_ids"], decoder_input_ids=decoder_input_ids)
>>> hidden_states = outputs.last_hidden_state
config_class¶

alias of transformers.models.mt5.configuration_mt5.MT5Config

FlaxMT5ForConditionalGeneration¶

class transformers.FlaxMT5ForConditionalGeneration(config: transformers.models.t5.configuration_t5.T5Config, input_shape: Tuple[int] = (1, 1), seed: int = 0, dtype: numpy.dtype = <class 'jax._src.numpy.lax_numpy.float32'>, **kwargs)[source]¶

This class overrides FlaxT5ForConditionalGeneration. Please check the superclass for the appropriate documentation alongside usage examples.

Examples:

>>> from transformers import FlaxMT5ForConditionalGeneration, T5Tokenizer

>>> model = FlaxMT5ForConditionalGeneration.from_pretrained("google/mt5-small")
>>> tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")

>>> article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
>>> summary = "Weiter Verhandlung in Syrien."
>>> inputs = tokenizer(article, return_tensors="np")

>>> with tokenizer.as_target_tokenizer():
...     decoder_input_ids = tokenizer(summary, return_tensors="np").input_ids

>>> outputs = model(**inputs, decoder_input_ids=decoder_input_ids)
>>> logits = outputs.logits
config_class¶

alias of transformers.models.mt5.configuration_mt5.MT5Config