Transformers documentation

mT5

You are viewing v4.42.0 version. A newer version v5.0.0rc0 is available.

mT5

Overview

The mT5 model was presented in mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.

The abstract from the paper is the following:

The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.

Note: mT5 was pre-trained only on mC4, with no supervised training. Unlike the original T5 model, it must therefore be fine-tuned before it can be used on a downstream task. Because the pre-training was unsupervised, a task prefix offers no real advantage during single-task fine-tuning. For multi-task fine-tuning, you should use a prefix.
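As a rough illustration of the prefix advice in the note, the sketch below formats inputs for single-task and multi-task fine-tuning. The helper names (`format_example`, `build_multitask_inputs`), the `"task: text"` layout, and the prefix strings are assumptions made for this example, not part of the Transformers API.

```python
from typing import Dict, List


def format_example(text: str, task: str = "") -> str:
    """Prepend a task prefix when one is given; otherwise return the text unchanged."""
    # Hypothetical layout: "<task>: <text>". Any consistent scheme would do.
    return f"{task}: {text}" if task else text


def build_multitask_inputs(examples: List[Dict[str, str]]) -> List[str]:
    """Format a mixed batch so the model can tell the tasks apart."""
    return [format_example(ex["text"], ex["task"]) for ex in examples]


# Single-task fine-tuning: no prefix is needed.
single_task_input = format_example("Le chat dort sur le canapé.")

# Multi-task fine-tuning: each input carries the name of its task.
multitask_examples = [
    {"task": "summarize", "text": "Le chat dort sur le canapé."},
    {"task": "classify sentiment", "text": "Das Essen war ausgezeichnet."},
]
multitask_inputs = build_multitask_inputs(multitask_examples)

print(single_task_input)
for line in multitask_inputs:
    print(line)
```

The formatted strings would then be passed to the tokenizer as usual. The same prefixes have to be used at inference time, since the model only learns to associate a prefix with a task during fine-tuning.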

Google has released the following variants:

This model was contributed by patrickvonplaten. The original code can be found here.

Resources

MT5Config

class transformers.MT5Config

MT5Tokenizer

class transformers.T5Tokenizer

build_inputs_with_special_tokens

convert_tokens_to_string

create_token_type_ids_from_sequences

get_special_tokens_mask

tokenize

MT5TokenizerFast

class transformers.T5TokenizerFast

build_inputs_with_special_tokens

create_token_type_ids_from_sequences

MT5Model

class transformers.MT5Model

deparallelize

forward

parallelize

MT5ForConditionalGeneration

class transformers.MT5ForConditionalGeneration
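Since mT5 has to be fine-tuned before use, it may help to see what a single training step looks like. The sketch below is only an illustration: it builds a tiny, randomly initialised model from an `MT5Config` (the small dimensions are arbitrary assumptions chosen so that nothing needs to be downloaded) and feeds it random token ids in place of tokenized text. Real fine-tuning would instead load a pre-trained checkpoint and tokenized data.

```python
import torch
from transformers import MT5Config, MT5ForConditionalGeneration

torch.manual_seed(0)

# Tiny configuration with arbitrary sizes, for illustration only.
config = MT5Config(
    vocab_size=128,
    d_model=32,
    d_kv=8,
    d_ff=64,
    num_layers=2,
    num_decoder_layers=2,
    num_heads=4,
)
model = MT5ForConditionalGeneration(config)  # random weights, not pre-trained

# Random token ids standing in for tokenized source and target text.
input_ids = torch.randint(2, config.vocab_size, (2, 10))
labels = torch.randint(2, config.vocab_size, (2, 6))

# Passing `labels` makes the model build the decoder inputs and return a loss.
outputs = model(input_ids=input_ids, labels=labels)
loss = outputs.loss
logits_shape = tuple(outputs.logits.shape)

# One optimisation step, as a fine-tuning loop would perform repeatedly.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss.backward()
optimizer.step()
optimizer.zero_grad()

print(logits_shape)
```

The logits have shape `(batch_size, target_length, vocab_size)`, so each target position receives a distribution over the vocabulary.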

deparallelize

forward

parallelize

MT5EncoderModel

class transformers.MT5EncoderModel

deparallelize

forward

parallelize

MT5ForSequenceClassification

class transformers.MT5ForSequenceClassification

forward

MT5ForTokenClassification

class transformers.MT5ForTokenClassification

forward

MT5ForQuestionAnswering

class transformers.MT5ForQuestionAnswering

forward

TFMT5Model

class transformers.TFMT5Model

TFMT5ForConditionalGeneration

class transformers.TFMT5ForConditionalGeneration

TFMT5EncoderModel

class transformers.TFMT5EncoderModel

FlaxMT5Model

class transformers.FlaxMT5Model

FlaxMT5ForConditionalGeneration

class transformers.FlaxMT5ForConditionalGeneration

FlaxMT5EncoderModel

class transformers.FlaxMT5EncoderModel