TimeRobber committed
Commit 0c1a4cf
1 Parent(s): 88a280b

Update README.md

Files changed (1)
  1. README.md +42 -27
README.md CHANGED
@@ -105,29 +105,38 @@ language:
  - zu
  datasets:
  - mc4
- - xP3
+ - bigscience/xP3
  ---
 
- <img src="https://s3.amazonaws.com/moonup/production/uploads/1657124309515-5f17f0a0925b9863e28ad517.png" alt="BigScience Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
-
- Multilingual Text-to-Text Transfer Transformer Zero (mt0)
- Version 1. / 28 Octo 2022
-
- // TODO @thomasw21
- Current Checkpoint:
-
- // TODO @thomasw21
- Total seen tokens:
+ Multilingual Text-to-Text Transfer Transformer Zero (MT0)
+ Version 1.0 / 28 October 2022
 
  ---
 
- # Model Details
+ # Models
 
  mT5 is pretrained on the [mC4](https://www.tensorflow.org/datasets/catalog/c4#c4multilingual) corpus, covering 101 languages:
 
  Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.
 
- mt5 was then finetuned on xP3 to obtain mt0.
+ mT5 was then finetuned on:
+ - [xP3](https://huggingface.co/bigscience/xP3) to obtain [mt0-small](https://huggingface.co/bigscience/mt0-small)/[mt0-base](https://huggingface.co/bigscience/mt0-base)/[mt0-large](https://huggingface.co/bigscience/mt0-large)/[mt0-xl](https://huggingface.co/bigscience/mt0-xl)/[mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)
+ - [P3](https://huggingface.co/bigscience/P3) to obtain [mt0-p3-xxl](https://huggingface.co/bigscience/mt0-p3-xxl)
+ - [xP3mt](https://huggingface.co/bigscience/xP3mt) to obtain [mt0-mt-xxl](https://huggingface.co/bigscience/mt5-mt-xxl)
+
+ ## Model Flavors
+
+ MT0 is a family of multilingual models capable of following user instructions in a variety of languages. Together with our paper [TODO: LINK], we release the following models:
+
+ ----
+ - [mt0-small](https://huggingface.co/bigscience/mt0-small): 300M-parameter multitask finetuned version of [mt5-small](https://huggingface.co/google/mt5-small) on [xP3](https://huggingface.co/bigscience/xP3)
+ - [mt0-base](https://huggingface.co/bigscience/mt0-base): 580M-parameter multitask finetuned version of [mt5-base](https://huggingface.co/google/mt5-base) on [xP3](https://huggingface.co/bigscience/xP3)
+ - [mt0-large](https://huggingface.co/bigscience/mt0-large): 1.2B-parameter multitask finetuned version of [mt5-large](https://huggingface.co/google/mt5-large) on [xP3](https://huggingface.co/bigscience/xP3)
+ - [mt0-xl](https://huggingface.co/bigscience/mt0-xl): 3.7B-parameter multitask finetuned version of [mt5-xl](https://huggingface.co/google/mt5-xl) on [xP3](https://huggingface.co/bigscience/xP3)
+ - [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl): 13B-parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/bigscience/xP3)
+ ----
+ - [mt0-p3-xxl](https://huggingface.co/bigscience/mt0-p3-xxl): 13B-parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [P3](https://huggingface.co/bigscience/P3)
+ - [mt0-mt-xxl](https://huggingface.co/bigscience/mt5-mt-xxl): 13B-parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3mt](https://huggingface.co/bigscience/xP3mt)
 
  ## Basics
  *This section provides information about the model type, version, license, funders, release date, developers, and contact information.*
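As a quick companion to the flavor list added above, here is a minimal sketch (not part of the commit) that loads the smallest listed checkpoint and reports its parameter count; the repo id is taken from the links in the diff, and the expected magnitude is only indicative.

```python
from transformers import AutoModelForSeq2SeqLM

# Minimal sketch: load the smallest flavor listed above and count its parameters.
# The repo id comes from the links in the diff; adjust it if the published ids differ.
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-small")
n_params = sum(p.numel() for p in model.parameters())
print(f"mt0-small parameters: {n_params / 1e6:.0f}M")  # expected to be on the order of 300M
```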
@@ -150,14 +159,11 @@ mt5 was then finetuned on xP3 to obtain mt0.
 
  **Release Date Estimate:** Friday, 28.October.2022
 
- // TODO @thomasw21
- **Send Questions to:**
+ **Send Questions to:** niklas@huggingface.co
 
- // TODO @thomas21
- **Cite as:**
-
- // TODO @thomas21
  **Funded by:**
+ * The French government.
+ * Hugging Face ([website](https://huggingface.co)).
 
  </details>
 
@@ -173,7 +179,7 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 
  ### Model Architecture and Objective
 
- * Same architecture as [mt5-xxl](https://huggingface.co/google/mt5-xxl) (see [paper](https://arxiv.org/abs/2010.11934)):
+ * Same architecture as [mt5](https://arxiv.org/abs/2010.11934)
 
  * Encoder-decoder architecture
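To make the architecture bullets above concrete, a small illustrative sketch (not part of the commit) that reads the hosted configuration through `transformers`; the printed values are whatever the config ships with and are not asserted here.

```python
from transformers import AutoConfig

# Illustrative only: the checkpoints reuse the mt5 encoder-decoder architecture,
# so the main shape hyperparameters can be read from the hosted config.
config = AutoConfig.from_pretrained("bigscience/mt0-xxl")
print(config.model_type)                              # expected: "mt5"
print(config.num_layers, config.num_decoder_layers)   # encoder / decoder depth
print(config.d_model, config.num_heads)               # hidden size / attention heads
```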
@@ -205,15 +211,11 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
  ## Training Data
  *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
 
- It was pretrained on mC4 and then finetuned on xP3
+ The model was pretrained on mC4 and then finetuned on xP3, P3, or xP3mt, depending on the flavor.
 
  ### Languages
 
  // TODO @thomasw21: Copy list from mt5
-
- ### Preprocessing
-
- // TODO @thomasw21
 
  ## Speeds, Sizes, Times
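For readers who want to inspect the finetuning data named in the hunk above, a hedged sketch using the `datasets` library; the dataset id comes from the links in this diff, while the `"en"` config name, the `train` split, and streaming access are assumptions — check the dataset card for the actual layout.

```python
from datasets import load_dataset

# Illustrative peek at the xP3 finetuning mixture; streaming avoids a full download.
# Config and split names are assumptions - consult the dataset card before relying on them.
xp3_en = load_dataset("bigscience/xP3", "en", split="train", streaming=True)
print(next(iter(xp3_en)))  # one prompt/target record
```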
@@ -253,10 +255,16 @@ The evaluation supercomputer, [Jean Zay](http://www.idris.fr/eng/jean-zay/), use
  This model can be easily used and deployed using HuggingFace's ecosystem. This needs `transformers` and `accelerate` installed. The model can be downloaded as follows:
 
  ```python
- from transformers import AutoModel
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
 
  checkpoint = "..." # "checkpoint_1006000" for example
- model = AutoModel.from_pretrained("bigscience/mt0-xxl", revision=checkpoint, torch_dtype="auto", device_map="auto")
+ model_name = "bigscience/mt0-xxl"
+ model = AutoModelForSeq2SeqLM.from_pretrained(model_name, revision=checkpoint, torch_dtype="auto", device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained(model_name, revision=checkpoint)
+
+ inputs = tokenizer.encode("Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy", return_tensors="pt")
+ outputs = model.generate(inputs)
+ print(tokenizer.decode(outputs[0]))
  ```
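As a hedged follow-up to the snippet above (it reuses the `model` and `tokenizer` objects defined there), the same checkpoint can be prompted in other languages; the French instruction and the generation settings below are illustrative, not part of the committed README.

```python
# Follow-up sketch; assumes `model` and `tokenizer` from the snippet above are loaded.
# With device_map="auto", accelerate places the weights, so we move the inputs to the
# model's device explicitly before generating.
prompt = "Translate to English: Je t'aime."
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Here `skip_special_tokens=True` simply strips the sentinel and end-of-sequence tokens from the decoded text.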
  ## Intended Use
@@ -408,3 +416,10 @@ model = AutoModel.from_pretrained("bigscience/mt0-xxl", revision=checkpoint, tor
  ## Original checkpoints
 
  The checkpoints in this repo correspond to the HuggingFace Transformers format. We'll provide T5X checkpoints as well.
+
+ # Citing MT0
+
+ Please use the following BibTeX entry to cite MT0:
+ ```bibtex
+ TODO @niklas
+ ```