TimeRobber committed
Commit 0c1a4cf
1 Parent(s): 88a280b

Update README.md

Files changed (1)
  1. README.md +42 -27
README.md CHANGED
@@ -105,29 +105,38 @@ language:
  - zu
  datasets:
  - mc4
- - xP3
+ - bigscience/xP3
  ---
 
- <img src="https://s3.amazonaws.com/moonup/production/uploads/1657124309515-5f17f0a0925b9863e28ad517.png" alt="BigScience Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
-
- Multilingual Text-to-Text Transfer Transformer Zero (mt0)
- Version 1. / 28 Octo 2022
-
- // TODO @thomasw21
- Current Checkpoint:
-
- // TODO @thomasw21
- Total seen tokens:
+ Multilingual Text-to-Text Transfer Transformer Zero (MT0)
+ Version 1.0 / 28 October 2022
 
  ---
 
- # Model Details
+ # Models
 
  mT5 is pretrained on the [mC4](https://www.tensorflow.org/datasets/catalog/c4#c4multilingual) corpus, covering 101 languages:
 
  Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.
 
- mt5 was then finetuned on xP3 to obtain mt0.
+ mT5 was then finetuned on:
+ - [xP3](https://huggingface.co/bigscience/xP3) to obtain [mt0-small](https://huggingface.co/bigscience/mt0-small)/[mt0-base](https://huggingface.co/bigscience/mt0-base)/[mt0-large](https://huggingface.co/bigscience/mt0-large)/[mt0-xl](https://huggingface.co/bigscience/mt0-xl)/[mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)
+ - [P3](https://huggingface.co/bigscience/P3) to obtain [mt0-p3-xxl](https://huggingface.co/bigscience/mt0-p3-xxl)
+ - [xP3mt](https://huggingface.co/bigscience/xP3mt) to obtain [mt0-mt-xxl](https://huggingface.co/bigscience/mt5-mt-xxl)
+
+ ## Model Flavors
+
+ MT0 is a family of multilingual models capable of following user instructions in a variety of languages. Together with our paper [TODO: LINK], we release the following models:
+
+ ----
+ - [mt0-small](https://huggingface.co/bigscience/mt0-small): 300M-parameter multitask finetuned version of [mt5-small](https://huggingface.co/google/mt5-small) on [xP3](https://huggingface.co/bigscience/xP3)
+ - [mt0-base](https://huggingface.co/bigscience/mt0-base): 580M-parameter multitask finetuned version of [mt5-base](https://huggingface.co/google/mt5-base) on [xP3](https://huggingface.co/bigscience/xP3)
+ - [mt0-large](https://huggingface.co/bigscience/mt0-large): 1.2B-parameter multitask finetuned version of [mt5-large](https://huggingface.co/google/mt5-large) on [xP3](https://huggingface.co/bigscience/xP3)
+ - [mt0-xl](https://huggingface.co/bigscience/mt0-xl): 3.7B-parameter multitask finetuned version of [mt5-xl](https://huggingface.co/google/mt5-xl) on [xP3](https://huggingface.co/bigscience/xP3)
+ - [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl): 13B-parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/bigscience/xP3)
+ ----
+ - [mt0-p3-xxl](https://huggingface.co/bigscience/mt0-p3-xxl): 13B-parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [P3](https://huggingface.co/bigscience/P3)
+ - [mt0-mt-xxl](https://huggingface.co/bigscience/mt5-mt-xxl): 13B-parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3mt](https://huggingface.co/bigscience/xP3mt)
 
  ## Basics
  *This section provides information about the model type, version, license, funders, release date, developers, and contact information.*
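As a quick companion to the flavor list added above, here is a minimal sketch (not part of the commit) that loads the smallest listed checkpoint and reports its parameter count; the repo id is taken from the links in the diff, and the expected magnitude is only indicative.

```python
from transformers import AutoModelForSeq2SeqLM

# Minimal sketch: load the smallest flavor listed above and count its parameters.
# The repo id comes from the links in the diff; adjust it if the published ids differ.
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-small")
n_params = sum(p.numel() for p in model.parameters())
print(f"mt0-small parameters: {n_params / 1e6:.0f}M")  # expected to be on the order of 300M
```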
@@ -150,14 +159,11 @@ mt5 was then finetuned on xP3 to obtain mt0.
 
  **Release Date Estimate:** Friday, 28.October.2022
 
- // TODO @thomasw21
- **Send Questions to:**
+ **Send Questions to:** niklas@huggingface.co
 
- // TODO @thomas21
- **Cite as:**
-
- // TODO @thomas21
  **Funded by:**
+ * The French government.
+ * Hugging Face ([website](https://huggingface.co)).
 
  </details>
 
@@ -173,7 +179,7 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 
  ### Model Architecture and Objective
 
- * Same architecture as [mt5-xxl](https://huggingface.co/google/mt5-xxl) (see [paper](https://arxiv.org/abs/2010.11934)):
+ * Same architecture as [mt5](https://arxiv.org/abs/2010.11934)
 
  * Encoder-decoder architecture
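To make the architecture bullets above concrete, a small illustrative sketch (not part of the commit) that reads the hosted configuration through `transformers`; the printed values are whatever the config ships with and are not asserted here.

```python
from transformers import AutoConfig

# Illustrative only: the checkpoints reuse the mt5 encoder-decoder architecture,
# so the main shape hyperparameters can be read from the hosted config.
config = AutoConfig.from_pretrained("bigscience/mt0-xxl")
print(config.model_type)                              # expected: "mt5"
print(config.num_layers, config.num_decoder_layers)   # encoder / decoder depth
print(config.d_model, config.num_heads)               # hidden size / attention heads
```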
@@ -205,15 +211,11 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
  ## Training Data
  *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
 
- It was pretrained on mC4 and then finetuned on xP3
+ The model was pretrained on mC4 and then finetuned on xP3, P3, or xP3mt, depending on the flavor.
 
  ### Languages
 
  // TODO @thomasw21: Copy list from mt5
-
- ### Preprocessing
-
- // TODO @thomasw21
 
  ## Speeds, Sizes, Times
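For readers who want to inspect the finetuning data named in the hunk above, a hedged sketch using the `datasets` library; the dataset id comes from the links in this diff, while the `"en"` config name, the `train` split, and streaming access are assumptions — check the dataset card for the actual layout.

```python
from datasets import load_dataset

# Illustrative peek at the xP3 finetuning mixture; streaming avoids a full download.
# Config and split names are assumptions - consult the dataset card before relying on them.
xp3_en = load_dataset("bigscience/xP3", "en", split="train", streaming=True)
print(next(iter(xp3_en)))  # one prompt/target record
```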
@@ -253,10 +255,16 @@ The evaluation supercomputer, [Jean Zay](http://www.idris.fr/eng/jean-zay/), use
  This model can be easily used and deployed using HuggingFace's ecosystem. This needs `transformers` and `accelerate` installed. The model can be downloaded as follows:
 
  ```python
- from transformers import AutoModel
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
 
  checkpoint = "..." # "checkpoint_1006000" for example
- model = AutoModel.from_pretrained("bigscience/mt0-xxl", revision=checkpoint, torch_dtype="auto", device_map="auto")
+ model_name = "bigscience/mt0-xxl"
+ model = AutoModelForSeq2SeqLM.from_pretrained(model_name, revision=checkpoint, torch_dtype="auto", device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained(model_name, revision=checkpoint)
+
+ inputs = tokenizer.encode("Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy", return_tensors="pt")
+ outputs = model.generate(inputs)
+ print(tokenizer.decode(outputs[0]))
  ```
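As a hedged follow-up to the snippet above (it reuses the `model` and `tokenizer` objects defined there), the same checkpoint can be prompted in other languages; the French instruction and the generation settings below are illustrative, not part of the committed README.

```python
# Follow-up sketch; assumes `model` and `tokenizer` from the snippet above are loaded.
# With device_map="auto", accelerate places the weights, so we move the inputs to the
# model's device explicitly before generating.
prompt = "Translate to English: Je t'aime."
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Here `skip_special_tokens=True` simply strips the sentinel and end-of-sequence tokens from the decoded text.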
  ## Intended Use
@@ -408,3 +416,10 @@ model = AutoModel.from_pretrained("bigscience/mt0-xxl", revision=checkpoint, tor
  ## Original checkpoints
 
  The checkpoints in this repo correspond to the HuggingFace Transformers format. We'll provide T5X checkpoints as well.
+
+ # Citing MT0
+
+ Please use the following BibTeX entry to cite MT0:
+ ```bibtex
+ TODO @niklas
+ ```