ybelkada committed on
Commit 8406c6a
1 Parent(s): b974f13

Update README.md

Files changed (1)
  1. README.md +97 -45
README.md CHANGED
@@ -1,16 +1,89 @@
  ---
  language:
  - en
+ - sp
+ - ja
+ - pe
+ - hi
+ - fr
+ - ch
+ - be
+ - gu
+ - ge
+ - te
+ - it
+ - ar
+ - po
+ - ta
+ - ma
+ - ma
+ - or
+ - pa
+ - po
+ - ur
+ - ga
+ - he
+ - ko
+ - ca
+ - th
+ - du
+ - in
+ - vi
+ - bu
+ - fi
+ - ce
+ - la
+ - tu
+ - ru
+ - cr
+ - sw
+ - yo
+ - ku
+ - bu
+ - ma
+ - cz
+ - fi
+ - so
+ - ta
+ - sw
+ - si
+ - ka
+ - zh
+ - ig
+ - xh
+ - ro
+ - ha
+ - es
+ - sl
+ - li
+ - gr
+ - ne
+ - as
+ - no
+
  tags:
  - summarization
  - translation

+ datasets:
+ - librispeech_asr
+ - svakulenk0/qrecc
+ - taskmaster2
+ - djaym7/wiki_dialog
+ - deepmind/code_contests
+ - lambada
+ - gsm8k
+ - aqua_rat
+ - esnli
+ - quasc
+ - qed
+
  license: apache-2.0
  ---

  # Model Card for FLAN-T5 large

- ![model image](https://s3.amazonaws.com/moonup/production/uploads/1666360754614-62441d1d9fdefb55a0b7d12c.png)
+ ![model image](https://s3.amazonaws.com/moonup/production/uploads/1666363435475-62441d1d9fdefb55a0b7d12c.png)

  # Table of Contents

@@ -123,7 +196,7 @@ print(tokenizer.decode(outputs[0]))
  <summary> Click to expand </summary>

  ```python
- # pip install bistandbytes
+ # pip install bitsandbytes
  from transformers import T5Tokenizer, T5ForConditionalGeneration

  tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
@@ -142,11 +215,11 @@ print(tokenizer.decode(outputs[0]))

  ## Direct Use and Downstream Use

- The developers write in a [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) that the model:
+ The authors write in [the original paper's model card](https://arxiv.org/pdf/2210.11416.pdf) that:

- > Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.
+ > The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models

- See the [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) and [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.
+ See the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.

  ## Out-of-Scope Use

@@ -154,58 +227,37 @@ More information needed.

  # Bias, Risks, and Limitations

- More information needed.
+ The information below in this section are copied from the model's [official model card](https://arxiv.org/pdf/2210.11416.pdf):

- ## Recommendations
+ > Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.

- More information needed.
+ ## Ethical considerations and risks
+
+ > Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.
+
+ ## Known Limitations
+
+ > Flan-T5 has not been tested in real world applications.
+
+ ## Sensitive Use:
+
+ > Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech.

  # Training Details

  ## Training Data

- The model is pre-trained on the [Colossal Clean Crawled Corpus (C4)](https://www.tensorflow.org/datasets/catalog/c4), which was developed and released in the context of the same [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) as T5.
-
- The model was pre-trained on a on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**.
- Thereby, the following datasets were being used for (1.) and (2.):
-
- 1. **Datasets used for Unsupervised denoising objective**:
-
- - [C4](https://huggingface.co/datasets/c4)
- - [Wiki-DPR](https://huggingface.co/datasets/wiki_dpr)
-
-
- 2. **Datasets used for Supervised text-to-text language modeling objective**
-
- - Sentence acceptability judgment
-   - CoLA [Warstadt et al., 2018](https://arxiv.org/abs/1805.12471)
- - Sentiment analysis
-   - SST-2 [Socher et al., 2013](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)
- - Paraphrasing/sentence similarity
-   - MRPC [Dolan and Brockett, 2005](https://aclanthology.org/I05-5002)
-   - STS-B [Ceret al., 2017](https://arxiv.org/abs/1708.00055)
-   - QQP [Iyer et al., 2017](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs)
- - Natural language inference
-   - MNLI [Williams et al., 2017](https://arxiv.org/abs/1704.05426)
-   - QNLI [Rajpurkar et al.,2016](https://arxiv.org/abs/1606.05250)
-   - RTE [Dagan et al., 2005](https://link.springer.com/chapter/10.1007/11736790_9)
-   - CB [De Marneff et al., 2019](https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf)
- - Sentence completion
-   - COPA [Roemmele et al., 2011](https://www.researchgate.net/publication/221251392_Choice_of_Plausible_Alternatives_An_Evaluation_of_Commonsense_Causal_Reasoning)
- - Word sense disambiguation
-   - WIC [Pilehvar and Camacho-Collados, 2018](https://arxiv.org/abs/1808.09121)
- - Question answering
-   - MultiRC [Khashabi et al., 2018](https://aclanthology.org/N18-1023)
-   - ReCoRD [Zhang et al., 2018](https://arxiv.org/abs/1810.12885)
-   - BoolQ [Clark et al., 2019](https://arxiv.org/abs/1905.10044)
+ The model was trained on a mixture of tasks, that includes the tasks described in the table below (from the original paper, figure 2):
+
+ ![table.png](https://s3.amazonaws.com/moonup/production/uploads/1666363265279-62441d1d9fdefb55a0b7d12c.png)
+

  ## Training Procedure

- In their [abstract](https://jmlr.org/papers/volume21/20-074/20-074.pdf), the model developers write:
+ According to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):

- > In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.
+ > These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.

- The framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.

  # Evaluation
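
For context on the `# pip install bitsandbytes` fix in the second hunk: the surrounding (truncated) snippet in the model card runs FLAN-T5 large with 8-bit weights. A minimal sketch of that kind of usage, assuming `transformers`, `accelerate`, and `bitsandbytes` are installed and a CUDA GPU is available; the prompt and generation call below are illustrative, not the card's exact code:

```python
# Sketch only: load FLAN-T5 large with 8-bit weights via bitsandbytes.
# Assumes: pip install transformers accelerate bitsandbytes  (note the corrected package name)
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-large",
    device_map="auto",   # let accelerate place the weights on the available GPU(s)
    load_in_8bit=True,   # quantize linear layers to int8 with bitsandbytes
)

# Illustrative prompt; any instruction-style input works the same way.
input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```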