ybelkada (HF staff) committed
Commit
99ffe3a
1 Parent(s): 15087c0

Update README.md

Files changed (1)
  1. README.md +7 -2
README.md CHANGED
@@ -100,8 +100,11 @@ license: apache-2.0

# TL;DR

+ If you already know T5, FLAN-T5 is just better at everything! It is a **bigger model (+ XX parameters)** that was trained on **more tasks (+ XX)**, **more data (+ XX tokens)**, and **more languages (+ XX languages)**. As mentioned in the first few lines of the abstract:
> Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints,1 which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.

+ **Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy-pasted from the [T5 model card](https://huggingface.co/t5-large).
+
# Model Details

## Model Description
@@ -256,12 +259,14 @@ According to the model card from the [original paper](https://arxiv.org/pdf/2210

> These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.

+ The model has been trained on TPU v3 or TPU v4 pods, using the [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
+

# Evaluation

## Testing Data, Factors & Metrics

- The developers evaluated the model on 88 tasks and 10 languages. See the table below for quantitative evaluation:
+ The authors evaluated the model on various tasks (1,836 in total) covering several languages. See the table below for some quantitative evaluation:
![image.png](https://s3.amazonaws.com/moonup/production/uploads/1666361983550-62441d1d9fdefb55a0b7d12c.png)
For full details, please check the [research paper](https://arxiv.org/pdf/2210.11416.pdf).

@@ -273,7 +278,7 @@ For full results for FLAN-T5-Large, see the [research paper](https://arxiv.org/p

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** Google Cloud TPU Pods
+ - **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4.
- **Hours used:** More information needed
- **Cloud Provider:** GCP
- **Compute Region:** More information needed
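For readers skimming this diff: the TL;DR added above pitches FLAN-T5 as a drop-in improvement over T5, which is easiest to see in code. Below is a minimal usage sketch (not part of this commit), assuming the `google/flan-t5-large` checkpoint this card describes and the seq2seq auto classes from the Hugging Face `transformers` library:

```python
# Minimal usage sketch (not from this commit): FLAN-T5 loads exactly like T5,
# via the seq2seq auto classes in transformers. The checkpoint name is assumed
# from the card's subject (google/flan-t5-large).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

# Instruction-style prompt, the kind of input the Flan finetuning targets.
inputs = tokenizer("Translate English to German: How old are you?",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Any prompt format that worked with T5 works here unchanged; only the checkpoint name differs.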
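On the carbon-emissions list above: the linked ML Impact calculator is a web form, and its estimate boils down to power draw × hours × regional grid carbon intensity. A back-of-the-envelope sketch of that arithmetic, with every input a placeholder since the card itself lists hours and region as "More information needed":

```python
# Back-of-the-envelope carbon estimate following the Lacoste et al. (2019)
# methodology: emissions ≈ power draw (kW) × hours × grid carbon intensity.
# Every number below is a placeholder; the card leaves hours and region
# unspecified, and per-chip TPU pod power draw is not public.
HARDWARE_POWER_KW = 0.283   # placeholder per-chip draw in kW (assumption)
NUM_CHIPS = 4               # card states ">= 4" chips; lower bound
HOURS_USED = 100.0          # placeholder; unknown per the card
CARBON_INTENSITY = 0.429    # placeholder kgCO2e per kWh for the grid region

energy_kwh = HARDWARE_POWER_KW * NUM_CHIPS * HOURS_USED
emissions_kg = energy_kwh * CARBON_INTENSITY
print(f"~{emissions_kg:.1f} kgCO2e for {energy_kwh:.0f} kWh (placeholder inputs)")
```

The actual calculator additionally accounts for the cloud provider's published offsets, so treat this sketch as a rough upper bound under its own assumptions.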