Update README.md
README.md CHANGED
@@ -100,8 +100,11 @@ license: apache-2.0
 
 # TL;DR
 
+If you already know T5, FLAN-T5 is just better at everything! It is a **bigger model (+ XX parameters)** that was trained on **more tasks (+ XX)**, **more data (+ XX tokens)** and **more languages (+ XX languages)**. As mentioned in the first few lines of the abstract:
 > Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints,1 which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.
 
+**Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy-pasted from the [T5 model card](https://huggingface.co/t5-large).
+
 # Model Details
 
 ## Model Description
@@ -256,12 +259,14 @@ According to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):
 
 > These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.
 
+The model has been trained on TPU v3 or TPU v4 pods, using the [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
+
 
 # Evaluation
 
 ## Testing Data, Factors & Metrics
 
-The
+The authors evaluated the model on various tasks covering several languages (1836 in total). See the table below for some quantitative evaluation:
 ![image.png](https://s3.amazonaws.com/moonup/production/uploads/1666361983550-62441d1d9fdefb55a0b7d12c.png)
 For full details, please check the [research paper](https://arxiv.org/pdf/2210.11416.pdf).
 
@@ -273,7 +278,7 @@ For full results for FLAN-T5-Large, see the [research paper](https://arxiv.org/pdf/2210.11416.pdf).
 
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
-- **Hardware Type:** Google Cloud TPU Pods
+- **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4.
 - **Hours used:** More information needed
 - **Cloud Provider:** GCP
 - **Compute Region:** More information needed
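
As a usage illustration for the checkpoints described in the changes above (not taken from the paper or the original card), here is a minimal inference sketch with the Hugging Face `transformers` auto classes; the checkpoint id `google/flan-t5-large` and a local PyTorch install are assumptions, and any other FLAN-T5 size should work the same way:

```python
# Minimal sketch, assuming the `google/flan-t5-large` checkpoint id
# and a PyTorch install.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-large"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# FLAN-T5 is instruction-finetuned, so the prompt is a plain
# natural-language instruction.
prompt = "Translate to German: How old are you?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model is instruction-finetuned, the prompt can be written as a plain instruction rather than the fixed task prefixes used with vanilla T5.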