Commit 3d6744b (parent: b8ecf29) by patrickvonplaten: Update README.md

README.md CHANGED
@@ -11,7 +11,7 @@ license: apache-2.0
 
 # T5-Efficient-XL (Deep-Narrow version)
 
-T5-Efficient-XL is a variation of
+T5-Efficient-XL is a variation of [Google's original T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) following the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
 It is a *pretrained-only* checkpoint and was released with the
 paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**
 by *Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler*.
@@ -39,8 +39,8 @@ A sequence of word embeddings is therefore processed sequentially by each transf
 ## Details model architecture
 
 This model checkpoint - **t5-efficient-xl** - is of model type **XL** with **no** variations.
-It has **2852** million parameters and thus requires **11406
-or **5703
+It has **2852** million parameters and thus requires **11406 MB** of memory in full precision (*fp32*)
+or **5703 MB** of memory in half precision (*fp16* or *bf16*).
 
 The *conventional* T5 architectures are summarized as follows:
 
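The memory figures in the hunk above follow directly from the parameter count, at 4 bytes per parameter in full precision and 2 bytes per parameter in half precision. A minimal back-of-the-envelope sketch (the card rounds the count to 2852 million, hence the small offset from the quoted 11406 MB / 5703 MB):

```python
# Rough estimate of checkpoint memory from the parameter count.
# Assumes 4 bytes/parameter for fp32 and 2 bytes/parameter for fp16/bf16;
# the card rounds the count to 2852 million, so the quoted 11406 MB / 5703 MB
# figures differ from this estimate by a few MB.
num_params = 2852e6          # ~2852 million parameters
bytes_fp32 = num_params * 4  # full precision
bytes_fp16 = num_params * 2  # half precision

print(f"fp32: ~{bytes_fp32 / 1e6:.0f} MB")       # ~11408 MB
print(f"fp16/bf16: ~{bytes_fp16 / 1e6:.0f} MB")  # ~5704 MB
```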
@@ -54,7 +54,7 @@ The *conventional* T5 architectures are summarized as follows:
 | **XL** | **24/24** | **16384** | **1024** | **128** | **32** | **3B**|
 | XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B|
 
-
+whereas the following abbreviations are used:
 
 | Abbreviation | Definition |
 | ----| ---- |
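For orientation, the **XL** row above can be expressed with the standard `T5Config` fields of the Transformers library. A minimal sketch, assuming the usual reading of the column abbreviations (nl = layers, ff = feed-forward width, dm = model dimension, kv = key/value dimension, nh = attention heads); it builds only a randomly initialized config, not the released weights:

```python
# Sketch: the XL row of the table expressed as transformers T5Config fields.
# Values are copied from the table (nl=24/24, ff=16384, dm=1024, kv=128, nh=32).
from transformers import T5Config

xl_config = T5Config(
    num_layers=24,          # nl: encoder layers
    num_decoder_layers=24,  # nl: decoder layers
    d_ff=16384,             # ff: feed-forward dimension
    d_model=1024,           # dm: model/embedding dimension
    d_kv=128,               # kv: key/value projection dimension
    num_heads=32,           # nh: attention heads
)
print(xl_config)
```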
@@ -99,12 +99,14 @@ You can follow on of the following examples on how to fine-tune the model:
 
 ## Downstream Performance
 
-TODO: Add table
+TODO: Add table if available
+
+## Computational Complexity
+
+TODO: Add table if available
 
 ## More information
 
 We strongly recommend the reader to go carefully through the original paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)** to get a more nuanced understanding of this model checkpoint.
 As explained in the following [issue](https://github.com/google-research/google-research/issues/986#issuecomment-1035051145), checkpoints including the *sh* or *skv*
-model architecture variations have *not* been ported to Transformers as they are probably of limited practical usage and are lacking a more detailed description.
-
-
+model architecture variations have *not* been ported to Transformers as they are probably of limited practical usage and are lacking a more detailed description. Those checkpoints are kept [here](https://huggingface.co/NewT5SharedHeadsSharedKeyValues) as they might be ported potentially in the future.
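Since the card describes a *pretrained-only* checkpoint that still needs fine-tuning, a minimal loading sketch may help. It assumes the checkpoint is hosted on the Hugging Face Hub as `google/t5-efficient-xl` together with the standard T5 tokenizer files:

```python
# Minimal sketch: load the pretrained-only checkpoint for later fine-tuning.
# Assumes the Hub id "google/t5-efficient-xl" and standard T5 tokenizer files.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-efficient-xl"  # assumed Hub id for this checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Loading in half precision keeps the weights around the ~5703 MB quoted in the card.
model = T5ForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)

print(f"{model.num_parameters() / 1e6:.0f} million parameters")
# Note: this is a pretrained-only checkpoint; fine-tune it on a downstream task before use.
```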