patrickvonplaten committed
Commit 3d6744b
1 Parent(s): b8ecf29

Update README.md

Files changed (1)
  1. README.md +10 -8
README.md CHANGED
@@ -11,7 +11,7 @@ license: apache-2.0
 
  # T5-Efficient-XL (Deep-Narrow version)
 
- T5-Efficient-XL is a variation of the original [T5-3B](https://huggingface.co/t5-3b) checkpoint and follows the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
+ T5-Efficient-XL is a variation of [Google's original T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) following the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
  It is a *pretrained-only* checkpoint and was released with the
  paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**
  by *Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler*.
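The checkpoint described above is *pretrained-only*, so it has to be fine-tuned before it is useful on a downstream task. As a minimal sketch, it can be loaded with the Transformers library roughly as follows, assuming the checkpoint is hosted on the Hub under the id `google/t5-efficient-xl` (the id is an assumption here, not something stated in this commit):

```python
# Minimal sketch: load the pretrained-only checkpoint with Transformers.
# Assumption: the Hub id is "google/t5-efficient-xl"; adjust it if the
# model card lists a different id.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-efficient-xl"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Pretrained-only weights: the model still needs task-specific fine-tuning
# before its outputs are useful.
print(f"parameters: {model.num_parameters() / 1e6:.0f}M")  # roughly 2852M
```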
@@ -39,8 +39,8 @@ A sequence of word embeddings is therefore processed sequentially by each transf
  ## Details model architecture
 
  This model checkpoint - **t5-efficient-xl** - is of model type **XL** with **no** variations.
- It has **2852** million parameters and thus requires **11406** MB of memory in full precision (*fp32*)
- or **5703** MB of memory in half precision (*fp16* or *bf16*).
+ It has **2852** million parameters and thus requires **11406 MB** of memory in full precision (*fp32*)
+ or **5703 MB** of memory in half precision (*fp16* or *bf16*).
 
  The *conventional* T5 architectures are summarized as follows:
 
@@ -54,7 +54,7 @@ The *conventional* T5 architectures are summarized as follows:
  | **XL** | **24/24** | **16384** | **1024** | **128** | **32** | **3B**|
  | XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B|
 
- , whereas the following abbreviations are used:
+ whereas the following abbreviations are used:
 
  | Abbreviation | Definition |
  | ----| ---- |
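The parameter and memory figures in the *Details model architecture* hunk above follow directly from the parameter count: roughly 4 bytes per parameter in full precision and 2 bytes in half precision. A small sketch of that arithmetic (the quoted **11406 MB** / **5703 MB** correspond to the exact, unrounded parameter count of roughly 2851.6 million; the rounded **2852** million gives 11408 MB / 5704 MB):

```python
# Minimal sketch: estimate the checkpoint's weight memory from its parameter
# count (weights only, no optimizer state or activations; 1 MB = 10**6 bytes).
def weight_memory_mb(num_params: int, bytes_per_param: int) -> float:
    return num_params * bytes_per_param / 10**6

params = 2852 * 10**6  # rounded parameter count from the model card

print(f"fp32 (4 bytes/param):      {weight_memory_mb(params, 4):.0f} MB")  # ~11408 MB
print(f"fp16/bf16 (2 bytes/param): {weight_memory_mb(params, 2):.0f} MB")  # ~5704 MB
```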
@@ -99,12 +99,14 @@ You can follow on of the following examples on how to fine-tune the model:
 
  ## Downstream Performance
 
- TODO: Add table of full downstream performances if possible.
+ TODO: Add table if available
+
+ ## Computational Complexity
+
+ TODO: Add table if available
 
  ## More information
 
  We strongly recommend the reader to go carefully through the original paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)** to get a more nuanced understanding of this model checkpoint.
  As explained in the following [issue](https://github.com/google-research/google-research/issues/986#issuecomment-1035051145), checkpoints including the *sh* or *skv*
- model architecture variations have *not* been ported to Transformers as they are probably of limited practical usage and are lacking a more detailed description.
-
-
+ model architecture variations have *not* been ported to Transformers as they are probably of limited practical usage and are lacking a more detailed description. Those checkpoints are kept [here](https://huggingface.co/NewT5SharedHeadsSharedKeyValues) as they might be ported potentially in the future.
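Because the checkpoint is pretrained-only, it only becomes useful after task-specific fine-tuning (the last hunk header above references the README's fine-tuning examples). A minimal, hypothetical single training step with Transformers and PyTorch; the Hub id, the toy summarization pair, and the learning rate below are illustrative assumptions, not values taken from this commit:

```python
# Minimal, hypothetical fine-tuning step for a text-to-text task.
# Assumptions: Hub id "google/t5-efficient-xl", a toy summarization pair and an
# arbitrary learning rate; a real run would use a proper dataset, batching,
# and (at ~2.85B parameters) mixed precision and/or model parallelism.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-efficient-xl"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One toy example in T5's text-to-text format.
inputs = tokenizer(
    "summarize: The quick brown fox jumps over the lazy dog.",
    return_tensors="pt",
)
labels = tokenizer("A fox jumps over a dog.", return_tensors="pt").input_ids

outputs = model(**inputs, labels=labels)  # the loss is computed internally
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"loss: {outputs.loss.item():.4f}")
```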