uer committed on
Commit b140588
Parent: 72ca583

Update README.md

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -16,8 +16,10 @@ The Text-to-Text Transfer Transformer (T5) leveraged a unified text-to-text form
 
 | | Link |
 | -------- | :-----------------------: |
-| **Small** | [**2/128 (Tiny)**][2_128] |
-| **Base** | [**4/256 (Mini)**][4_256] |
+| **Small** | [**Small**][small] |
+| **Base** | [**Base**][base] |
+
+In T5, spans of the input sequence are masked by so-called sentinel token. Each sentinel token represents a unique mask token for the input sequence and should start with <extra_id_0>, <extra_id_1>, … up to <extra_id_199>. However, <extra_id_xxx> is separated into multiple parts in Huggingface's Hosted inference API. Therefore, we replace <extra_id_xxx> with extraxxx in vocabulary and BertTokenizer regards extraxxx as one sentinel token.
 
 ## How to use
 
@@ -101,6 +103,8 @@ python3 scripts/convert_t5_from_uer_to_huggingface.py --input_model_path cluecor
 --type t5
 ```
 
+Notice that
+
 ### BibTeX entry and citation info
 
 ```
@@ -113,3 +117,5 @@ python3 scripts/convert_t5_from_uer_to_huggingface.py --input_model_path cluecor
 }
 ```
 
+[small]:https://huggingface.co/uer/t5-small-chinese-cluecorpussmall
+[base]:https://huggingface.co/uer/t5-base-chinese-cluecorpussmall