TimeRobber
committed on
Commit de2fe6d
1 Parent(s): 314bc11
Update README.md
README.md
CHANGED
@@ -11,7 +11,9 @@ license: apache-2.0
 
 ## Version 1.1
 
-[T5 Version 1.1](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/released_checkpoints.md#t511) includes the following improvements compared to the original T5 model
+[T5 Version 1.1](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/released_checkpoints.md#t511) includes the following improvements compared to the original T5 model
+
+- GEGLU activation in feed-forward hidden layer, rather than ReLU - see [here](https://arxiv.org/abs/2002.05202).
 
 - Dropout was turned off in pre-training (quality win). Dropout should be re-enabled during fine-tuning.
 
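For readers unfamiliar with the GEGLU change this commit documents, here is a minimal pure-Python sketch contrasting a ReLU feed-forward layer with a GEGLU one, following the formulation in the linked paper. It is an illustration, not the T5 implementation: the function names, the nested-list matrices, and the toy weights are all hypothetical. Biases are omitted for brevity, and the exact (erf-based) GELU is assumed, whereas real implementations may use a tanh approximation.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(x, w):
    """Multiply row vector x (length d_in) by matrix w (d_in x d_out, nested lists)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def relu_ffn(x, w_in, w_out):
    """ReLU feed-forward layer: ReLU(x W_in) W_out."""
    hidden = [max(0.0, h) for h in matvec(x, w_in)]
    return matvec(hidden, w_out)


def geglu_ffn(x, w_gate, w_lin, w_out):
    """GEGLU feed-forward layer: (GELU(x W_gate) * (x W_lin)) W_out."""
    gate = [gelu(h) for h in matvec(x, w_gate)]   # non-linear gate branch
    lin = matvec(x, w_lin)                        # purely linear branch
    hidden = [g * l for g, l in zip(gate, lin)]   # element-wise product
    return matvec(hidden, w_out)


# Toy weights (hypothetical): identity input projections, summing output projection.
identity = [[1.0, 0.0], [0.0, 1.0]]
w_out = [[1.0], [1.0]]
x = [1.0, -2.0]

print("ReLU FFN: ", relu_ffn(x, identity, w_out))
print("GEGLU FFN:", geglu_ffn(x, identity, identity, w_out))
```

Note that the GEGLU layer uses two input projections (`w_gate` and `w_lin`) where the ReLU layer uses one, so the hidden width is typically reduced to keep the parameter count comparable.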