Commit 59dbe41 (1 parent: 01cc525), committed by khang119966
Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ widget:
 ## Overview
 <!-- Provide a quick summary of what the model is/does. -->
 We trimmed vocabulary size to 50,589 and continually pretrained `google/mt5-base`[1] on a merged 20GB dataset, the training dataset includes:
--
+- Crawled data (100M comments and 15M posts on Facebook)
 - UIT data[2], which is used to pretrain `uitnlp/visobert`[2]
 - MC4 ecommerce
 - 10.7M comments on VOZ Forum from `tarudesu/VOZ-HSD`[7]
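
The README says the vocabulary was trimmed to 50,589 entries but does not describe how. As a rough illustration only, one common approach is to keep the token ids that occur most often in the target corpus and remap the embedding rows accordingly. The sketch below uses plain Python lists; the function names `build_id_map` and `trim_embeddings` are hypothetical and are not taken from the model's code. A real mT5 trim would also need a matching rebuilt tokenizer, which is not shown here.

```python
from collections import Counter


def build_id_map(corpus_token_ids, keep_size, always_keep=()):
    """Return an old-id -> new-id map for the most frequent token ids.

    corpus_token_ids: iterable of token-id sequences from the target corpus.
    keep_size: target vocabulary size after trimming.
    always_keep: ids kept regardless of frequency (e.g. special tokens).
    """
    counts = Counter(tid for seq in corpus_token_ids for tid in seq)
    kept = list(dict.fromkeys(always_keep))  # de-duplicate, keep order
    seen = set(kept)
    for tid, _ in counts.most_common():
        if len(kept) >= keep_size:
            break
        if tid not in seen:
            kept.append(tid)
            seen.add(tid)
    kept.sort()  # preserve the original relative order of ids
    return {old: new for new, old in enumerate(kept)}


def trim_embeddings(embeddings, old_to_new):
    """Select and reorder embedding rows according to old_to_new."""
    rows = [None] * len(old_to_new)
    for old, new in old_to_new.items():
        rows[new] = embeddings[old]
    return rows


# Toy example (assumed data): a 10-token vocabulary trimmed to 3 entries.
corpus = [[5, 3, 5, 9], [3, 5, 7]]
id_map = build_id_map(corpus, keep_size=3, always_keep=(0,))
embeddings = [[float(i)] for i in range(10)]
trimmed = trim_embeddings(embeddings, id_map)
```

In the toy example, id 0 is kept as a stand-in for a special token, and the two most frequent corpus ids fill the remaining slots. The same map would have to be applied to any output projection that has its own weight matrix.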