htdung167 committed on
Commit
9162bbc
1 Parent(s): 5961ad0

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -3,10 +3,13 @@ library_name: transformers
 tags: []
 ---
 
-# 5CD-AI/viso-twhin-bert-large
+# 5CD-AI/visobert-14gb-corpus-pretrained
 ## Overview
 <!-- Provide a quick summary of what the model is/does. -->
-We reduce TwHIN-BERT's vocabulary size to 20k on the UIT dataset and continue pretraining for 10 epochs.
+We continually pretrain `uitnlp/visobert` on a merged 14GB dataset for 5 epochs, the training dataset includes:
+- Internal data (100M comments and 15M posts on Facebook)
+- UIT data, which is used to pretrain `uitnlp/visobert`
+- MC4 ecommerce
 
 Here are the results on 4 downstream tasks on Vietnamese social media texts, including Emotion Recognition(UIT-VSMEC), Hate Speech Detection(UIT-HSD), Spam Reviews Detection(ViSpamReviews), Hate Speech Spans Detection(ViHOS):
 <table>