metadata
library_name: transformers
tags: []
pipeline_tag: fill-mask
widget:
- text: shop làm ăn như <mask>
- text: cực rẻ <mask> bèo nhèo
- text: hag quảg <mask> kực nét
5CD-AI/viso-twhin-bert-large
Overview
We reduce TwHIN-BERT's vocabulary to 20k tokens on the UIT dataset and continue pretraining for 10 epochs.
Below are the results on four downstream tasks on Vietnamese social media text: Emotion Recognition (UIT-VSMEC), Hate Speech Detection (UIT-HSD), Spam Reviews Detection (ViSpamReviews), and Hate Speech Spans Detection (ViHOS). Acc is accuracy, WF1 is weighted F1, and MF1 is macro F1.
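Model | Avg | Emotion Recognition Acc | Emotion Recognition WF1 | Emotion Recognition MF1 | Hate Speech Detection Acc | Hate Speech Detection WF1 | Hate Speech Detection MF1 | Spam Reviews Detection Acc | Spam Reviews Detection WF1 | Spam Reviews Detection MF1 | Hate Speech Spans Detection Acc | Hate Speech Spans Detection WF1 | Hate Speech Spans Detection MF1 |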
viBERT | 78.16 | 61.91 | 61.98 | 59.7 | 85.34 | 85.01 | 62.07 | 89.93 | 89.79 | 76.8 | 90.42 | 90.45 | 84.55 |
vELECTRA | 79.23 | 64.79 | 64.71 | 61.95 | 86.96 | 86.37 | 63.95 | 89.83 | 89.68 | 76.23 | 90.59 | 90.58 | 85.12 |
PhoBERT-Base | 79.3 | 63.49 | 63.36 | 61.41 | 87.12 | 86.81 | 65.01 | 89.83 | 89.75 | 76.18 | 91.32 | 91.38 | 85.92 |
PhoBERT-Large | 79.82 | 64.71 | 64.66 | 62.55 | 87.32 | 86.98 | 65.14 | 90.12 | 90.03 | 76.88 | 91.44 | 91.46 | 86.56 |
ViSoBERT | 81.58 | 68.1 | 68.37 | 65.88 | 88.51 | 88.31 | 68.77 | 90.99 | 90.92 | 79.06 | 91.62 | 91.57 | 86.8 |
visobert-14gb-corpus-pretrained | 82.2 | 68.69 | 68.75 | 66.03 | 88.79 | 88.6 | 69.57 | 91.02 | 90.88 | 77.13 | 93.69 | 93.63 | 89.66 |
viso-twhin-bert-large | 83.87 | 73.45 | 73.14 | 70.99 | 88.86 | 88.8 | 70.81 | 91.6 | 91.47 | 79.07 | 94.08 | 93.96 | 90.22 |
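The model card does not say how the Avg column is computed. The numbers are consistent with an unweighted mean of the twelve per-task scores (Acc, WF1, and MF1 across the four tasks). The sketch below checks that assumption on two rows copied from the table; the helper name `recomputed_avg` is hypothetical, and the definition of Avg is inferred from the figures rather than confirmed by the authors.

```python
from statistics import mean

# Scores copied from the results table, in column order:
# Acc, WF1, MF1 for each of the four tasks (12 values per model).
rows = {
    "viBERT": (
        78.16,
        [61.91, 61.98, 59.7, 85.34, 85.01, 62.07,
         89.93, 89.79, 76.8, 90.42, 90.45, 84.55],
    ),
    "viso-twhin-bert-large": (
        83.87,
        [73.45, 73.14, 70.99, 88.86, 88.8, 70.81,
         91.6, 91.47, 79.07, 94.08, 93.96, 90.22],
    ),
}


def recomputed_avg(scores):
    """Unweighted mean of the per-task scores (assumed definition of Avg)."""
    return mean(scores)


for name, (reported, scores) in rows.items():
    # Compare the reported Avg with the value recomputed from the row.
    print(f"{name}: reported {reported}, recomputed {recomputed_avg(scores):.2f}")
```

Because the mean weights every metric equally, the macro F1 columns, which are lower on the imbalanced tasks, pull Avg down as much as the accuracy columns raise it.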
Usage (HuggingFace Transformers)
Install the transformers package:
pip install transformers
Then you can use this model for the fill-mask task like this:
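from transformers import pipeline

# Load a fill-mask pipeline backed by this checkpoint.
model_path = "5CD-AI/viso-twhin-bert-large"
mask_filler = pipeline("fill-mask", model=model_path)

# Return the 10 most likely fillers for the <mask> token.
mask_filler("đúng nhận sai <mask>", top_k=10)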
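For a single input with one mask, the fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. A minimal sketch of post-processing that list follows. The helper `top_tokens` and its `min_score` threshold are hypothetical names introduced here, and the sample list holds placeholder values that only imitate the output shape; they are not predictions from this model.

```python
def top_tokens(predictions, min_score=0.0):
    """Return (token_str, score) pairs scoring at least min_score, best first."""
    kept = [
        (p["token_str"].strip(), p["score"])
        for p in predictions
        if p["score"] >= min_score
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Placeholder data shaped like fill-mask output; the values are made up
# for illustration and are NOT results produced by the model.
example = [
    {"score": 0.10, "token": 2, "token_str": "b", "sequence": "x b"},
    {"score": 0.60, "token": 1, "token_str": "a", "sequence": "x a"},
]

print(top_tokens(example, min_score=0.05))

# With the real pipeline, the call would look like:
# top_tokens(mask_filler("đúng nhận sai <mask>", top_k=10), min_score=0.05)
```

Filtering by score rather than only by `top_k` is a design choice: it drops low-probability fillers when the model is unsure, at the cost of sometimes returning fewer candidates than requested.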