metadata

library_name: transformers
tags: []
pipeline_tag: fill-mask
widget:
  - text: shop làm ăn như <mask>
  - text: cực rẻ <mask> bèo nhèo
  - text: hag quảg <mask> kực nét

5CD-AI/viso-twhin-bert-large

Overview

We reduced TwHIN-BERT's vocabulary to 20k tokens using the UIT dataset, then continued pretraining for 10 epochs.

Results on four downstream tasks over Vietnamese social media text: Emotion Recognition (UIT-VSMEC), Hate Speech Detection (UIT-HSD), Spam Reviews Detection (ViSpamReviews), and Hate Speech Spans Detection (ViHOS). Each task is reported as accuracy (Acc), weighted F1 (WF1), and macro F1 (MF1):

Model	Avg	Emotion Recognition			Hate Speech Detection			Spam Reviews Detection			Hate Speech Spans Detection
Model	Avg	Acc	WF1	MF1	Acc	WF1	MF1	Acc	WF1	MF1	Acc	WF1	MF1
viBERT	78.16	61.91	61.98	59.7	85.34	85.01	62.07	89.93	89.79	76.8	90.42	90.45	84.55
vELECTRA	79.23	64.79	64.71	61.95	86.96	86.37	63.95	89.83	89.68	76.23	90.59	90.58	85.12
PhoBERT-Base	79.3	63.49	63.36	61.41	87.12	86.81	65.01	89.83	89.75	76.18	91.32	91.38	85.92
PhoBERT-Large	79.82	64.71	64.66	62.55	87.32	86.98	65.14	90.12	90.03	76.88	91.44	91.46	86.56
ViSoBERT	81.58	68.1	68.37	65.88	88.51	88.31	68.77	90.99	90.92	79.06	91.62	91.57	86.8
visobert-14gb-corpus-pretrained	82.2	68.69	68.75	66.03	88.79	88.6	69.57	91.02	90.88	77.13	93.69	93.63	89.66
viso-twhin-bert-large	83.87	73.45	73.14	70.99	88.86	88.8	70.81	91.6	91.47	79.07	94.08	93.96	90.22
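
The card does not define the Avg column. From the numbers, it appears to be the unweighted mean of the twelve task scores in each row; this is an inference, not a statement from the authors. A minimal sketch that checks this for the viso-twhin-bert-large row (the helper name `average_score` is hypothetical):

```python
# Scores copied from the viso-twhin-bert-large row of the table above,
# in column order: Acc, WF1, MF1 for each of the four tasks.
scores = [
    73.45, 73.14, 70.99,  # Emotion Recognition (UIT-VSMEC)
    88.86, 88.80, 70.81,  # Hate Speech Detection (UIT-HSD)
    91.60, 91.47, 79.07,  # Spam Reviews Detection (ViSpamReviews)
    94.08, 93.96, 90.22,  # Hate Speech Spans Detection (ViHOS)
]


def average_score(values):
    """Unweighted mean of the scores, rounded to two decimals like the table."""
    return round(sum(values) / len(values), 2)


avg = average_score(scores)
print(avg)  # compare with the Avg column for this row
```

The same check can be run on any other row by swapping in its twelve scores.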

Usage (HuggingFace Transformers)

Install the transformers package:

pip install transformers

Then use the model for the fill-mask task:

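from transformers import pipeline

# Load a fill-mask pipeline backed by this model's checkpoint.
model_path = "5CD-AI/viso-twhin-bert-large"
mask_filler = pipeline("fill-mask", model=model_path)

# Return the 10 most likely candidates for the <mask> token.
mask_filler("đúng nhận sai <mask>", top_k=10)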
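
For an input with a single <mask>, the transformers fill-mask pipeline returns a list of dicts with the keys `score`, `token`, `token_str`, and `sequence`. The sketch below shows one way to reduce that list to readable (token, score) pairs. The helper `summarize_predictions`, its `min_score` parameter, and the sample values are hypothetical; the sample is a placeholder shaped like pipeline output, not real predictions from this model.

```python
def summarize_predictions(predictions, min_score=0.0):
    """Return (token_str, score) pairs sorted by descending score.

    `predictions` is assumed to be the list of dicts that a fill-mask
    pipeline call returns for an input containing a single <mask>.
    `min_score` is a hypothetical cutoff for dropping unlikely candidates.
    """
    pairs = [
        (p["token_str"].strip(), p["score"])
        for p in predictions
        if p["score"] >= min_score
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Placeholder data shaped like pipeline output; NOT real model predictions.
sample = [
    {"score": 0.10, "token": 2, "token_str": "B", "sequence": "x B"},
    {"score": 0.70, "token": 1, "token_str": "A", "sequence": "x A"},
]

summary = summarize_predictions(sample, min_score=0.05)
for token, score in summary:
    print(f"{token}\t{score:.2f}")
```

In practice, the result of the `mask_filler(...)` call above would be passed in place of `sample`.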