---
pipeline_tag: fill-mask
widget:
- text: "đậu xanh rau <mask>"
---
# <a name="introduction"></a> ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing (EMNLP 2023 - Main)
Disclaimer: The paper contains actual comments on social networks that might be construed as abusive, offensive, or obscene.

ViSoBERT is the state-of-the-art language model for Vietnamese social media tasks:

- ViSoBERT is the first monolingual masked language model (MLM) with the XLM-R architecture pre-trained from scratch specifically for Vietnamese social media text.
- ViSoBERT outperforms previous monolingual, multilingual, and multilingual social media approaches, achieving new state-of-the-art performance on four downstream Vietnamese social media tasks.
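
For readers unfamiliar with the masked language modelling (MLM) objective, the following is a minimal sketch of how one pre-training example is typically corrupted. It assumes the standard BERT/RoBERTa-style recipe (select about 15% of non-special tokens, then replace 80% of those with the mask token, 10% with a random token, and keep 10% unchanged); this README does not state ViSoBERT's exact masking settings, and `mask_tokens`, the token ids, and the vocabulary size below are hypothetical. The `-100` label is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
import random
from typing import List, Optional, Set, Tuple

IGNORE_INDEX = -100  # positions with this label contribute no loss


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels) for one sequence.

    Assumption: BERT/RoBERTa-style 15% selection with 80/10/10 replacement;
    ViSoBERT's actual pre-training settings may differ.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never corrupt special tokens; otherwise select with probability mlm_prob
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    # Hypothetical ids: 0 and 2 stand in for <s> and </s>, 3 for <mask>
    ids = [0, 512, 87, 1034, 9, 2]
    inputs, labels = mask_tokens(
        ids, mask_id=3, vocab_size=15000, special_ids={0, 2}, rng=random.Random(0)
    )
    print(inputs, labels)
```

Selecting each position independently, rather than sampling exactly 15% of tokens, keeps the sketch short; `DataCollatorForLanguageModeling` in `transformers` applies the same idea to whole batches of tensors.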

The general architecture and experimental results of ViSoBERT can be found in our [paper](https://openreview.net/forum?id=gqkg54QNDY):

```bibtex
@inproceedings{anonymous2023plmvismt,
  title={{PLM}4Vi{SMT}: A Pre-Trained Language Model for Vietnamese Social Media Text Processing},
  author={Anonymous},
  booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
  year={2023},
  url={https://openreview.net/forum?id=gqkg54QNDY}
}
```


Please CITE our paper when ViSoBERT is used to help produce published results or is incorporated into other software.

## Installation

Install `transformers` and `SentencePiece` with pip: `pip install transformers SentencePiece`. The usage example also needs PyTorch (`pip install torch`).

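## Example usage
```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load the pre-trained ViSoBERT encoder and its tokenizer
model = AutoModel.from_pretrained('uitnlp/visobert')
tokenizer = AutoTokenizer.from_pretrained('uitnlp/visobert')

# Tokenize an input sentence into PyTorch tensors
encoding = tokenizer('dau xanh rau ma', return_tensors='pt')

# Run a forward pass without tracking gradients (inference only)
with torch.no_grad():
    output = model(**encoding)

# output.last_hidden_state holds one contextual vector per token
```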
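
The forward pass returns one vector per token. If a single vector per sentence is needed, one common option (not necessarily what the authors use for their downstream tasks, where fine-tuning with a task head is more typical) is to average the token vectors while skipping padding. The sketch below shows that arithmetic in plain Python; `masked_mean_pool` is a hypothetical helper, and with the objects above its inputs would be `output.last_hidden_state[0].tolist()` and `encoding['attention_mask'][0].tolist()`.

```python
from typing import List


def masked_mean_pool(hidden: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors of one sequence, ignoring positions where mask == 0.

    Hypothetical helper: mean pooling is one common choice, not ViSoBERT's prescribed method.
    """
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(hidden, attention_mask):
        if m:  # only real (non-padding) tokens contribute
            count += 1
            for j, x in enumerate(vec):
                total[j] += x
    if count == 0:
        raise ValueError("attention_mask contains no real tokens")
    return [t / count for t in total]
```

With batched PyTorch tensors `h` of shape (batch, tokens, hidden) and `m` of shape (batch, tokens), the same computation is usually written as `(h * m.unsqueeze(-1)).sum(1) / m.sum(1, keepdim=True)`.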