---
language:
- vi
tags:
- classification
widget:
- text: "Xấu vcl"
  example_title: "Offensive"
- text: "Đồ ngu"
  example_title: "Hate"
- text: "Xin chào chúc một ngày tốt lành"
  example_title: "Normal"
---

## [PhoBERT](https://huggingface.co/vinai/phobert-base/tree/main) fine-tuned for hate speech detection

## Datasets

- [**VLSP2019**](https://vlsp.org.vn/vlsp2019/eval/hsd): Hate Speech Detection on Social Networks dataset
- [**ViHSD**](https://github.com/sonlam1102/vihsd): Vietnamese Hate Speech Detection dataset

## Class names

- LABEL_0: **Normal**
- LABEL_1: **OFFENSIVE**
- LABEL_2: **HATE**

## Usage example with **TextClassificationPipeline**

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

# Load the fine-tuned 3-class classifier and its tokenizer from the Hub
model = AutoModelForSequenceClassification.from_pretrained("tsdocode/phobert-finetune-hatespeech", num_labels=3)
tokenizer = AutoTokenizer.from_pretrained("tsdocode/phobert-finetune-hatespeech")

# return_all_scores=True returns a score for every class, not only the top one
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)

# Returns one list per input, with a {'label': ..., 'score': ...} dict for each of LABEL_0, LABEL_1 and LABEL_2
pipe("đồ ngu")
```
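
The pipeline reports the generic names `LABEL_0` to `LABEL_2`, so it can help to map them back to the class names listed in this card. The snippet below is a minimal sketch of that step: `CLASS_NAMES` and `top_class` are hypothetical helpers, not part of the model repository, and the scores in `example_scores` are made up for illustration rather than real model output.

```python
# Hypothetical helper, not shipped with the model.
# The mapping is copied from the "Class name" section of this card.
CLASS_NAMES = {
    "LABEL_0": "Normal",
    "LABEL_1": "OFFENSIVE",
    "LABEL_2": "HATE",
}


def top_class(scores):
    """Return (class_name, score) for the highest-scoring entry.

    `scores` is the list of {'label': ..., 'score': ...} dicts
    that the pipeline produces for a single input text.
    """
    best = max(scores, key=lambda item: item["score"])
    # Fall back to the raw label if it is not in the mapping
    return CLASS_NAMES.get(best["label"], best["label"]), best["score"]


# Made-up scores for illustration only, not real model output
example_scores = [
    {"label": "LABEL_0", "score": 0.1},
    {"label": "LABEL_1", "score": 0.2},
    {"label": "LABEL_2", "score": 0.7},
]

print(top_class(example_scores))  # ('HATE', 0.7)
```

With the `pipe` object from the usage example, this would typically be called as `top_class(pipe("đồ ngu")[0])`, assuming the pipeline returns one inner list of scores per input; the exact nesting can differ between `transformers` versions, so it is worth checking the raw output first.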