---
language:
- vi
tags:
- classification
widget:
- text: "Xấu vcl"
  example_title: "Offensive"
- text: "Đồ ngu"
  example_title: "Hate"
- text: "Xin chào chúc một ngày tốt lành"
  example_title: "Normal"

---

## [PhoBERT](https://huggingface.co/vinai/phobert-base/tree/main) fine-tuned for Vietnamese hate speech detection

## Datasets
- [**VLSP2019**](https://vlsp.org.vn/vlsp2019/eval/hsd): Hate Speech Detection on Social Networks dataset
- [**ViHSD**](https://github.com/sonlam1102/vihsd): Vietnamese Hate Speech Detection dataset

## Class names
- LABEL_0: **NORMAL**
- LABEL_1: **OFFENSIVE**
- LABEL_2: **HATE**
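As the list above indicates, the checkpoint reports the generic ids `LABEL_0` to `LABEL_2` rather than class names. The minimal sketch below translates the all-scores output of the pipeline in the usage example (a list of `{'label': ..., 'score': ...}` dicts, sometimes wrapped in an extra list for a single input) into readable names. `LABEL_NAMES`, `to_class_names` and `predicted_class` are hypothetical helpers, not part of `transformers`.

```python
# Hypothetical mapping from the model's raw label ids to the class names listed above
LABEL_NAMES = {"LABEL_0": "NORMAL", "LABEL_1": "OFFENSIVE", "LABEL_2": "HATE"}


def _flatten(scores):
    """Unwrap the extra list that some pipeline versions add around a single input."""
    if scores and isinstance(scores[0], list):
        return scores[0]
    return scores


def to_class_names(scores):
    """Convert [{'label': 'LABEL_k', 'score': s}, ...] into {class name: score}."""
    return {LABEL_NAMES[item["label"]]: item["score"] for item in _flatten(scores)}


def predicted_class(scores):
    """Return the class name with the highest score."""
    best = max(_flatten(scores), key=lambda item: item["score"])
    return LABEL_NAMES[best["label"]]
```

Passing an `id2label` mapping to `from_pretrained` is another way to get readable labels, since extra keyword arguments that match config attributes override the saved config.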


## Usage example with **TextClassificationPipeline**
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

# Load the fine-tuned checkpoint (3 classes: LABEL_0, LABEL_1, LABEL_2)
model = AutoModelForSequenceClassification.from_pretrained("tsdocode/phobert-finetune-hatespeech", num_labels=3)
tokenizer = AutoTokenizer.from_pretrained("tsdocode/phobert-finetune-hatespeech")

pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)

# top_k=None returns the score of every class (replacing the deprecated return_all_scores=True),
# as a list of {'label': 'LABEL_k', 'score': ...} dicts sorted by descending score
pipe("đồ ngu", top_k=None)
```
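For a model with more than one label, the pipeline converts the three output logits into probabilities with a softmax by default. The dependency-free sketch below reproduces that post-processing step; `CLASS_NAMES`, `softmax` and `classify_logits` are hypothetical names, and the class order is assumed to follow `LABEL_0` to `LABEL_2`. With PyTorch installed, one row of logits can be obtained with `model(**tokenizer(text, return_tensors="pt")).logits[0].tolist()`.

```python
import math

# Assumed class order: index 0 -> LABEL_0, 1 -> LABEL_1, 2 -> LABEL_2
CLASS_NAMES = ["NORMAL", "OFFENSIVE", "HATE"]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_logits(logits):
    """Turn one row of 3 logits into (predicted class name, {class name: probability})."""
    probs = softmax(logits)
    scores = dict(zip(CLASS_NAMES, probs))
    best = max(scores, key=scores.get)
    return best, scores
```

Subtracting the maximum logit leaves the probabilities unchanged but keeps `math.exp` from overflowing on large values.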