Create README.md
README.md
ADDED
@@ -0,0 +1,26 @@
## [PhoBERT](https://huggingface.co/vinai/phobert-base/tree/main) fine-tuned for hate speech detection

## Datasets
- [**VLSP2019**](https://vlsp.org.vn/vlsp2019/eval/hsd): Hate Speech Detection on Social Networks dataset
- [**ViHSD**](https://github.com/sonlam1102/vihsd): Vietnamese Hate Speech Detection dataset

## Class names
- LABEL_0 : **Normal**
- LABEL_1 : **OFFENSIVE**
- LABEL_2 : **HATE**

## Usage example with **TextClassificationPipeline**
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

# Load the fine-tuned model (3 classes) and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("tsdocode/phobert-finetune-hatespeech", num_labels=3)
tokenizer = AutoTokenizer.from_pretrained("tsdocode/phobert-finetune-hatespeech")

# return_all_scores=True returns a score for every label, not only the top one
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)

# outputs a nested list of dicts, one dict per label:
# [[{'label': 'LABEL_0', 'score': ...}, {'label': 'LABEL_1', 'score': ...}, {'label': 'LABEL_2', 'score': ...}]]
pipe("đồ ngu")  # "đồ ngu" is a Vietnamese insult, roughly "idiot"
```
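
The pipeline reports the generic `LABEL_0` to `LABEL_2` identifiers, so a small helper can translate its output into the class names listed above. The following is a minimal sketch, not part of the model itself: `CLASS_NAMES` and `top_class` are hypothetical names introduced here, and the helper assumes the nested-list output shape shown in the usage example.

```python
# Hypothetical helper (not shipped with the model): maps the pipeline's
# generic labels to the class names listed in this README.
CLASS_NAMES = {"LABEL_0": "Normal", "LABEL_1": "OFFENSIVE", "LABEL_2": "HATE"}


def top_class(scores):
    """Return (class name, score) for the highest-scoring label.

    `scores` is the list of {'label': ..., 'score': ...} dicts that the
    pipeline produces for a single input text.
    """
    best = max(scores, key=lambda item: item["score"])
    # Fall back to the raw label if it is not in the mapping.
    return CLASS_NAMES.get(best["label"], best["label"]), best["score"]
```

With the pipeline from the usage example, `top_class(pipe("đồ ngu")[0])` would give the predicted class name together with its score.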