---
language:
- ms
library_name: transformers
---

# Safe for Work Classifier Model for Malaysian Data

The current version supports Malay. We are working towards supporting Malay, English, and Indonesian.

The base model is fine-tuned from https://huggingface.co/mesolitica/malaysian-mistral-191M-MLM-512 on Malaysian NSFW data.

Data Source: https://huggingface.co/datasets/malaysia-ai/Malaysian-NSFW

Github Repo: https://github.com/malaysia-ai/sfw-classifier

Project Board: https://github.com/orgs/malaysia-ai/projects/6

![End-to-end LLM-Ops pipeline](https://github.com/mesolitica/malaysian-llmops/raw/main/e2e.png)

Current Labels Available:

- religion insult
- sexist
- racist
- psychiatric or mental illness
- harassment
- safe for work
- porn
- self-harm
- violence



### How to use

```python
from classifier import MistralForSequenceClassification  # custom model class defined in this project's classifier.py
from transformers import AutoTokenizer, pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
model = MistralForSequenceClassification.from_pretrained('malaysia-ai/malaysian-sfw-classifier')
tokenizer = AutoTokenizer.from_pretrained('malaysia-ai/malaysian-sfw-classifier')

# Wrap both in a text-classification pipeline
pipe = pipeline("text-classification", tokenizer=tokenizer, model=model)

# Replace the placeholders with the texts you want to classify
input_str = ["INSERT_INPUT_0", "INSERT_INPUT_1"]
print(pipe(input_str))
```
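
Each input gets back a dict with a `label` (one of the classes listed above) and a confidence `score`. Below is a minimal sketch of a moderation gate built on top of the pipeline, treating every label other than `safe for work` as NSFW; the `threshold` value and the placeholder texts are illustrative assumptions, not project defaults.

```python
def is_sfw(text, threshold=0.5):
    # Classify one text and accept it only when the top label is
    # "safe for work" with at least `threshold` confidence.
    # NOTE: threshold is an illustrative choice, not a project default.
    result = pipe(text)[0]
    return result["label"] == "safe for work" and result["score"] >= threshold


for text in ["INSERT_INPUT_0", "INSERT_INPUT_1"]:
    print(text, "->", "SFW" if is_sfw(text) else "NSFW")
```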


### Evaluation

```
                               precision    recall  f1-score   support

                       racist    0.87619   0.91390   0.89465      1719
              religion insult    0.88533   0.85813   0.87152      3320
psychiatric or mental illness    0.94224   0.87020   0.90479      5624
                       sexist    0.77146   0.82234   0.79609      1486
                   harassment    0.81935   0.87460   0.84608       949
                         porn    0.95047   0.97546   0.96280      1141
                safe for work    0.83471   0.90741   0.86954      3456
                    self-harm    0.81796   0.95906   0.88291       342
                     violence    0.84317   0.78786   0.81457      1433

                     accuracy                        0.87684     19470
                    macro avg    0.86010   0.88544   0.87144     19470
                 weighted avg    0.87960   0.87684   0.87718     19470

```
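
The table above is a standard scikit-learn classification report. The following is a minimal sketch of how a report in this format can be produced on a held-out set; `y_true` and `y_pred` are placeholders for the gold and predicted label strings, and this is not the project's actual evaluation script.

```python
from sklearn.metrics import classification_report

LABELS = [
    "racist", "religion insult", "psychiatric or mental illness",
    "sexist", "harassment", "porn", "safe for work", "self-harm", "violence",
]

# Placeholder predictions; in practice these come from running the pipeline
# over a held-out split of the Malaysian-NSFW dataset.
y_true = ["safe for work", "porn", "violence"]
y_pred = ["safe for work", "porn", "safe for work"]

# zero_division=0 silences warnings for labels absent from the placeholder data.
print(classification_report(y_true, y_pred, labels=LABELS, digits=5, zero_division=0))
```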


### Citation

```bibtex
@misc{razak2024adaptingsafeforworkclassifiermalaysian,
      title={Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework}, 
      author={Aisyah Razak and Ariff Nazhan and Kamarul Adha and Wan Adzhar Faiq Adzlan and Mas Aisyah Ahmad and Ammar Azman},
      year={2024},
      eprint={2407.20729},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.20729}, 
}
```