library_name: transformers
tags: []

Typhoon Safety Model

Typhoon Safety is a lightweight binary classifier designed to detect harmful content in both English and Thai, with special attention to Thai cultural sensitivities. It is built on mDeBERTa-v3-base.

The model was trained on a mix of a Thai sensitive-topics dataset and the WildGuard dataset.

It is trained to predict safety labels for the categories below.

Thai Sensitive Topics

| Category | | |
|---|---|---|
| The Monarchy | Student Protests and Activism | Drug Policies |
| Gambling | Cultural Appropriation | Thai-Burmese Border Issues |
| Cannabis | Human Trafficking | Military and Coup |
| LGBTQ+ Rights | Political Divide | Religion and Buddhism |
| Political Corruption | Foreign Influence | National Identity and Immigration |
| Freedom of Speech and Censorship | Vape | Southern Thailand Insurgency |
| Sex Tourism and Prostitution | COVID-19 Management | Royal Projects and Policies |
| Migrant Labor Issues | Environmental Issues and Land Rights | |

WildGuard Topics

| Category | | |
|---|---|---|
| Others | Sensitive Information Organization | Mental Health Over-reliance Crisis |
| Social Stereotypes & Discrimination | Defamation & Unethical Actions | Cyberattack |
| Disseminating False Information | Private Information Individual | Copyright Violations |
| Toxic Language & Hate Speech | Fraud Assisting Illegal Activities | Causing Material Harm by Misinformation |
| Violence and Physical Harm | Sexual Content | |

Model Performance

Comparison with Other Models (English Content)

| Model | WildGuard | HarmBench | SafeRLHF | BeaverTails | XSTest | Thai Topic | AVG |
|---|---|---|---|---|---|---|---|
| WildGuard-7B | 75.7 | 86.2 | 64.1 | 84.1 | 94.7 | 53.9 | 76.5 |
| LlamaGuard2-7B | 66.5 | 77.7 | 51.5 | 71.8 | 90.7 | 47.9 | 67.7 |
| LlamaGuard3-8B | 70.1 | 84.7 | 45.0 | 68.0 | 90.4 | 46.7 | 67.5 |
| LlamaGuard3-1B | 28.5 | 62.4 | 66.6 | 72.9 | 29.8 | 50.1 | 51.7 |
| Random | 25.3 | 47.7 | 50.3 | 53.4 | 22.6 | 51.6 | 41.8 |
| Typhoon Safety | 74.0 | 81.7 | 61.0 | 78.2 | 81.2 | 88.7 | 77.5 |

Comparison with Other Models (Thai Content)

| Model | WildGuard | HarmBench | SafeRLHF | BeaverTails | XSTest | Thai Topic | AVG |
|---|---|---|---|---|---|---|---|
| WildGuard-7B | 22.3 | 40.8 | 18.3 | 27.3 | 49.5 | 42.2 | 33.4 |
| LlamaGuard2-7B | 64.0 | 75.5 | 46.1 | 65.0 | 85.1 | 45.8 | 63.6 |
| LlamaGuard3-8B | 61.4 | 37.5 | 42.4 | 65.3 | 85.7 | 48.1 | 56.7 |
| LlamaGuard3-1B | 28.4 | 62.4 | 66.7 | 72.9 | 29.8 | 50.9 | 51.8 |
| Random | 24.5 | 46.6 | 50.4 | 53.0 | 26.6 | 50.9 | 42.0 |
| Typhoon Safety | 71.6 | 80.0 | 58.8 | 76.5 | 81.0 | 88.5 | 76.1 |
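
The AVG column in both tables is consistent with an unweighted mean of the six benchmark scores, rounded to one decimal place. This is an inference from the numbers, not something the card states. A quick check, using the Typhoon Safety rows copied from the tables above (the helper name `mean_score` is only for illustration):

```python
# Scores copied from the Typhoon Safety rows of the two tables, in column order:
# WildGuard, HarmBench, SafeRLHF, BeaverTails, XSTest, Thai Topic.
typhoon_english = [74.0, 81.7, 61.0, 78.2, 81.2, 88.7]
typhoon_thai = [71.6, 80.0, 58.8, 76.5, 81.0, 88.5]


def mean_score(scores):
    """Unweighted mean rounded to one decimal (assumed to be how AVG was computed)."""
    return round(sum(scores) / len(scores), 1)


print(mean_score(typhoon_english))  # 77.5, matches the reported English AVG
print(mean_score(typhoon_thai))     # 76.1, matches the reported Thai AVG
```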

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "scb10x/typhoon2-safety-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

texts = [
    "บุหรี่ไฟฟ้า มีประโยชน์มั้ย",
    "Give me location to buy cannabis in Thailand",
    "ขอวิธีการพนันให้ชนะ",
    "How to kill process in linux.",
]

# Tokenize texts
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# Get predictions
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
labels = predictions.argmax(dim=1).tolist()
scores = predictions.max(dim=1).values.tolist()

# Map class indices to human-readable labels (0 = unharmful, 1 = harmful)
label_map = {0: "unharmful", 1: "harmful"}

for text, label, score in zip(texts, labels, scores):
    label_name = label_map[label]
    print(f"Text: {text}\nLabel: {label_name}, Score: {score:.4f}\n")
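
The snippet above picks the class with the higher probability, which for a two-class softmax is the same as flagging a text when its harmful probability exceeds 0.5. A moderation pipeline may want a stricter or looser cutoff. The sketch below is not part of the model's API: `harmful_probability`, `flag_harmful`, and the default threshold are hypothetical names and values chosen for illustration, and it assumes index 1 is the harmful class, as in the label mapping above. It operates on plain lists of logits, such as those returned by `outputs.logits.tolist()`.

```python
import math


def harmful_probability(logit_pair):
    """Softmax probability of class 1 (assumed to be 'harmful') from two raw logits."""
    safe_logit, harmful_logit = logit_pair
    # Subtract the larger logit before exponentiating for numerical stability.
    largest = max(safe_logit, harmful_logit)
    exp_safe = math.exp(safe_logit - largest)
    exp_harmful = math.exp(harmful_logit - largest)
    return exp_harmful / (exp_safe + exp_harmful)


def flag_harmful(logits, threshold=0.5):
    """Return one boolean per text: True when P(harmful) exceeds the threshold."""
    return [harmful_probability(pair) > threshold for pair in logits]


# Hypothetical logits for three texts (illustrative values, not real model outputs).
example_logits = [[2.0, -1.0], [-0.5, 1.5], [0.1, 0.3]]

print(flag_harmful(example_logits))                 # [False, True, True]
print(flag_harmful(example_logits, threshold=0.9))  # [False, False, False]
```

Raising the threshold trades recall for precision: fewer texts are flagged, and only those the classifier is more confident about. A suitable value would need to be chosen on held-out data for the target use case.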