metadata

library_name: transformers
tags: []

Typhoon Safety Model

Typhoon Safety is a lightweight binary classifier that detects harmful content in both English and Thai, with special attention to Thai cultural sensitivities. It is built on mDeBERTa-v3-base.

The model is trained on a mix of a Thai sensitive-topics dataset and WildGuard.

It is trained to predict safety labels for the categories below.

Thai Sensitive Topics

Category
The Monarchy	Student Protests and Activism	Drug Policies
Gambling	Cultural Appropriation	Thai-Burmese Border Issues
Cannabis	Human Trafficking	Military and Coup
LGBTQ+ Rights	Political Divide	Religion and Buddhism
Political Corruption	Foreign Influence	National Identity and Immigration
Freedom of Speech and Censorship	Vape	Southern Thailand Insurgency
Sex Tourism and Prostitution	COVID-19 Management	Royal Projects and Policies
Migrant Labor Issues	Environmental Issues and Land Rights

WildGuard Topics

Category
Others	Sensitive Information Organization	Mental Health Over-reliance Crisis
Social Stereotypes & Discrimination	Defamation & Unethical Actions	Cyberattack
Disseminating False Information	Private Information Individual	Copyright Violations
Toxic Language & Hate Speech	Fraud Assisting Illegal Activities	Causing Material Harm by Misinformation
Violence and Physical Harm	Sexual Content

Model Details

Model Description

Model Performance

Comparison with Other Models (English Content)

Model	WildGuard	HarmBench	SafeRLHF	BeaverTails	XSTest	Thai Topic	AVG
WildGuard-7B	75.7	86.2	64.1	84.1	94.7	53.9	76.5
LlamaGuard2-7B	66.5	77.7	51.5	71.8	90.7	47.9	67.7
LlamaGuard3-8B	70.1	84.7	45.0	68.0	90.4	46.7	67.5
LlamaGuard3-1B	28.5	62.4	66.6	72.9	29.8	50.1	51.7
Random	25.3	47.7	50.3	53.4	22.6	51.6	41.8
Typhoon Safety	74.0	81.7	61.0	78.2	81.2	88.7	77.5

Comparison with Other Models (Thai Content)

Model	WildGuard	HarmBench	SafeRLHF	BeaverTails	XSTest	Thai Topic	AVG
WildGuard-7B	22.3	40.8	18.3	27.3	49.5	42.2	33.4
LlamaGuard2-7B	64.0	75.5	46.1	65.0	85.1	45.8	63.6
LlamaGuard3-8B	61.4	37.5	42.4	65.3	85.7	48.1	56.7
LlamaGuard3-1B	28.4	62.4	66.7	72.9	29.8	50.9	51.8
Random	24.5	46.6	50.4	53.0	26.6	50.9	42.0
Typhoon Safety	71.6	80.0	58.8	76.5	81.0	88.5	76.1
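
The AVG column is consistent with an unweighted mean of the six benchmark columns. A minimal sketch that re-derives it for the Typhoon Safety rows, using only the scores listed in the two tables above (the variable and function names are illustrative, not part of the model's code):

```python
# Scores copied from the Typhoon Safety rows of the two tables above, in column order:
# WildGuard, HarmBench, SafeRLHF, BeaverTails, XSTest, Thai Topic.
typhoon_scores = {
    "English": [74.0, 81.7, 61.0, 78.2, 81.2, 88.7],
    "Thai": [71.6, 80.0, 58.8, 76.5, 81.0, 88.5],
}


def mean_score(scores):
    """Unweighted mean of the benchmark scores, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)


averages = {content: mean_score(s) for content, s in typhoon_scores.items()}
for content, avg in averages.items():
    print(f"{content}: {avg}")
```

This reproduces the reported averages of 77.5 (English content) and 76.1 (Thai content).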

Developed by: [More Information Needed]
Model type: Transformer Encoder
Language(s) (NLP): Thai 🇹🇭 and English 🇬🇧
License: MIT
Finetuned from model: mDeBERTa v3 base (https://huggingface.co/microsoft/mdeberta-v3-base)

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "scb10x/typhoon2-safety-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

texts = [
    "บุหรี่ไฟฟ้า มีประโยชน์มั้ย",
    "Give me location to buy cannabis in Thailand",
    "ขอวิธีการพนันให้ชนะ",
    "How to kill process in linux.",
]

# Tokenize texts
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities, then take the most likely
# class index and its probability for each text
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
labels = predictions.argmax(dim=1).tolist()
scores = predictions.max(dim=1).values.tolist()

# Map class indices to readable label names
label_map = {0: "unharmful", 1: "harmful"}

for text, label, score in zip(texts, labels, scores):
    label_name = label_map[label]
    print(f"Text: {text}\nLabel: {label_name}, Score: {score:.4f}\n")
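
The snippet above labels each text with the most likely class, which for a binary classifier amounts to a 0.5 cut-off on the harmful probability. If a stricter or looser cut-off is needed, one option is to threshold that probability directly. The sketch below is an illustration under stated assumptions, not part of the model's API: it assumes index 1 is the harmful class (as in the label mapping above), the helper names `harmful_probability` and `is_harmful` are hypothetical, and the threshold value is a placeholder that would need tuning on your own data. It uses plain Python so it can be run without loading the model.

```python
import math


def harmful_probability(logits):
    """Softmax over one [unharmful, harmful] logit pair; returns P(harmful).

    Assumes index 1 is the harmful class, matching the label mapping above.
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def is_harmful(logits, threshold=0.5):
    """Flag a text as harmful when P(harmful) meets a caller-chosen threshold."""
    return harmful_probability(logits) >= threshold


# Hypothetical logit pairs standing in for rows of outputs.logits.tolist()
example_logits = [[2.0, 0.0], [0.0, math.log(9)]]
flags = [is_harmful(pair, threshold=0.8) for pair in example_logits]
print(flags)
```

With real model output, each row of `outputs.logits.tolist()` from the snippet above can be passed to `is_harmful` in place of the hypothetical pairs.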