
Kaviel: Cyber Threat Intelligence Classification Model

Model Description

Kaviel is a fine-tuned version of roberta-base designed for the classification of text into six categories: Banking Fraud, Terrorist Attack, Life Threat, Online Scams, Information Leakage, and Casual Conversation. This model is specifically trained for use in threat intelligence platforms.

Intended Use

The model is intended to automatically classify textual data into the predefined categories above, to assist in threat detection and management.

Training Data

The model was trained on a custom dataset with the following categories:

  • Label 0: Banking Fraud
  • Label 1: Terrorist Attack
  • Label 2: Life Threat
  • Label 3: Online Scams
  • Label 4: Information Leakage
  • Label 5: Casual Conversation
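The label taxonomy above can be expressed as a simple id-to-label mapping, useful when decoding model outputs. This is a sketch mirroring the list in this card; the label names actually stored in the model's config may differ.

```python
# Hypothetical id-to-label mapping, mirroring the categories listed above.
ID2LABEL = {
    0: "Banking Fraud",
    1: "Terrorist Attack",
    2: "Life Threat",
    3: "Online Scams",
    4: "Information Leakage",
    5: "Casual Conversation",
}

# Inverse mapping, handy when preparing training labels.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}
```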

Training Procedure

The model was fine-tuned using PyTorch Lightning with the following configuration:

  • Epochs: 3
  • Batch size: 128
  • Learning rate: 1.5e-6
  • Weight decay: 0.001
  • Warmup ratio: 0.2
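The hyperparameters above translate into an optimizer/scheduler setup along the following lines. This is an illustrative sketch only: the actual PyTorch Lightning training code is not published, the dataset size is a placeholder, and a small stand-in module is used here instead of the full roberta-base classifier.

```python
import torch
from transformers import get_linear_schedule_with_warmup

EPOCHS = 3
BATCH_SIZE = 128
NUM_TRAIN_SAMPLES = 12_800  # placeholder: the real dataset size is not documented

steps_per_epoch = NUM_TRAIN_SAMPLES // BATCH_SIZE
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(0.2 * total_steps)  # warmup ratio 0.2

# Stand-in module; in training this would be the roberta-base sequence classifier.
model = torch.nn.Linear(768, 6)
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-6, weight_decay=0.001)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)
```

With linear warmup, the learning rate ramps from zero to 1.5e-6 over the first 20% of training steps, then decays linearly to zero.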

Evaluation

The model's performance was evaluated using ROC AUC scores for each category.
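Per-category ROC AUC can be computed as sketched below with scikit-learn. The arrays here are synthetic stand-ins, since the evaluation data and scores are not published; in practice `y_true` would hold the ground-truth label indicators and `y_score` the per-category sigmoid outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

NUM_LABELS = 6

# Synthetic stand-ins for illustration: multi-hot ground truth and model scores.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(200, NUM_LABELS))
y_score = rng.random((200, NUM_LABELS))

# One ROC AUC score per category, matching the evaluation described above.
per_label_auc = [
    roc_auc_score(y_true[:, i], y_score[:, i]) for i in range(NUM_LABELS)
]
```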

How to Use

You can use the model for inference with the following code:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "HiddenKise/Kaviel-threat-text-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example text for prediction
text = "Unauthorized access attempt detected. Verify your account now."

# Tokenize and prepare input
inputs = tokenizer(text, return_tensors="pt")

# Get model predictions
with torch.no_grad():
    outputs = model(**inputs)

# Process outputs: the model has six labels, and sigmoid yields an
# independent score per category
logits = outputs.logits
probabilities = torch.sigmoid(logits)

# Print per-category scores and the id of the highest-scoring category
print(probabilities)
print(torch.argmax(probabilities, dim=-1).item())