CyberConstituent-SLM
Constitutional AI-Aligned Cybersecurity Threat Classifier
A fine-tuned DistilBERT Small Language Model (SLM) that classifies security logs and alerts into specific threat vectors. The training pipeline integrates an Anthropic-inspired Constitutional AI alignment layer so that raw threat descriptions are sanitized of explicit exploit payloads, SQL injection code, and unverified geopolitical attribution bias.
Hugging Face Model Hub: sujithputta02/cyber-threat-constitutional-slm
Key Specifications
- Base Architecture: distilbert-base-uncased (67M parameters)
- Task: 6-class single-label text classification
- Accuracy: 89% validation accuracy
- Optimization: Fine-tuned on a Google Colab T4 GPU using FP16 mixed precision and cosine learning-rate scheduling.
- Alignment: Aligned under Constitutional AI guidelines to filter out actionable exploit syntax while preserving analytical value.
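To make the cosine learning-rate scheduling mentioned above concrete, here is a minimal plain-Python sketch of a linear warmup followed by cosine decay to zero. The base learning rate, warmup length, and total step count are hypothetical values chosen for illustration; they are not taken from the training notebook.

```python
import math

def cosine_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, clamped to [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

# Hypothetical settings, not the values used to train this model
BASE_LR, TOTAL_STEPS, WARMUP_STEPS = 2e-5, 1000, 100

schedule = [cosine_lr(s, TOTAL_STEPS, BASE_LR, WARMUP_STEPS) for s in range(TOTAL_STEPS + 1)]
print(f"peak={max(schedule):.2e}, final={schedule[-1]:.2e}")
```

The rate peaks at the end of warmup and then falls smoothly, which tends to give large updates early in fine-tuning and small ones near the end.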
Threat Classification Target Classes
The model classifies text inputs into one of six core cybersecurity threat categories:
| Label ID | Threat Category | Example Indicators |
|---|---|---|
| LABEL_0 | Malware Attack | Executables running from temp folders, unsigned DLL files, keyloggers |
| LABEL_1 | Ransomware Attack | Cryptographic file encryption, volume shadow copy deletions, ransom demand notes |
| LABEL_2 | Phishing Campaign | Social engineering links, credential-harvesting spoofed login portals, deceptive email macros |
| LABEL_3 | DDoS Attack | Massive SYN/UDP port flooding, network bandwidth exhaustion, botnet requests |
| LABEL_4 | SQL Injection | SQL command syntax in URL/query parameters, database validation form bypass |
| LABEL_5 | Man-in-the-Middle | ARP cache poisoning, rogue gateway spoofing, SSL handshake intercept attempts |
Installation
To deploy or integrate this model on your platform, install the necessary dependencies:
```bash
pip install transformers torch
```
Python Usage Examples
1. Simple Inference Pipeline (Quickest Integration)
Use Hugging Face's high-level pipeline to classify custom security logs directly:
```python
from transformers import pipeline

# Load the model from the Hugging Face Hub
classifier = pipeline("text-classification", model="sujithputta02/cyber-threat-constitutional-slm")

# Human-readable labels dictionary
LABEL_MAP = {
    "LABEL_0": "Malware Attack",
    "LABEL_1": "Ransomware Attack",
    "LABEL_2": "Phishing Campaign",
    "LABEL_3": "DDoS Attack",
    "LABEL_4": "SQL Injection",
    "LABEL_5": "Man-in-the-Middle Attack",
}

# Sample log to analyze
log = "ALERT: Database validation form bypass query manipulation. SQL syntax identified."

# Classify: the pipeline returns a list with one dict per input
result = classifier(log)[0]
confidence = result["score"] * 100
predicted_threat = LABEL_MAP.get(result["label"], result["label"])

print("=" * 60)
print(f"Input Log: {log}")
print(f"Predicted Threat: {predicted_threat}")
print(f"Confidence: {confidence:.2f}%")
print("=" * 60)
```
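In an alerting workflow it can help to route low-confidence predictions to an analyst instead of trusting them outright. The sketch below post-processes pipeline-style results (dicts with `label` and `score` keys, as returned above). The `triage` helper, the 0.80 threshold, and the sample scores are hypothetical illustrations rather than part of the model repository; the stand-in predictions let the snippet run without downloading the model.

```python
from typing import Dict, List

LABEL_MAP = {
    "LABEL_0": "Malware Attack",
    "LABEL_1": "Ransomware Attack",
    "LABEL_2": "Phishing Campaign",
    "LABEL_3": "DDoS Attack",
    "LABEL_4": "SQL Injection",
    "LABEL_5": "Man-in-the-Middle Attack",
}

def triage(predictions: List[Dict], threshold: float = 0.80) -> List[Dict]:
    """Attach a readable threat name and flag low-confidence results for review."""
    triaged = []
    for pred in predictions:
        triaged.append({
            "threat": LABEL_MAP.get(pred["label"], pred["label"]),
            "confidence": pred["score"],
            "needs_review": pred["score"] < threshold,  # hypothetical cut-off
        })
    return triaged

# Stand-in predictions in the pipeline's output format (made-up scores).
# With the real model: sample_predictions = classifier(list_of_logs)
sample_predictions = [
    {"label": "LABEL_4", "score": 0.97},
    {"label": "LABEL_3", "score": 0.55},
]

for row in triage(sample_predictions):
    status = "REVIEW" if row["needs_review"] else "OK"
    print(f"{row['threat']:<26} {row['confidence']:.2%}  {status}")
```

A sensible threshold depends on the deployment; it is worth calibrating against held-out logs rather than reusing the value shown here.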
2. Manual Tokenizer & Model Execution (Low-level Control)
For systems that require batched operations, model parameter tweaking, or tensor-level output routing:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("sujithputta02/cyber-threat-constitutional-slm")
model = AutoModelForSequenceClassification.from_pretrained("sujithputta02/cyber-threat-constitutional-slm")

# Tokenize the input, truncating to at most 128 tokens
text = "Cryptographic file encryption activity detected in user directories. Bulk extension modification."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Run prediction without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to a probability distribution and pick the top class
probabilities = torch.nn.functional.softmax(logits, dim=-1)
predicted_class = torch.argmax(probabilities, dim=-1).item()

print(f"Class Probability Distribution: {probabilities[0].tolist()}")
print(f"Predicted Class ID: {predicted_class}")
```
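To make the softmax and argmax steps concrete, the following plain-Python sketch reproduces them on a made-up logits vector, without loading the model. The logits and the helper names are hypothetical; with the real model, the vector would come from `model(**inputs).logits[0].tolist()`.

```python
import math
from typing import List, Tuple

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits: List[float]) -> Tuple[int, float]:
    """Return (class_id, probability) for the highest-scoring class."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return class_id, probs[class_id]

# Made-up logits for the six classes (LABEL_0 ... LABEL_5)
example_logits = [0.2, 4.1, -1.3, 0.0, 0.7, -0.5]

class_id, prob = predict(example_logits)
print(f"Predicted Class ID: {class_id} (p={prob:.3f})")
```

Because softmax is monotonic, taking the argmax over the raw logits would select the same class; the probabilities matter only when a confidence value is needed.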
Web UI Dashboard Setup (Streamlit)
You can launch a live UI using the provided Streamlit app.py script. The UI connects directly to Hugging Face's serverless inference endpoint:
- Install Streamlit:
  ```bash
  pip install streamlit requests
  ```
- Run the application:
  ```bash
  streamlit run app.py
  ```

Make sure HF_API_TOKEN is set as an environment variable or in Streamlit's secrets so the app can authenticate its API queries.
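The app.py script itself is not reproduced here. As a rough sketch of how a client might prepare a call to a hosted inference endpoint, the snippet below builds (but does not send) an authenticated JSON request using only the standard library. The endpoint URL pattern, the `build_request` helper, and the placeholder token are assumptions for illustration and may not match what app.py does; check the current Hugging Face inference documentation for the exact URL.

```python
import json
import os
import urllib.request

MODEL_ID = "sujithputta02/cyber-threat-constitutional-slm"
# Assumed URL pattern; verify against the current Hugging Face documentation
API_URL = f"https://api-inference.huggingface.co/models/{MODEL_ID}"

def build_request(text: str, token: str) -> urllib.request.Request:
    """Build a POST request carrying the log text and a bearer token."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Read the token from the environment; the fallback is a non-working placeholder
token = os.environ.get("HF_API_TOKEN", "hf_placeholder")
request = build_request("Massive SYN flood detected on port 443.", token)
print(request.get_method(), request.full_url)

# To actually send it: urllib.request.urlopen(request), then json-decode the body
```

Keeping the token in an environment variable or in Streamlit's secrets, rather than in source code, avoids leaking it through the repository.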
Reproducibility & Model Info
The full training parameters, evaluation metrics, confusion-matrix plots, and learning curves are documented in the SLM.ipynb training notebook in this repository. The final training parameters are exported to cyber-threat-constitutional-slm/training_config.json.