---
license: apache-2.0
language:
- en
- de
---
# 🛡️ MLP Cybersecurity Classifier
This repository hosts a lightweight `scikit-learn`-based MLP classifier trained to distinguish cybersecurity-related content from other text, using sentence-transformer embeddings. It supports English and German input texts.
## 📊 Training Data
The model was trained on a multilingual dataset of cybersecurity and non-cybersecurity news articles. The dataset is publicly available on Zenodo:
🔗 [https://zenodo.org/records/16417939](https://zenodo.org/records/16417939)
## 📦 Model Details
- **Architecture**: `MLPClassifier` with hidden layers `(128, 64)`
- **Embedding model**: [`intfloat/multilingual-e5-large`](https://huggingface.co/intfloat/multilingual-e5-large)
- **Input**: Cleaned article or report text (stopwords removed)
- **Output**: Binary label (e.g., `Cybersecurity`, `Not Cybersecurity`)
- **Languages**: English, German
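
The stopword removal mentioned under **Input** could look roughly like the sketch below. The stopword list and tokenization used for training are not specified in this card, so the `STOPWORDS` sets and the `remove_stopwords` helper are hypothetical placeholders for illustration, not the actual preprocessing pipeline.

```python
import string

# Hypothetical, deliberately tiny stopword sets for illustration only;
# the list actually used to prepare the training data is not documented here.
STOPWORDS = {
    "en": {"a", "an", "and", "has", "in", "its", "of", "the", "this", "to"},
    "de": {"das", "der", "die", "ein", "eine", "hat", "in", "und", "von"},
}


def remove_stopwords(text: str, lang: str = "en") -> str:
    """Drop whitespace-separated tokens whose lowercase, punctuation-stripped
    form is in the stopword set for `lang`; remaining tokens keep their
    original casing and punctuation."""
    stop = STOPWORDS[lang]
    kept = [
        token
        for token in text.split()
        if token.strip(string.punctuation).lower() not in stop
    ]
    return " ".join(kept)


cleaned = remove_stopwords(
    "A new ransomware attack has affected critical infrastructure in Germany."
)
print(cleaned)
```

In practice a full stopword list (for example from NLTK or spaCy) would likely replace the toy sets. Whether a given list matches the one used at training time is an assumption worth checking against the dataset.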
## 🔧 Usage
```python
from sentence_transformers import SentenceTransformer
from huggingface_hub import hf_hub_download
import joblib

# 1. Load the embedding model
embedder = SentenceTransformer("intfloat/multilingual-e5-large")

# 2. Load the pretrained MLP classifier from Hugging Face Hub
model_path = hf_hub_download(
    repo_id="selfconstruct3d/cybersec_classifier",
    filename="cybersec_classifier.pkl",
)
model = joblib.load(model_path)

# 3. Example input texts (can be in English or German)
texts = [
    "A new ransomware attack has affected critical infrastructure in Germany.",
    "The local sports club hosted its annual summer festival this weekend.",
]

# 4. Generate embeddings
embeddings = embedder.encode(texts, convert_to_numpy=True, show_progress_bar=False)

# 5. Predict cybersecurity relevance
predictions = model.predict(embeddings)

# 6. Output results
for text, label in zip(texts, predictions):
    print(f"Text: {text}\nPrediction: {label}\n")
```