Confli-mBERT: ModernBERT-based Terrorist Attack Type Classification
Confli-mBERT is a fine-tuned ModernBERT model for multi-label classification of terrorist attack types, trained on the Global Terrorism Database (GTD). This model leverages ModernBERT's enhanced architecture and extends it to the domain of conflict and terrorism analysis.
Model description
Confli-mBERT was fine-tuned on the Global Terrorism Database (GTD), using incident descriptions from 1970-2016 for training and incidents from 2017 onwards for testing. The model classifies terrorist incidents into nine possible attack types:
- Hostage Taking (Kidnapping)
- Armed Assault
- Bombing/Explosion
- Unknown
- Assassination
- Hijacking
- Unarmed Assault
- Facility/Infrastructure Attack
- Hostage Taking (Barricade Incident)
The model handles multi-label classification, as a single terrorist incident can involve multiple attack types simultaneously.
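To make the target format concrete, here is a minimal sketch of how an incident's attack types can be encoded as a multi-hot vector (the label ordering below is illustrative only; the authoritative mapping ships with the model as `model.config.id2label`):

```python
# Illustrative label order only; the model's own mapping is model.config.id2label.
ATTACK_TYPES = [
    "Hostage Taking (Kidnapping)", "Armed Assault", "Bombing/Explosion",
    "Unknown", "Assassination", "Hijacking", "Unarmed Assault",
    "Facility/Infrastructure Attack", "Hostage Taking (Barricade Incident)",
]

def encode_labels(incident_types):
    """Encode an incident's attack types as a multi-hot target vector."""
    return [1.0 if label in incident_types else 0.0 for label in ATTACK_TYPES]

# An incident that involved both a bombing and an armed assault:
print(encode_labels(["Bombing/Explosion", "Armed Assault"]))
# [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```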
Intended uses & limitations
This model is intended for research and analysis of terrorism and political violence. It can assist analysts, researchers, and policymakers in understanding and categorizing terrorist attacks based on their descriptions.
Limitations:
- The model was trained on historical data and may not reflect new and emerging tactics.
- Performance varies across attack types, with better results for more common categories.
- Text descriptions must be in English.
How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("shreyasmeher/Confli-mBERT")
model = AutoModelForSequenceClassification.from_pretrained("shreyasmeher/Confli-mBERT")

# Example text
text = "Assailants detonated a bomb near a government building and opened fire on civilians."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to per-label probabilities (sigmoid, since this is multi-label)
probs = torch.sigmoid(outputs.logits)[0]
id2label = model.config.id2label  # keys are integer label ids

# Print predictions above threshold
threshold = 0.5
for i, prob in enumerate(probs):
    if prob > threshold:
        print(f"{id2label[i]}: {prob:.4f}")
```
You can also use the pipeline for simpler inference:
```python
from transformers import pipeline

# top_k=None returns a score for every attack type (useful for multi-label output)
classifier = pipeline("text-classification", model="shreyasmeher/Confli-mBERT", top_k=None)
result = classifier("Militants kidnapped three foreign aid workers near the border.")
print(result)
```
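With `top_k=None` the pipeline returns a score for every attack type rather than only the top label. A small sketch for turning that output into a set of predicted labels above a probability threshold (0.5 here, mirroring the example above):

```python
# result is a list of {"label", "score"} dicts for a single input string;
# some transformers versions nest it one level deeper, so unwrap defensively.
scores = result[0] if isinstance(result[0], list) else result
predicted = [d["label"] for d in scores if d["score"] > 0.5]
print(predicted)
```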
Training data
The model was trained on the Global Terrorism Database (GTD), a comprehensive open-source database of terrorist events around the world from 1970 through the present. For each GTD incident, information is available on the date and location of the incident, the weapons used, the nature of the target, the number of casualties, and the group or individual responsible.
Training data included incidents up to 2016, while evaluation was performed on incidents from 2017 onwards.
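As a rough sketch of that temporal split, assuming a standard GTD CSV export with its `iyear`, `summary`, and `attacktype*_txt` columns (the file path is a placeholder; check the GTD codebook for the exact field names in your release):

```python
import pandas as pd

# Placeholder path to a GTD export; column names follow the GTD codebook.
gtd = pd.read_csv("globalterrorismdb.csv", low_memory=False)

# Keep only incidents that have a narrative description to classify.
gtd = gtd.dropna(subset=["summary"])

# Temporal split: 1970-2016 for training, 2017 onwards for evaluation.
train_df = gtd[gtd["iyear"] <= 2016]
test_df = gtd[gtd["iyear"] >= 2017]

# Up to three attack types can be recorded per incident -> multi-label targets.
label_cols = ["attacktype1_txt", "attacktype2_txt", "attacktype3_txt"]
print(len(train_df), len(test_df))
```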
Training procedure
The model was fine-tuned from answerdotai/ModernBERT-base with the following parameters (a Trainer-style sketch follows the list):
- Learning rate: 3e-5
- Batch size: 16
- Number of epochs: 5
- Optimizer: AdamW
- Loss function: BCEWithLogitsLoss (for multi-label classification)
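Below is a sketch of how these settings map onto a Hugging Face `Trainer` run; setting `problem_type="multi_label_classification"` makes the model apply `BCEWithLogitsLoss` internally. The tiny in-memory dataset is a placeholder for the tokenized GTD summaries with float multi-hot label vectors:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

NUM_LABELS = 9  # the nine GTD attack types

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
# problem_type selects BCEWithLogitsLoss in the model's forward pass.
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",
)

def preprocess(example):
    enc = tokenizer(example["summary"], truncation=True)
    enc["labels"] = example["labels"]  # float multi-hot vector for this incident
    return enc

# Placeholder dataset; in practice this would be the tokenized GTD summaries.
toy = Dataset.from_dict({
    "summary": ["Assailants detonated a bomb near a checkpoint."],
    "labels": [[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
}).map(preprocess, remove_columns=["summary"])

args = TrainingArguments(
    output_dir="confli-mbert",
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    num_train_epochs=5,
    optim="adamw_torch",  # AdamW
)

trainer = Trainer(model=model, args=args,
                  train_dataset=toy, eval_dataset=toy, tokenizer=tokenizer)
trainer.train()
```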
Training progression
Accuracy and F1 improved steadily across epochs, while validation loss bottomed out around epoch 3:
| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score |
|---|---|---|---|---|
| 1 | 0.1627 | 0.1420 | 0.7059 | 0.7552 |
| 2 | 0.1557 | 0.1246 | 0.7218 | 0.7723 |
| 3 | 0.1471 | 0.1045 | 0.7721 | 0.8200 |
| 4 | 0.1380 | 0.1060 | 0.7766 | 0.8211 |
| 5 | 0.1426 | 0.1083 | 0.7809 | 0.8221 |
Limitations and bias
This model inherits limitations from both the ModernBERT architecture and the training data:
- The Global Terrorism Database is based on news reports, which may be biased in their coverage across regions and types of events.
- The model is likely to perform better on attack types that are more common in the training data, with weaker per-class performance expected for rarer categories.
- Cultural and regional context may affect model performance, as terrorism tactics and reporting vary across regions.
- Limited data for rare attack types makes the model susceptible to class-imbalance effects.
Citation
If you use this model in your research, please cite:
```bibtex
@misc{meher2025conflibert,
  author       = {Meher, Shreyas},
  title        = {Confli-mBERT: ModernBERT-based Terrorist Attack Type Classification},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/shreyasmeher/Confli-mBERT}}
}
```
Acknowledgements
- This research was supported by NSF award 2311142.
- This work used Delta at NCSA / University of Illinois through allocation CIS220162 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by NSF grants 2138259, 2138286, 2138307, 2137603, and 2138296.
- This model was built upon the ModernBERT architecture and trained using data from the Global Terrorism Database (GTD) maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START). We extend our gratitude to the data curators and the original model developers.