edaerer/promptwaf-command-injection



edaerer committed 6 days ago

Commit b694c04 · verified · 1 Parent(s): 613ccc0

Update README.md

Files changed (1)

README.md +97 -3


@@ -1,3 +1,97 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- en
+base_model:
+- protectai/deberta-v3-base-prompt-injection-v2
+pipeline_tag: text-classification
+tags:
+- security
+- prompt
+- cyber-security
+- llm-security
+- prompt-injection
+- command-injection
+library_name: transformers
+---
+# Command Injection Detector
+A fine-tuned DeBERTa model for detecting command injection attacks in prompts before they reach an LLM.
+## Overview
+This model is part of [PromptWAF](https://github.com/edaerer/promptwaf), a multi-layered, ML-based Web Application Firewall designed to detect and block prompt injection attacks.
+The model identifies prompts containing shell command execution patterns (`; rm -rf`, `| cat /etc/passwd`, `$(whoami)`, backtick execution, etc.) commonly used in command injection attacks.
+## Model Details
+- **Architecture**: DeBERTa-v3 (Base), fine-tuned from `protectai/deberta-v3-base-prompt-injection-v2`
+- **Task**: Binary Sequence Classification
+- **Training Data**: Custom, internally curated command injection dataset
+- **Labels**:
+  - `0` → Safe/Benign
+  - `1` → Command Injection Attack
+## Usage
+### With PromptWAF
+```bash
+# Automatically used in PromptWAF via .env configuration
+CMD_INJECTION_MODEL_DIR=edaerer/promptwaf-command-injection
+```
+### Standalone
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+# Load the tokenizer and classifier from the Hugging Face Hub
+model_id = "edaerer/promptwaf-command-injection"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+text = "List files; rm -rf / --no-preserve-root"
+# Truncate to the model's 256-token input limit
+inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
+with torch.no_grad():
+    outputs = model(**inputs)
+# Convert logits to probabilities; index 1 is the "Command Injection Attack" label
+probabilities = torch.softmax(outputs.logits, dim=-1)
+score = probabilities[0][1].item()
+print(f"Command Injection Risk: {score:.2%}")
+```
+## Performance
+- **Default threshold**: 0.5 (adjustable in PromptWAF)
+- **Maximum input length**: 256 tokens
+## Integration
+This model works with:
+- **PromptWAF** - The main security orchestrator
+- **HuggingFace Transformers** - For inference
+- Any standard sequence classification pipeline
+## Citation
+```bibtex
+@software{promptwaf2026,
+  author = {Erer, Eda and Odabasi, Talha},
+  title  = {PromptWAF: A Multi-Layered ML Defense for LLM Prompt Security},
+  year   = {2026},
+  url    = {https://github.com/edaerer/promptwaf}
+}
+```
+## License
+Apache License 2.0
+---
+For more information, visit the [PromptWAF GitHub repository](https://github.com/edaerer/promptwaf).
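
The README lists a threshold of 0.5 but the standalone example only prints a score. The sketch below shows one way that threshold could be applied to the classifier's two logits. It is an illustration, not PromptWAF's actual logic: `attack_probability` and `is_command_injection` are hypothetical names, and flagging scores at or above the threshold is an assumption. It uses only the standard library, so it runs without downloading the model.

```python
import math
from typing import Sequence


def attack_probability(logits: Sequence[float]) -> float:
    """Softmax probability of label 1 (Command Injection Attack).

    Expects two raw logits ordered [benign, attack], matching the
    label mapping in the README (0 = safe, 1 = attack).
    """
    if len(logits) != 2:
        raise ValueError("expected two logits: [benign, attack]")
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def is_command_injection(logits: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag the prompt when the attack probability reaches the threshold.

    Assumption: scores at or above the threshold are flagged; PromptWAF's
    own comparison may differ.
    """
    return attack_probability(logits) >= threshold


if __name__ == "__main__":
    # Made-up logits for illustration only, not real model output.
    example_logits = [-1.2, 2.7]
    print(f"score={attack_probability(example_logits):.2%}",
          f"blocked={is_command_injection(example_logits)}")
```

With the standalone example from the README, the logits would come from `outputs.logits[0].tolist()`. Raising `threshold` trades recall for fewer false positives on benign prompts that merely mention shell commands.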