edaerer commited on
Commit
b694c04
·
verified ·
1 Parent(s): 613ccc0

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +97 -3
README.md CHANGED
@@ -1,3 +1,97 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ base_model:
6
+ - protectai/deberta-v3-base-prompt-injection-v2
7
+ pipeline_tag: text-classification
8
+ tags:
9
+ - security
10
+ - prompt
11
+ - cyber-security
12
+ - llm-security
13
+ - prompt-injection
14
+ - command-injection
15
+ library_name: transformers
16
+ ---
17
+
18
+ # Command Injection Detector
19
+
20
+ A fine-tuned DeBERTa model for detecting command injection attacks in prompts before they reach an LLM.
21
+
22
+ ## Overview
23
+
24
+ This model is part of [PromptWAF](https://github.com/edaerer/promptwaf) — a multi-layered ML-based Web Application Firewall designed to detect and block prompt injection attacks.
25
+
26
+ The model identifies prompts containing shell command execution patterns (`; rm -rf`, `| cat /etc/passwd`, `$(whoami)`, backtick execution, etc.) commonly used in command injection attacks.
27
+
28
+ ## Model Details
29
+
30
+ - **Architecture**: DeBERTa (Base)
31
+ - **Task**: Binary Sequence Classification
32
+ - **Training Data**: Trained on a custom, internally curated command injection dataset
33
+ - **Labels**:
34
+ - `0` → Safe/Benign
35
+ - `1` → Command Injection Attack
36
+
37
+ ## Usage
38
+
39
+ ### With PromptWAF
40
+
41
+ ```bash
42
+ # Automatically used in PromptWAF via .env configuration
43
+ CMD_INJECTION_MODEL_DIR=edaerer/promptwaf-command-injection
44
+ ```
45
+
46
+ ### Standalone
47
+
48
+ ```python
49
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
50
+ import torch
51
+
52
+ model_id = "edaerer/promptwaf-command-injection"
53
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
54
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)
55
+
56
+ text = "List files; rm -rf / --no-preserve-root"
57
+ inputs = tokenizer(text, return_tensors="pt")
58
+
59
+ with torch.no_grad():
60
+ outputs = model(**inputs)
61
+
62
+ probabilities = torch.softmax(outputs.logits, dim=-1)
63
+ score = probabilities[0][1].item() # Malicious score
64
+
65
+ print(f"Command Injection Risk: {score:.2%}")
66
+ ```
67
+
68
+ ## Performance
69
+
70
+ - **Threshold**: 0.5 (adjustable in PromptWAF)
71
+ - **Input**: Max 256 tokens
72
+
73
+ ## Integration
74
+
75
+ This model is designed to work seamlessly with:
76
+ - **PromptWAF** - The main security orchestrator
77
+ - **HuggingFace Transformers** - For inference
78
+ - Any standard sequence classification pipeline
79
+
80
+ ## Citation
81
+
82
+ ```bibtex
83
+ @software{promptwaf2026,
84
+ author = {Erer, Eda and Odabasi, Talha},
85
+ title = {PromptWAF: A Multi-Layered ML Defense for LLM Prompt Security},
86
+ year = {2026},
87
+ url = {https://github.com/edaerer/promptwaf}
88
+ }
89
+ ```
90
+
91
+ ## License
92
+
93
+ Apache License 2.0
94
+
95
+ ---
96
+
97
+ For more information, visit [PromptWAF GitHub Repository](https://github.com/edaerer/promptwaf)