---
license: apache-2.0
library_name: transformers
base_model: huggingface/CodeBERTa-small-v1
tags:
- graphcodebert
- owasp
- cwe
- static-analysis
language:
- en
- code
pipeline_tag: text-classification
datasets:
- ayshajavd/code-security-vulnerability-dataset
- bstee615/bigvul
- CyberNative/Code_Vulnerability_Security_DPO
- lemon42-ai/Code_Vulnerability_Labeled_Dataset
model-index:
- name: graphcodebert-vuln-classifier
  results:
  - task:
      type: text-classification
      name: Multi-label Vulnerability Classification
    dataset:
      type: ayshajavd/code-security-vulnerability-dataset
      name: Code Security Vulnerability Dataset
    metrics:
    - type: f1
      value: 0.8779
      name: Weighted F1
    - type: f1
      value: 0.7043
      name: Micro F1
    - type: f1
      value: 0.1157
      name: Macro F1
---

# GraphCodeBERT Vulnerability Classifier

A multi-label code vulnerability detection model that identifies **31 vulnerability classes** (30 CWEs + safe) mapped to the **OWASP Top 10 2021** categories. Fine-tuned from [CodeBERTa-small-v1](https://huggingface.co/huggingface/CodeBERTa-small-v1) on 175K+ labeled code samples.

## Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "ayshajavd/graphcodebert-vuln-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

code = """
import sqlite3

def get_user(username):
    query = f"SELECT * FROM users WHERE username = '{username}'"
    conn = sqlite3.connect('db.sqlite')
    return conn.execute(query).fetchone()
"""

inputs = tokenizer(code, return_tensors="pt", max_length=512, truncation=True, padding=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.sigmoid(logits).squeeze()

# Print every label whose probability clears the detection threshold
TARGET_CWES = ["safe", "CWE-20", "CWE-22", "CWE-78", "CWE-79", "CWE-89", "CWE-94",
               "CWE-119", "CWE-125", "CWE-190", "CWE-200", "CWE-264", "CWE-269", "CWE-276",
               "CWE-284", "CWE-287", "CWE-310", "CWE-327", "CWE-330", "CWE-352", "CWE-362",
               "CWE-399", "CWE-401", "CWE-416", "CWE-434", "CWE-476", "CWE-502", "CWE-601",
               "CWE-787", "CWE-798", "CWE-918"]

threshold = 0.3
for cwe, prob in zip(TARGET_CWES, probs):
    if prob > threshold:
        print(f"{cwe}: {prob:.3f}")
```

## Model Details

| Property | Value |
|----------|-------|
| **Architecture** | RobertaForSequenceClassification (6 layers, 768 hidden, 82M params) |
| **Base Model** | [CodeBERTa-small-v1](https://huggingface.co/huggingface/CodeBERTa-small-v1) |
| **Task** | Multi-label classification (BCEWithLogitsLoss with class weights) |
| **Labels** | 31 (30 CWE categories + "safe") |
| **Max Sequence Length** | 512 tokens |
| **Detection Threshold** | 0.3 (optimized for recall: missing a vulnerability is worse than a false positive) |

## Supported Languages

Python, JavaScript, Java, C, C++, PHP, Go

The model was trained on a diverse multi-language dataset. Performance is strongest on C/C++ (the largest training subset, from BigVul) and Python/JavaScript (from the multi-language datasets).

## Vulnerability Classes

### OWASP A01:2021 – Broken Access Control
| CWE | Name | F1 Score |
|-----|------|----------|
| CWE-22 | Path Traversal | 0.000 |
| CWE-200 | Information Exposure | 0.000 |
| CWE-264 | Permissions/Privileges | 0.000 |
| CWE-269 | Improper Privilege Management | 0.000 |
| CWE-276 | Incorrect Default Permissions | 0.000 |
| CWE-284 | Improper Access Control | 0.000 |
| CWE-352 | CSRF | 0.000 |
| CWE-601 | Open Redirect | 0.000 |

### OWASP A02:2021 – Cryptographic Failures
| CWE | Name | F1 Score |
|-----|------|----------|
| CWE-310 | Cryptographic Issues | 0.000 |
| CWE-327 | Broken Crypto Algorithm | 0.000 |
| CWE-330 | Insufficient Randomness | 0.000 |

### OWASP A03:2021 – Injection
| CWE | Name | F1 Score |
|-----|------|----------|
| CWE-20 | Improper Input Validation | 0.031 |
| CWE-78 | OS Command Injection | 0.000 |
| CWE-79 | Cross-Site Scripting (XSS) | 0.000 |
| CWE-89 | SQL Injection | 0.600 |
| CWE-94 | Code Injection | 0.435 |
| CWE-119 | Buffer Overflow | 0.129 |
| CWE-125 | Out-of-bounds Read | 0.133 |
| CWE-190 | Integer Overflow | 0.400 |
| CWE-401 | Memory Leak | 0.000 |
| CWE-416 | Use After Free | 0.000 |
| CWE-476 | NULL Pointer Dereference | 0.211 |
| CWE-787 | Out-of-bounds Write | 0.233 |

### OWASP A04:2021 – Insecure Design
| CWE | Name | F1 Score |
|-----|------|----------|
| CWE-362 | Race Condition | 0.000 |
| CWE-399 | Resource Management Errors | 0.182 |
| CWE-434 | Unrestricted File Upload | 0.000 |

### OWASP A07:2021 – Identification & Authentication Failures
| CWE | Name | F1 Score |
|-----|------|----------|
| CWE-287 | Improper Authentication | 0.000 |
| CWE-798 | Hardcoded Credentials | 0.000 |

### OWASP A08:2021 – Software & Data Integrity Failures
| CWE | Name | F1 Score |
|-----|------|----------|
| CWE-502 | Insecure Deserialization | 0.286 |

### OWASP A10:2021 – Server-Side Request Forgery
| CWE | Name | F1 Score |
|-----|------|----------|
| CWE-918 | SSRF | 0.000 |

### Overall Metrics

| Metric | Value |
|--------|-------|
| **Weighted F1** | 0.878 |
| **Micro F1** | 0.704 |
| **Macro F1** | 0.116 |
| **F1 (safe class)** | 0.946 |
| **Macro Precision** | 0.087 |
| **Macro Recall** | 0.276 |

> **Note on Macro F1:** The low macro F1 is primarily due to extreme class imbalance: many CWE categories have <5 samples in the validation set, resulting in 0.0 F1 for those classes. The model performs well on classes with sufficient training data (SQL Injection: 0.60, Code Injection: 0.43, Integer Overflow: 0.40). Weighted F1 (0.878) better reflects real-world performance.
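The gap between weighted and macro F1 follows directly from how the averages are computed. A toy reproduction (synthetic labels, not this model's predictions): weighted F1 is dominated by the high-support label, while macro F1 averages every label equally, including rare ones scored 0.0.

```python
import numpy as np
from sklearn.metrics import f1_score

# Synthetic multi-label matrix: one dominant label, three rare ones
y_true = np.zeros((1000, 4), dtype=int)
y_true[:, 0] = 1        # dominant label, 1000 positives
y_true[:5, 1] = 1       # rare label, only 5 positives
y_true[:3, 2] = 1       # rare label, 3 positives
y_true[:2, 3] = 1       # rare label, 2 positives

# A predictor that nails the dominant label but misses every rare one
y_pred = np.zeros_like(y_true)
y_pred[:, 0] = 1

weighted = f1_score(y_true, y_pred, average="weighted", zero_division=0)
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(weighted, macro)  # weighted stays high, macro collapses
```

Here weighted F1 is support-weighted (1000/1010 ≈ 0.99) while macro F1 is the plain mean over four labels (0.25), mirroring the pattern in the table above.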

## Training Data

The model was trained on the [code-security-vulnerability-dataset](https://huggingface.co/datasets/ayshajavd/code-security-vulnerability-dataset) (175,419 samples), a curated combination of:

1. **[BigVul](https://huggingface.co/datasets/bstee615/bigvul)** – 265K C/C++ vulnerable functions from real CVEs
2. **[CWE-enriched BigVul/PrimeVul](https://huggingface.co/datasets/mahdin70/cwe_enriched_balanced_bigvul_primevul)** – Balanced CWE-labeled subset
3. **[Code Vulnerability Labeled](https://huggingface.co/datasets/lemon42-ai/Code_Vulnerability_Labeled_Dataset)** – Multi-language (Python, JS, Java, PHP, Go)
4. **[CyberNative DPO](https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO)** – Vulnerable/secure code pairs

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Epochs | 2 (initial) |
| Batch Size | 8 |
| Learning Rate | 5e-5 |
| Scheduler | Cosine with warmup (50 steps) |
| Loss | BCEWithLogitsLoss (class-weighted, clipped at 30x) |
| Training Subset | 20K balanced samples (10K safe + 10K vulnerable) |
| Validation Subset | 3K samples |
| Optimizer | AdamW (fused) |
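The class-weighted loss above can be sketched as follows. This is a reconstruction under stated assumptions, not the published training script: the per-label positive counts are hypothetical, and `pos_weight` stands in for the "class weights, clipped at 30x" described in the table.

```python
import torch
import torch.nn as nn

NUM_LABELS = 31
TRAIN_SIZE = 20_000

# Hypothetical per-label positive counts over the 20K training subset
pos_counts = torch.tensor([10_000.0, 800.0, 40.0, 5.0] + [200.0] * (NUM_LABELS - 4))
neg_counts = TRAIN_SIZE - pos_counts

# Up-weight positives of rare labels, clipping the neg/pos ratio at 30x
pos_weight = (neg_counts / pos_counts).clamp(max=30.0)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, NUM_LABELS)                     # stand-in model outputs
targets = torch.randint(0, 2, (8, NUM_LABELS)).float()  # stand-in multi-hot labels
loss = criterion(logits, targets)
```

Without the 30x clip, a label with 5 positives in 20K samples would receive a weight near 4000, which destabilizes training; clipping trades some rare-class recall for stability.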

## Limitations

1. **Class imbalance**: Many rare CWE types have very few training examples; the model struggles with CWEs that have <50 training samples.
2. **Sequence length**: Limited to 512 tokens, so vulnerabilities spanning long functions may be missed.
3. **Language bias**: Strongest on C/C++ due to BigVul's dominance in the training data; performance on Go and PHP may be lower.
4. **Context-dependent vulnerabilities**: The model analyzes individual functions, not cross-function or cross-file vulnerabilities.
5. **False negatives**: The 0.3 threshold prioritizes sensitivity, but novel vulnerability patterns not seen in training may still be missed.
6. **Not a replacement for manual review**: This model should complement, not replace, human security review and established SAST tools.
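For the 512-token limit, one common workaround (purely illustrative, not part of the released model) is to slide a window over a long function with some overlap and keep the per-label maximum probability. `score_fn` below is a hypothetical wrapper around the classifier from the Quick Start.

```python
def window_scores(token_ids, score_fn, window=512, stride=256):
    """Score overlapping token windows and keep the per-label maximum.

    score_fn maps a list of token ids to a list of per-label
    probabilities (e.g. a wrapper around the classifier above).
    """
    maxima = None
    for start in range(0, max(1, len(token_ids) - window + stride), stride):
        probs = score_fn(token_ids[start:start + window])
        if maxima is None:
            maxima = list(probs)
        else:
            maxima = [max(m, p) for m, p in zip(maxima, probs)]
    return maxima

# Toy scorer: "probability" is just the window length over 1000
scores = window_scores(list(range(1000)), lambda ids: [len(ids) / 1000],
                       window=512, stride=256)
```

The max-pooling aggregation preserves the recall-first design of the 0.3 threshold: a vulnerability only needs to be visible in one window to be flagged.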

## Example Predictions

### SQL Injection (Python)
```python
query = f"SELECT * FROM users WHERE username = '{username}'"
cursor.execute(query)
# → CWE-89: SQL Injection (confidence: 0.85)
```

### Buffer Overflow (C)
```c
char buffer[64];
strcpy(buffer, user_input);
// → CWE-119: Buffer Overflow (confidence: 0.72)
```

### Safe Code
```python
cursor.execute("SELECT * FROM users WHERE username = ?", (username,))
# → safe (confidence: 0.94)
```

## Interactive Demo

Try the model in our [Code Security Analyzer Space](https://huggingface.co/spaces/ayshajavd/code-security-analyzer): paste any code and get a full security report with OWASP mapping, severity scores, and suggested fixes.

## Citation

```bibtex
@misc{graphcodebert-vuln-classifier,
  title={GraphCodeBERT Vulnerability Classifier: Multi-label CWE Detection Mapped to OWASP Top 10},
  author={ayshajavd},
  year={2025},
  url={https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier}
}
```