vuln-analyzer-qwen-lora
QLoRA fine-tune of Qwen2.5-Coder-7B-Instruct for vulnerability description generation.
Part of the CodeSentinel project.
What this model does
Takes a raw code snippet and produces a structured one-sentence vulnerability description in a consistent format that a downstream classifier (RoBERTa) can reliably classify into a CWE category.
Output format:
This function performs <operation> on <input> without <missing check>,
which may allow an attacker to <impact>.
This model is not a CWE classifier; it is a code-to-description translator. Classification is handled by martynattakit/vuln-classifier-roberta.
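Because the downstream classifier depends on the sentence shape, it can help to check a generated description before passing it on. The following is a minimal sketch, not part of the released code: the helper name `parse_description` and the group names are hypothetical, and the pattern is deliberately looser than the template (it does not insist on the literal "performs ... on ..."), since the model may phrase the operation differently.

```python
import re
from typing import Dict, Optional

# Loose pattern for the documented sentence shape. Only the fixed anchors
# ("This function", "without", "which may allow an attacker to") are required.
DESCRIPTION_RE = re.compile(
    r"This function (?P<action>.+?) without (?P<missing_check>.+?), "
    r"which may allow an attacker to (?P<impact>.+?)\.",
    re.DOTALL,
)


def parse_description(text: str) -> Optional[Dict[str, str]]:
    """Return the sentence slots, or None if the text does not fit the shape."""
    # Collapse line breaks and repeated spaces before matching.
    normalized = " ".join(text.split())
    match = DESCRIPTION_RE.fullmatch(normalized)
    return match.groupdict() if match else None


sample = (
    "This function constructs a SQL query by concatenating user-controlled input "
    "without parameterization, which may allow an attacker to inject arbitrary "
    "SQL commands and access or modify the database."
)
slots = parse_description(sample)
print(slots["missing_check"] if slots else "description does not match the format")
```

A `None` result is a reasonable signal to retry generation or to skip classification for that snippet.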
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "martynattakit/vuln-analyzer-qwen-lora"

# Load the base model in half precision, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# System prompt that pins the output to the structured one-sentence format.
SYSTEM = (
    "You are a security analyst. Given a code snippet, produce exactly one "
    "structured sentence describing the vulnerability it contains.\n\n"
    "Format: \"This function performs <operation> on <input> without "
    "<missing check>, which may allow an attacker to <impact>.\"\n\n"
    "Be specific about the operation and the missing check. Do not add any other text."
)

# Example input: a query built by string concatenation (SQL injection).
code = '''
def get_user(username):
    query = "SELECT * FROM users WHERE name = '" + username + "'"
    return db.execute(query)
'''

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": f"Analyze this code:\n\n```\n{code}\n```"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding keeps the output deterministic.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=120, do_sample=False)

# Drop the prompt tokens and decode only the newly generated ones.
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# → This function constructs a SQL query by concatenating user-controlled input
#   without parameterization, which may allow an attacker to inject arbitrary
#   SQL commands and access or modify the database.
Training
| Parameter | Value |
|---|---|
| Base model | Qwen2.5-Coder-7B-Instruct |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Training samples | 1,596 (BigVul, filtered to MITRE Top 25) |
| Epochs | 3 |
| Learning rate | 2e-4 |
| Batch size | 2 (grad accum 8, effective 16) |
| Hardware | Kaggle T4 x2 |
| Final eval loss | 0.067 |
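The numbers in the table imply a fairly short training run. The arithmetic below is a rough sketch under stated assumptions: that all 1,596 samples were used for training (the card does not say how the eval split was taken), that the batch size of 2 is the global size per step rather than per GPU, and that the final partial batch of each epoch was kept.

```python
import math

# Values taken from the training table.
train_samples = 1596
batch_size = 2
grad_accum = 8
epochs = 3
lora_rank = 16
lora_alpha = 32

# Samples consumed per optimizer update.
effective_batch = batch_size * grad_accum

# Optimizer updates per epoch, rounding up for the last partial batch (assumption).
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs

# Standard LoRA scaling factor applied to the low-rank update: alpha / r.
lora_scale = lora_alpha / lora_rank

print(f"effective batch: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"LoRA scale (alpha / r): {lora_scale}")
```

Under these assumptions the run comes to roughly 300 optimizer steps, with the adapter update scaled by a factor of 2.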
Training data
BigVul: real-world C/C++ vulnerable functions from open source projects, filtered to MITRE CWE Top 25 classes.
Template-generated target descriptions were used as training outputs. This keeps the output format consistent, so the downstream RoBERTa classifier can process it reliably.
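To illustrate what a template-generated target could look like, here is a minimal sketch that fills the documented sentence format from per-CWE slot values. This is not the author's generation script: the `SLOTS` table, its wording, and the `render_target` helper are invented for illustration, and the real templates may have been built differently.

```python
# Sentence format documented in this card, with named placeholders.
TEMPLATE = (
    "This function performs {operation} on {input} without {missing_check}, "
    "which may allow an attacker to {impact}."
)

# Hypothetical slot values for two CWE classes; the wording is illustrative only.
SLOTS = {
    "CWE-787": {
        "operation": "a memory copy",
        "input": "a caller-supplied buffer",
        "missing_check": "validating the destination length",
        "impact": "corrupt adjacent memory",
    },
    "CWE-476": {
        "operation": "a pointer dereference",
        "input": "a value returned by a lookup",
        "missing_check": "checking for NULL",
        "impact": "crash the process",
    },
}


def render_target(cwe_id: str) -> str:
    """Fill the template with the slot values registered for one CWE class."""
    return TEMPLATE.format(**SLOTS[cwe_id])


for cwe_id in SLOTS:
    print(cwe_id, "->", render_target(cwe_id))
```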
Limitations
- Training data is primarily C/C++, so descriptions for Python/JS/Go may be less precise
- Template-based targets mean descriptions follow a fixed pattern and may not capture all nuances
- This model alone does not classify CWEs; use it with vuln-classifier-roberta for the full pipeline
- Requires a GPU for practical inference (4-bit quantization via bitsandbytes)
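The Usage snippet loads the base model in float16, while the last limitation refers to 4-bit quantization. A minimal sketch of loading the base model in 4-bit NF4 before attaching the adapter follows, assuming `bitsandbytes` is installed and a CUDA GPU is available. The helper name `load_quantized_analyzer` is hypothetical, and the float16 compute dtype is an assumption chosen to match the Usage snippet.

```python
BASE_MODEL = "Qwen/Qwen2.5-Coder-7B-Instruct"
ADAPTER = "martynattakit/vuln-analyzer-qwen-lora"


def load_quantized_analyzer(base_model: str = BASE_MODEL, adapter: str = ADAPTER):
    """Hypothetical helper: load the base model in 4-bit NF4 and attach the adapter."""
    # Heavy dependencies are imported lazily so that this file can still be
    # imported on a machine without the GPU stack installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # same quantization type as training
        bnb_4bit_compute_dtype=torch.float16,  # assumption: matches the Usage snippet
    )

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=quant_config,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(model, adapter)
    model.eval()  # inference only
    return tokenizer, model
```

The returned `tokenizer` and `model` can be dropped into the Usage code in place of the float16 versions; the rest of the generation code stays the same.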
Full pipeline
Code input → vuln-analyzer-qwen-lora → structured description
           → vuln-classifier-roberta → CWE ID + confidence
See CodeSentinel for the full working demo.
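The two stages can be glued together with a thin wrapper. The sketch below only shows the data flow: `run_pipeline`, `Finding`, and the two stub functions are hypothetical names, and the stubs stand in for the real models. In practice `describe` would wrap the generation code from the Usage section, and `classify` would wrap the RoBERTa classifier, whose exact interface is not documented in this card.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Finding:
    description: str   # structured sentence from the analyzer
    cwe_id: str        # label predicted by the classifier
    confidence: float  # classifier confidence for that label


def run_pipeline(
    code: str,
    describe: Callable[[str], str],
    classify: Callable[[str], Tuple[str, float]],
) -> Finding:
    """Stage 1: code -> description. Stage 2: description -> (CWE ID, confidence)."""
    description = describe(code).strip()
    cwe_id, confidence = classify(description)
    return Finding(description, cwe_id, confidence)


# Stubs for demonstration only; replace them with calls into the real models.
def fake_describe(code: str) -> str:
    return (
        "This function performs query construction on user input without "
        "parameterization, which may allow an attacker to inject SQL commands."
    )


def fake_classify(description: str) -> Tuple[str, float]:
    return ("CWE-89", 0.5)


finding = run_pipeline("def get_user(username): ...", fake_describe, fake_classify)
print(finding)
```

Passing the stages in as callables keeps the wrapper independent of how each model is loaded, which also makes it easy to test without a GPU.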
License
MIT