Qwen3-Coder-30B-A3B β€” DCR (Drupal Code Review) QLoRA adapter

A LoRA adapter that specializes Qwen3-Coder-30B-A3B-Instruct for reviewing Drupal 10/11 PHP diffs and emitting structured JSON findings (security, logic, architecture, Drupal-API).

Update (2026-05-30): use v2 instead. A later eval on real Drupal security defects exposed a blind spot this v1 adapter has: it catches only 18.8% of real CVE-grade defects (it over-learned "looks like clean merged Drupal β†’ clean"). The v2 adapter recovers real-defect recall to 56.2% while keeping v1's 100% specificity. The synthetic numbers below are accurate but measured the wrong distribution β€” see the 3-way report (docs/eval/dcr-qlora-v2-report.md).

Why

The base model under-reports issues on real Drupal merge requests (high precision, low recall). This adapter is trained on a hybrid distillation set so the model both catches real Drupal anti-patterns (synthetic positives) and stays quiet on clean code (real merged-MR negatives).

Training data

400 teacher-labeled pairs (distillation_v1): 251 positive / 149 negative.

  • Positives β€” 243 synthetic across 26 Drupal anti-patterns (SQLi, XSS sinks, CSRF-on-GET, broken DI / missing create(), accessCheck() omissions, recursion in presave, deprecated APIs, etc.) + 7 real merge-request bugs.
  • Negatives β€” 149 clean, merged MRs from webform, paragraphs, drupal core, pathauto, commerce, search_api (teacher-verified clean). Teacher: Claude Opus 4.x. Each pair is (diff β†’ JSON verdict + findings).

Usage (with the base model)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = "Qwen/Qwen3-Coder-30B-A3B-Instruct"
tok = AutoTokenizer.from_pretrained(base)
m = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="bfloat16")
m = PeftModel.from_pretrained(m, "bartek-flp/qwen3coder-30b-dcr-lora")

Prompt with the DCR system message (review a diff, output JSON findings only).

Results (A/B vs base, held-out val)

48 held-out pairs the adapter never saw (27 with a defect, 21 clean), temperature 0, served as the same base weights with the LoRA hot-swapped, so only the training differs.

Metric (n=48) Base Tuned
Verdict accuracy 83.3% (40/48) 95.8% (46/48)
Positive recall 81.5% (22/27) 92.6% (25/27)
Negative specificity 85.7% (18/21) 100% (21/21)
Category match 40.7% (11/27) 63.0% (17/27)
Invalid JSON 4/48 0/48

Honest read: fine-tuning mostly bought reliability and calibration, not raw bug-finding. The base model already detects most issues, but on 4 positives it emitted unparseable JSON (often a stray \Drupal\ backslash) and on 3 clean diffs it raised false alarms. The adapter always returns valid JSON, holds 100% specificity, and names categories better. The cost: it missed two low-severity O(nΒ²) array_merge-in-loop bugs the base model caught. A full report with verbatim side-by-side outputs, covering both the wins and the losses, ships in the project repo under docs/eval/.

Limitations

QLoRA on attention projections only; tuned for diff review, not general chat. The synthetic positives teach patterns, not every real-world manifestation. Always keep a human in the loop for security findings.

Downloads last month
43
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for bartek-flp/qwen3coder-30b-dcr-lora

Adapter
(46)
this model