How to use bartek-flp/qwen3coder-30b-dcr-lora-v4 with PEFT:
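```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# torch_dtype="auto" keeps the checkpoint's precision; device_map="auto" needs accelerate.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-Coder-30B-A3B-Instruct", torch_dtype="auto", device_map="auto"
)
# Attach the v4 DCR LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base_model, "bartek-flp/qwen3coder-30b-dcr-lora-v4")
```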
Qwen3-Coder-30B-A3B: DCR (Drupal Code Review) QLoRA adapter, v4
Fourth-round LoRA adapter for Qwen3-Coder-30B-A3B-Instruct that reviews Drupal PHP diffs and emits structured JSON findings. v4 is trained on v3's data plus 41 rows mined from real Drupal bug-fix merge requests: each fix diff is inverted so that it re-introduces the bug, and a teacher labels what is wrong. The new rows cover non-security logic and Drupal API mistakes, not CVEs.
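The inversion step can be illustrated with a minimal sketch, assuming each fix MR is available as a plain unified diff. This is not the project's mining code; the function name `invert_unified_diff` and the sample module are hypothetical. Standard tools such as `patch -R` and `git apply -R` apply a diff in reverse in the same spirit.

```python
import re

# Unified-diff hunk header, e.g. "@@ -10,7 +10,8 @@ function foo()".
HUNK_RE = re.compile(r"^@@ -(\d+(?:,\d+)?) \+(\d+(?:,\d+)?) @@(.*)$")


def invert_unified_diff(diff_text: str) -> str:
    """Swap the old and new sides of a unified diff (the idea behind `patch -R`).

    Inverting a bug-fix diff yields a diff that re-introduces the bug.
    """
    lines = diff_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.startswith("--- ") and nxt.startswith("+++ "):
            # File header pair: keep the "---"/"+++" order, swap the paths.
            out.append("--- " + nxt[4:])
            out.append("+++ " + line[4:])
            i += 2
            continue
        m = HUNK_RE.match(line)
        if m:
            # Swap the old and new line ranges; keep the section heading.
            old_range, new_range, section = m.groups()
            out.append(f"@@ -{new_range} +{old_range} @@{section}")
        elif line.startswith("+"):
            out.append("-" + line[1:])  # added line becomes a removal
        elif line.startswith("-"):
            out.append("+" + line[1:])  # removed line becomes an addition
        else:
            out.append(line)  # context, "\ No newline ..." and git metadata
        i += 1
    return "\n".join(out)


# Hypothetical fix: $node->title replaced with the Drupal 8+ $node->label().
fix = """--- a/mymodule.module
+++ b/mymodule.module
@@ -1,3 +1,3 @@
 function mymodule_title($node) {
-  return $node->title;
+  return $node->label();
 }"""
print(invert_unified_diff(fix))
```

Swapping each `+`/`-` marker swaps which lines belong to the old and new file while leaving context untouched, so the result is still a valid diff even though the removed line now follows the added one. The header check is a simplification: a removed line whose text starts with `-- ` immediately followed by an added line starting with `++ ` would be misread as a file header.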
This is the recommended adapter and the one we deploy. v4 is the first round in which real, varied data moved the results instead of landing in the noise. On a new non-security eval set it caught 8 of 13 real bugs versus v3's 4, while holding 100% specificity on the security set. The full, honest report (including what v4 still misses and a note on run-to-run noise) ships in the project repo under docs/eval/dcr-qlora-v4-report.md.
Results (base, v3 and v4 scored together in this session, temperature 0)
Security set (n=32, 16 pos / 16 neg)
| Metric | Base | v3 | v4 |
|---|---|---|---|
| Verdict accuracy | 71.9% (23/32) | 84.4% (27/32) | 90.6% (29/32) |
| Positive recall | 87.5% (14/16) | 75.0% (12/16) | 81.2% (13/16) |
| Negative specificity | 56.2% (9/16) | 93.8% (15/16) | 100% (16/16) |
| Category match | 56.2% | 50.0% | 56.2% |
| Invalid JSON | 0/32 | 0/32 | 0/32 |
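The three verdict metrics are the standard confusion-matrix ratios. The sketch below recomputes them from counts read off the v4 security column; the class and field names are illustrative and not taken from the project's eval harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerdictCounts:
    """Confusion counts for the binary "does this diff contain a defect?" verdict."""

    tp: int  # buggy diffs the model flagged
    fn: int  # buggy diffs the model passed as clean
    tn: int  # clean diffs the model passed as clean
    fp: int  # clean diffs the model flagged

    @property
    def verdict_accuracy(self) -> float:
        # Share of all diffs that got the right verdict.
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    @property
    def positive_recall(self) -> float:
        # Share of buggy diffs that were flagged.
        return self.tp / (self.tp + self.fn)

    @property
    def negative_specificity(self) -> float:
        # Share of clean diffs that were left unflagged.
        return self.tn / (self.tn + self.fp)


# v4 on the security set: recall 13/16 and specificity 16/16 give the counts.
v4_security = VerdictCounts(tp=13, fn=3, tn=16, fp=0)
print(f"accuracy {v4_security.verdict_accuracy:.1%}, "
      f"recall {v4_security.positive_recall:.1%}, "
      f"specificity {v4_security.negative_specificity:.1%}")
```

Because verdict accuracy is (tp + tn) / n, it can rise while recall falls, as it does from base to v3 on the security set.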
Non-security set (n=26, 13 pos / 13 neg)
| Metric | Base | v3 | v4 |
|---|---|---|---|
| Verdict accuracy | 65.4% (17/26) | 61.5% (16/26) | 76.9% (20/26) |
| Positive recall | 69.2% (9/13) | 30.8% (4/13) | 61.5% (8/13) |
| Negative specificity | 61.5% (8/13) | 92.3% (12/13) | 92.3% (12/13) |
| Category match | 53.8% | 23.1% | 46.2% |
| Invalid JSON | 0/26 | 0/26 | 0/26 |
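The Invalid JSON row counts responses that could not be parsed as JSON. A minimal version of that check, assuming the raw response text is parsed as-is; the real harness may also strip code fences or validate a findings schema, which this card does not describe, and both function names are hypothetical:

```python
import json
from typing import Iterable


def is_invalid_json(response: str) -> bool:
    """True if a raw model response cannot be parsed as JSON."""
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return True
    return False


def invalid_json_tally(responses: Iterable[str]) -> str:
    """Format the failure count the way the tables do, e.g. "0/26"."""
    responses = list(responses)
    bad = sum(is_invalid_json(r) for r in responses)
    return f"{bad}/{len(responses)}"
```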
The non-security recall jump (v3 4/13 → v4 8/13) and the non-security verdict-accuracy gain (16/26 → 20/26) are four-case moves, beyond the run-to-run noise. The security-set leads in specificity (15/16 → 16/16) and verdict accuracy (27/32 → 29/32) are smaller margins, but nothing regressed from v3 and they point the same way.
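One case is worth 100/n percentage points, so the same case count moves a 13-item subset far more than a 32-item set. A small illustrative helper (hypothetical, not part of the project) makes the v3 → v4 margins explicit using the counts from the tables:

```python
from typing import Tuple


def case_delta(before: int, after: int, n: int) -> Tuple[int, float]:
    """Cases gained and the matching percentage-point change on a set of size n."""
    gained = after - before
    return gained, 100 * gained / n


# (metric, v3 count, v4 count, denominator), taken from the results tables.
COMPARISONS = [
    ("non-security recall", 4, 8, 13),
    ("non-security verdict accuracy", 16, 20, 26),
    ("security specificity", 15, 16, 16),
    ("security verdict accuracy", 27, 29, 32),
]

for name, v3, v4, n in COMPARISONS:
    cases, pp = case_delta(v3, v4, n)
    print(f"{name}: +{cases} case(s), +{pp:.1f} pp")
```

The two non-security moves are four cases each, while the security gains are one and two cases, which is why they count as weaker evidence.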
Training data
v3's 526 rows plus the 41 real bug-fix positive/negative rows (mined from merged Drupal MRs, inverted, teacher-labeled). QLoRA with r=16 on the q/k/v/o attention projections; batch size 4 with gradient accumulation 4 and gradient checkpointing; MAX_LEN=2048; 3 epochs; learning rate 2e-4. Trained on one H100: 114 steps, about 106 minutes.
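The hyperparameters above map onto standard Hugging Face arguments. This sketch records them as plain dictionaries keyed by real argument names of `peft.LoraConfig`, `transformers.BitsAndBytesConfig` and `transformers.TrainingArguments`, so it runs without those libraries installed. Two entries are assumptions not stated in this card: the NF4 quantization type and the mapping of "q/k/v/o" to the `q_proj`/`k_proj`/`v_proj`/`o_proj` module names. LoRA alpha and dropout are not given, so they are left out.

```python
# Stated hyperparameters as keyword arguments for the Hugging Face classes.
QUANT_KWARGS = {  # transformers.BitsAndBytesConfig
    "load_in_4bit": True,          # the "Q" in QLoRA: 4-bit frozen base weights
    "bnb_4bit_quant_type": "nf4",  # ASSUMPTION: usual QLoRA choice, not stated
}
LORA_KWARGS = {  # peft.LoraConfig
    "r": 16,
    # ASSUMPTION: "q/k/v/o" refers to these attention projection modules.
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}
TRAIN_KWARGS = {  # transformers.TrainingArguments
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "gradient_checkpointing": True,
    "num_train_epochs": 3,
    "learning_rate": 2e-4,
}
MAX_LEN = 2048  # ASSUMPTION: a token limit applied when tokenizing examples


def effective_batch_size(train_kwargs: dict, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return (train_kwargs["per_device_train_batch_size"]
            * train_kwargs["gradient_accumulation_steps"]
            * num_devices)


bs = effective_batch_size(TRAIN_KWARGS)  # one H100
print(bs, bs * MAX_LEN)  # examples and maximum tokens per optimizer step
# With the libraries installed these would be passed as, for example,
# LoraConfig(**LORA_KWARGS) and BitsAndBytesConfig(**QUANT_KWARGS).
```

On a single device, 4 × 4 gives 16 examples per optimizer step, and MAX_LEN caps each step at 32,768 tokens.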
Limitations
Real-defect recall is still about 60% on non-security and about 80% on security, so roughly two in five non-security bugs slip through. Category match is mediocre (46–56%): the model is better at saying "something is wrong" than at naming what kind of problem it is. Raw recall is higher on the untuned base model, but the base flags 38–44% of clean code (specificity 56–62%), which is why v4 is the better tool: it trades a little recall for usable specificity. Keep a human in the loop; this adapter is one component of a hybrid pipeline (static analyzers + RAG + the model).
Base model: Qwen/Qwen3-Coder-30B-A3B-Instruct