Instructions to use badtheorylabs/btl-2-coder with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use badtheorylabs/btl-2-coder with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("unsloth/qwen2.5-coder-7b-instruct-bnb-4bit") model = PeftModel.from_pretrained(base_model, "badtheorylabs/btl-2-coder") - Notebooks
- Google Colab
- Kaggle
BTL-2 Coder 7B
BTL-2 Coder 7B is a LoRA adapter for unsloth/Qwen2.5-Coder-7B-Instruct, trained for structured code-review findings.
Code and evaluation scripts are available at:
https://github.com/Badtheorylabs/btl-2-coder
Intended Use
This adapter is intended for local-first code review. It is trained to produce structured findings with:
- severity
- file path
- line number
- title
- evidence
- recommendation
- numeric confidence
The main supported issue classes are SQL injection, path traversal, authorization bypass, missing error handling, boundary/off-by-one logic, and related security/correctness findings.
The adapter is optimized for review output rather than broad chat behavior.
Training
- Base model:
unsloth/Qwen2.5-Coder-7B-Instruct - Method: LoRA SFT with Unsloth
- Data mix:
4,000API-generated review traces +1,000template traces - Train/eval split:
4,500train examples +500eval examples - Epochs:
2 - Max sequence length:
4096
Only redacted, opt-in traces should be used for future training.
Recommended Prompt Contract
Use strict schema prompting:
Return only a JSON array. No markdown and no wrapper object.
Each finding must include: severity, file, line, title, evidence, recommendation, confidence.
severity must be exactly one of: critical, high, medium, low.
Never put a category in severity.
confidence must be a number from 0 to 1, never a string label.
Every finding must include concrete evidence and a non-empty recommendation.
Example output:
[
{
"severity": "critical",
"file": "src/users.ts",
"line": 42,
"title": "SQL injection through string-built query",
"evidence": "The user id is concatenated directly into the SQL string.",
"recommendation": "Use a parameterized query.",
"confidence": 0.96
}
]
Load The Adapter
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = "unsloth/Qwen2.5-Coder-7B-Instruct"
adapter = "badtheorylabs/btl-2-coder"
tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
Evaluation
Measured on an NVIDIA H200 with 4-bit adapter inference.
| Eval | JSON parse | Schema valid | Numeric confidence | Category hit | File hit | Precision | Recall | Weighted severity recall |
|---|---|---|---|---|---|---|---|---|
| Heldout 100 strict | 1.000 | 0.952 | 1.000 | 0.783 | 0.840 | n/a | n/a | n/a |
| Heldout 30 strict v2 | 1.000 | 0.975 | 1.000 | 0.867 | 0.867 | n/a | n/a | n/a |
| Seeded 15 strict | 1.000 | 1.000 | 1.000 | 0.933 | 1.000 | 0.933 | 0.933 | 0.956 |
Notes:
- Heldout precision/recall is marked
n/abecause the heldout set is broader and does not use one normalized ground-truth finding per example. - The seeded benchmark is a controlled regression suite with known findings.
- Reported results use the recommended strict schema prompt.
Scope
- Primary task: structured security and correctness review.
- Output format: JSON findings with severity, location, evidence, recommendation, and confidence.
- Best runtime path: strict schema prompting, with optional constrained decoding.
- Evaluation focus: code-review findings, file hits, schema validity, and seeded precision/recall.
- Next track: patch proposals and terminal workflows.
Files
This repository contains a PEFT/LoRA adapter:
adapter_model.safetensorsadapter_config.jsontokenizer.jsontokenizer_config.jsonchat_template.jinjatraining_args.binSHA256SUMS
- Downloads last month
- 24
Model tree for badtheorylabs/btl-2-coder
Base model
Qwen/Qwen2.5-7B