Qwen2.5-Coder-7B Agentic SLM v5 LoRA
This repository contains the PEFT LoRA adapter used in the v5 agentic coding system.
Base model: Qwen/Qwen2.5-Coder-7B-Instruct
The key result is not a win for the raw model on first-answer accuracy alone. The v5 artifact is a small coding-agent system that runs as follows:
- Run the 7B adapter through a strict code-only generation harness.
- Execute tests/verifiers.
- Select the shortest passing candidate.
- On misses, invoke a stronger rescue model only for the failed tasks.
- Verify again and report task-level results.
This is the correct interpretation of the release: the LoRA is one component of the system. The strongest score below comes from the full verifier-rescue pipeline.
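The steps above can be sketched roughly as follows. This is a minimal illustration, not the release's actual harness: `passes_tests`, `shortest_passing`, and `solve` are hypothetical names, the `primary` and `rescue` arguments are assumed to be callables that return a list of candidate source strings, and running tests in a subprocess is one assumed way to verify a candidate.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def passes_tests(code: str, test_code: str, timeout_s: float = 10.0) -> bool:
    """Run candidate code followed by its tests in a fresh interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + test_code + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
        # Exit code 0 means no assertion failed and nothing raised.
        return result.returncode == 0


def shortest_passing(candidates: list[str], test_code: str) -> Optional[str]:
    """Return the shortest candidate that passes the tests, or None."""
    passing = [c for c in candidates if passes_tests(c, test_code)]
    return min(passing, key=len) if passing else None


def solve(
    prompt: str,
    test_code: str,
    primary: Callable[[str], list[str]],
    rescue: Callable[[str], list[str]],
) -> dict:
    """Try the primary generator first; call the rescue generator only on a miss."""
    chosen = shortest_passing(primary(prompt), test_code)
    source = "primary"
    if chosen is None:
        # Rescue candidates go through the same verifier before being accepted.
        chosen = shortest_passing(rescue(prompt), test_code)
        source = "rescue"
    passed = chosen is not None
    return {"passed": passed, "source": source if passed else None, "code": chosen}
```

Because the rescue generator is called only when the primary stage returns no passing candidate, the stronger model's cost is paid only on the failed tasks.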
Current Proof Gate
Kaggle proof kernel: holykeys/qwen25-coder-agentic-slm-v5-rescue
Evaluation set: 50 HumanEval/MBPP-style coding tasks used for fast iteration.
| Phase | Greedy pass@1 | Coverage@K | Selected@K | Repair | Final |
|---|---|---|---|---|---|
| Qwen2.5-Coder-7B reference harness | 37/50 | 40/50 | 40/50 | 2/50 | 42/50 |
| v5 7B adapter primary | 37/50 | 42/50 | 42/50 | 2/50 | 44/50 |
| 14B rescue on primary misses | 1/6 | 3/6 | 3/6 | 1/6 | 4/6 |
| v5 combined rescue system | 38/50 | 45/50 | 45/50 | 3/50 | 48/50 |
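The combined row can be cross-checked against the two rows above it. The sketch below assumes, based only on the numbers in the table, that the combined row is the column-wise sum of the primary and rescue rows and that Final equals Selected@K plus Repair. The dictionary keys are illustrative names, not fields from the release files.

```python
TOTAL_TASKS = 50

# Counts copied from the table above.
primary = {"greedy": 37, "coverage": 42, "selected": 42, "repair": 2, "final": 44}
rescue = {"greedy": 1, "coverage": 3, "selected": 3, "repair": 1, "final": 4}

# The rescue stage only sees tasks the primary stage missed.
primary_misses = TOTAL_TASKS - primary["final"]

# Column-wise sum of the two stages (assumed relationship, consistent with the table).
combined = {key: primary[key] + rescue[key] for key in primary}

# In every row, Final matches Selected@K + Repair.
for row in (primary, rescue, combined):
    assert row["final"] == row["selected"] + row["repair"]

print(primary_misses)  # 6
print(combined)
```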
Lift
Against the Qwen2.5-Coder-7B reference harness result of 42/50:
- LoRA-only primary system: 44/50, a +2/50 absolute improvement.
- LoRA-only percentage-point lift: +4 points.
- LoRA-only relative lift: +4.76%.
- Full v5 rescue system: 48/50, a +6/50 absolute improvement.
- Full system percentage-point lift: +12 points.
- Full system relative lift: +14.29%.
- Failure reduction: from 8 misses to 2 misses, a 75% reduction in failures on this gate.
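The lift figures follow from simple arithmetic on the pass counts. The helper below is a small sketch for reproducing them. The function name `lift` and its return keys are illustrative and are not part of the release.

```python
def lift(baseline_passed: int, candidate_passed: int, total: int) -> dict:
    """Compare a candidate's pass count to a baseline on the same task set."""
    absolute = candidate_passed - baseline_passed
    baseline_misses = total - baseline_passed
    candidate_misses = total - candidate_passed
    return {
        "absolute_tasks": absolute,
        # Difference in pass rate, in percentage points.
        "points": round(100 * absolute / total, 2),
        # Improvement relative to the baseline's pass count.
        "relative_pct": round(100 * absolute / baseline_passed, 2),
        # Share of the baseline's failures that were eliminated.
        "failure_reduction_pct": round(
            100 * (baseline_misses - candidate_misses) / baseline_misses, 2
        ),
    }


lora_only = lift(baseline_passed=42, candidate_passed=44, total=50)
full_system = lift(baseline_passed=42, candidate_passed=48, total=50)
print(lora_only["points"], lora_only["relative_pct"])      # 4.0 4.76
print(full_system["points"], full_system["relative_pct"])  # 12.0 14.29
print(full_system["failure_reduction_pct"])                # 75.0
```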
The honest conclusion: the LoRA alone gives a small gain (+2 tasks). Most of the improvement (+4 of the +6 tasks) comes from the deterministic verifier/rescue system.
What This Is Not
This is not a claimed Claude Sonnet 4.5 replacement.
This is not a broad SWE-bench win.
This is not a proof that the raw 7B weights beat frontier models.
The release is a reproducible intermediate artifact: a compact coding model plus a verifier-oriented harness that shows a measurable improvement on a fast gate.
Required Next Benchmarks
The current gate is intentionally small. It is useful for fast iteration only. Before making larger claims, the next evaluation batch must include:
- LiveCodeBench: fresh contest-style coding problems, preferably recent slices only.
- BigCodeBench: broader function-level and library-use coding tasks.
- SWE-bench Lite or Verified subset: repository patching with real tests.
- Agentic edit tasks: file editing, test execution, patch generation, and repair loops.
- Cost and latency: wall-clock time, tokens generated, GPU class, and estimated dollar cost.
- Abstention rate: how often the system refuses to answer or returns no valid patch.
- Invalid-output rate: markdown leakage, missing entrypoint, syntax errors, test leakage, and prose leakage.
- Selector diagnostics: coverage@K, selected@K, selector gap, repair@1, and false-positive verifier selections.
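The selector diagnostics in the last item could be computed from per-task records along these lines. The definitions here are assumptions, since the card does not spell them out: coverage@K counts tasks where any of the K candidates passes held-out tests, selected@K counts tasks where the chosen candidate passes them, the selector gap is the difference, and a false-positive selection is a chosen candidate that passed the visible verifier but fails held-out tests. `TaskResult` and its fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskResult:
    task_id: str
    visible_pass: list[bool]  # per-candidate result on tests the selector can see
    hidden_pass: list[bool]   # per-candidate result on held-out tests
    selected: Optional[int]   # index chosen by the selector, or None if it abstained


def selector_diagnostics(results: list[TaskResult]) -> dict:
    """Aggregate task-level counts for the selector (assumed definitions)."""
    coverage = sum(any(r.hidden_pass) for r in results)
    selected = sum(
        r.selected is not None and r.hidden_pass[r.selected] for r in results
    )
    false_positives = sum(
        r.selected is not None
        and r.visible_pass[r.selected]
        and not r.hidden_pass[r.selected]
        for r in results
    )
    abstained = sum(r.selected is None for r in results)
    return {
        "tasks": len(results),
        "coverage_at_k": coverage,
        "selected_at_k": selected,
        "selector_gap": coverage - selected,
        "false_positive_selections": false_positives,
        "abstained": abstained,
    }
```

A nonzero selector gap means a correct candidate was generated but not chosen, which points at the selector rather than the generator.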
Recommended Evaluation Policy
Do not cram all training, evaluation, and release work into a single notebook.
Use deterministic batches:
- Baseline batch: run the base model first, no training.
- Candidate batch: run the candidate model/harness on the exact same tasks.
- Failure batch: collect failed tasks, failed code, verifier output, and minimal repair.
- Repair batch: train or prompt only on verified repair data.
- Proof batch: rerun held-out tests immediately.
- Release batch: publish only if the proof gate beats the previous best.
Every batch should emit JSON summaries, task-level CSV, rollouts, error signatures, and environment metadata.
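As an illustration of what a batch might emit, the sketch below writes a JSON summary with environment metadata and a task-level CSV. The file naming scheme, the column names, and the `write_batch_artifacts` function are assumptions made for this example. They do not describe the format of the files shipped in this repository.

```python
import csv
import json
import platform
import sys
from pathlib import Path


def write_batch_artifacts(out_dir, batch_name, task_rows):
    """Write <batch>_summary.json and <batch>_tasks.csv, then return the summary."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    summary = {
        "batch": batch_name,
        "total": len(task_rows),
        "passed": sum(1 for row in task_rows if row["passed"]),
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
    }
    summary_path = out / f"{batch_name}_summary.json"
    summary_path.write_text(json.dumps(summary, indent=2, sort_keys=True))

    # Sort by task id so repeated runs produce rows in the same order.
    with open(out / f"{batch_name}_tasks.csv", "w", newline="") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=["task_id", "passed", "error_signature"]
        )
        writer.writeheader()
        for row in sorted(task_rows, key=lambda r: r["task_id"]):
            writer.writerow(
                {
                    "task_id": row["task_id"],
                    "passed": row["passed"],
                    "error_signature": row.get("error_signature", ""),
                }
            )
    return summary
```

A real batch would likely record more environment detail (GPU class, library versions, random seeds); those are left out here to keep the sketch dependency-free.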
Files
- adapter_model.safetensors: LoRA adapter.
- adapter_config.json: PEFT configuration.
- v5_rescue_release_summary.json: exact proof-run summary.
- v5_rescue_eval_before_after_full_code.csv: task-level proof-run table.
Usage
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_id = "josephmayo/Qwen2.5-agentic-7B-SLM-LoRA"  # repository id of this adapter

# Load the tokenizer and base model, then attach the LoRA adapter to the base.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(base, adapter_id)
For best results, use the model inside a strict code-only verifier harness. Do not evaluate it only by casual chat prompts.
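One possible shape for the "code-only" gate is a check that rejects any output that is not plain, parseable Python defining the expected function. The sketch below is an assumption about how such a gate could look, not the harness used for the reported results. `classify_output` and its labels are hypothetical.

```python
import ast

FENCE = "`" * 3  # markdown code-fence marker


def classify_output(text: str, entrypoint: str) -> str:
    """Return 'ok', or a label for the first problem found in a model output."""
    if FENCE in text:
        return "markdown_leakage"
    try:
        tree = ast.parse(text)
    except SyntaxError:
        # Leading or trailing prose usually ends up here as well.
        return "syntax_error"
    defined = {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    if entrypoint not in defined:
        return "missing_entrypoint"
    return "ok"
```

Outputs labeled anything other than `ok` can be counted toward an invalid-output rate before any tests are run.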