Qwen2.5-Coder-7B Agentic SLM v5 LoRA

This repository contains the PEFT LoRA adapter used in the v5 agentic coding system.

Base model: Qwen/Qwen2.5-Coder-7B-Instruct

The important result is not a raw first-answer-only model win. The v5 artifact is a small coding-agent system:

  1. Run the 7B adapter through a strict code-only generation harness.
  2. Execute tests/verifiers.
  3. Select the shortest passing candidate.
  4. On misses, invoke a stronger rescue model only for the failed tasks.
  5. Verify again and report task-level results.

This is the correct interpretation of the release: the LoRA is one component of the system. The strongest score below comes from the full verifier-rescue pipeline.

Current Proof Gate

Kaggle proof kernel: holykeys/qwen25-coder-agentic-slm-v5-rescue

Evaluation set: 50 HumanEval/MBPP-style coding tasks used for fast iteration.

Phase Greedy pass@1 Coverage@K Selected@K Repair Final
Qwen2.5-Coder-7B reference harness 37/50 40/50 40/50 2/50 42/50
v5 7B adapter primary 37/50 42/50 42/50 2/50 44/50
14B rescue on primary misses 1/6 3/6 3/6 1/6 4/6
v5 combined rescue system 38/50 45/50 45/50 3/50 48/50

Lift

Against the Qwen2.5-Coder-7B reference harness result of 42/50:

  • LoRA-only primary system: 44/50, a +2/50 absolute improvement.
  • LoRA-only percentage-point lift: +4 points.
  • LoRA-only relative lift: +4.76%.
  • Full v5 rescue system: 48/50, a +6/50 absolute improvement.
  • Full system percentage-point lift: +12 points.
  • Full system relative lift: +14.29%.
  • Failure reduction: from 8 misses to 2 misses, a 75% reduction in failures on this gate.

The honest conclusion: the LoRA alone is a small gain. The meaningful progress is from the deterministic verifier/rescue system.

What This Is Not

This is not a claimed Claude Sonnet 4.5 replacement.

This is not a broad SWE-bench win.

This is not a proof that the raw 7B weights beat frontier models.

The release is a reproducible intermediate artifact: a compact coding model plus a verifier-oriented harness that shows a measurable improvement on a fast gate.

Required Next Benchmarks

The current gate is intentionally small. It is useful for fast iteration only. Before making larger claims, the next evaluation batch must include:

  • LiveCodeBench: fresh contest-style coding problems, preferably recent slices only.
  • BigCodeBench: broader function-level and library-use coding tasks.
  • SWE-bench Lite or Verified subset: repository patching with real tests.
  • Agentic edit tasks: file editing, test execution, patch generation, and repair loops.
  • Cost and latency: wall-clock time, tokens generated, GPU class, and estimated dollar cost.
  • Abstention rate: how often the system refuses to answer or returns no valid patch.
  • Invalid-output rate: markdown leakage, missing entrypoint, syntax errors, test leakage, and prose leakage.
  • Selector diagnostics: coverage@K, selected@K, selector gap, repair@1, and false-positive verifier selections.

Recommended Evaluation Policy

Do not push all training/eval/release work inside one notebook.

Use deterministic batches:

  1. Baseline batch: run the base model first, no training.
  2. Candidate batch: run the candidate model/harness on the exact same tasks.
  3. Failure batch: collect failed tasks, failed code, verifier output, and minimal repair.
  4. Repair batch: train or prompt only on verified repair data.
  5. Proof batch: rerun held-out tests immediately.
  6. Release batch: publish only if the proof gate beats the previous best.

Every batch should emit JSON summaries, task-level CSV, rollouts, error signatures, and environment metadata.

Files

  • adapter_model.safetensors: LoRA adapter.
  • adapter_config.json: PEFT configuration.
  • v5_rescue_release_summary.json: exact proof-run summary.
  • v5_rescue_eval_before_after_full_code.csv: task-level proof-run table.

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_id = "josephmayo/Qwen2.5-Coder-7B-agentic-SLM-LoRA"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(base, adapter_id)

For best results, use the model inside a strict code-only verifier harness. Do not evaluate it only by casual chat prompts.

Downloads last month
10
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for josephmayo/Qwen2.5-agentic-7B-SLM-LoRA

Base model

Qwen/Qwen2.5-7B
Adapter
(687)
this model

Collection including josephmayo/Qwen2.5-agentic-7B-SLM-LoRA