Python SLM 1.5B — v4 (r=64, 98%-reasoning) — experimental, see results

A Python-specialized fine-tune of Qwen2.5-Coder-1.5B-Instruct, a node in the Mixture-of-Models (MoM) mesh. Single-turn code generator (not an agent).

Use srivarenya/python-slm-v3 instead for general use — it is the V1 keeper. v4 tested whether more reasoning coverage helps; it does not.

Base: Qwen/Qwen2.5-Coder-1.5B-Instruct
Method: DoRA r=64 (4.6% trainable), SFT on reasoning-augmented Python data — 98% of solving records carry a reasoning trace (problem → reasoning → code), vs 25% for v3.

Results (greedy pass@1, A100)

Bench	v4	v3	base
HumanEval	47.0%	70.7%	68.9%
MBPP	66.9%	69.6%	66.7%

Finding: 98% reasoning over-cooks it. MBPP (write a full function from a spec) is normal (66.9 ≈ base), but HumanEval (complete a given signature) collapses to 47.0 — a format-specific failure, not a loss of coding ability: the always-reason habit emits prose before code, which fights the signature-completion format. 25% reasoning (v3) is the sweet spot.

Usage

Prompt with the training system prompt + a problem; the model returns reasoning then code.

from transformers import AutoModelForCausalLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("srivarenya/python-slm-v4")
model = AutoModelForCausalLM.from_pretrained("srivarenya/python-slm-v4", dtype="bfloat16", device_map="auto")

Code, recipe, and eval harness: https://github.com/srivarenya01/python-slm

Downloads last month: 17

Safetensors

Model size

2B params

Tensor type

F32

BF16

Model tree for srivarenya/python-slm-v4

Base model

Qwen/Qwen2.5-1.5B

Finetuned

Qwen/Qwen2.5-Coder-1.5B

Finetuned

Qwen/Qwen2.5-Coder-1.5B-Instruct

Finetuned

(181)

this model