Python SLM 1.5B โ€” v4 (r=64, 98%-reasoning) โ€” experimental, see results

A Python-specialized fine-tune of Qwen2.5-Coder-1.5B-Instruct, a node in the Mixture-of-Models (MoM) mesh. Single-turn code generator (not an agent).

Use srivarenya/python-slm-v3 instead for general use โ€” it is the V1 keeper. v4 tested whether more reasoning coverage helps; it does not.

  • Base: Qwen/Qwen2.5-Coder-1.5B-Instruct
  • Method: DoRA r=64 (4.6% trainable), SFT on reasoning-augmented Python data โ€” 98% of solving records carry a reasoning trace (problem โ†’ reasoning โ†’ code), vs 25% for v3.

Results (greedy pass@1, A100)

Bench v4 v3 base
HumanEval 47.0% 70.7% 68.9%
MBPP 66.9% 69.6% 66.7%

Finding: 98% reasoning over-cooks it. MBPP (write a full function from a spec) is normal (66.9 โ‰ˆ base), but HumanEval (complete a given signature) collapses to 47.0 โ€” a format-specific failure, not a loss of coding ability: the always-reason habit emits prose before code, which fights the signature-completion format. 25% reasoning (v3) is the sweet spot.

Usage

Prompt with the training system prompt + a problem; the model returns reasoning then code.

from transformers import AutoModelForCausalLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("srivarenya/python-slm-v4")
model = AutoModelForCausalLM.from_pretrained("srivarenya/python-slm-v4", dtype="bfloat16", device_map="auto")

Code, recipe, and eval harness: https://github.com/srivarenya01/python-slm

Downloads last month
17
Safetensors
Model size
2B params
Tensor type
F32
ยท
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for srivarenya/python-slm-v4

Finetuned
(181)
this model