Python SLM 1.5B โ v4 (r=64, 98%-reasoning) โ experimental, see results
A Python-specialized fine-tune of Qwen2.5-Coder-1.5B-Instruct, a node in the Mixture-of-Models (MoM) mesh. Single-turn code generator (not an agent).
Use
srivarenya/python-slm-v3instead for general use โ it is the V1 keeper. v4 tested whether more reasoning coverage helps; it does not.
- Base: Qwen/Qwen2.5-Coder-1.5B-Instruct
- Method: DoRA r=64 (4.6% trainable), SFT on reasoning-augmented Python data โ 98% of solving records carry a reasoning trace (problem โ reasoning โ code), vs 25% for v3.
Results (greedy pass@1, A100)
| Bench | v4 | v3 | base |
|---|---|---|---|
| HumanEval | 47.0% | 70.7% | 68.9% |
| MBPP | 66.9% | 69.6% | 66.7% |
Finding: 98% reasoning over-cooks it. MBPP (write a full function from a spec) is normal (66.9 โ base), but HumanEval (complete a given signature) collapses to 47.0 โ a format-specific failure, not a loss of coding ability: the always-reason habit emits prose before code, which fights the signature-completion format. 25% reasoning (v3) is the sweet spot.
Usage
Prompt with the training system prompt + a problem; the model returns reasoning then code.
from transformers import AutoModelForCausalLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("srivarenya/python-slm-v4")
model = AutoModelForCausalLM.from_pretrained("srivarenya/python-slm-v4", dtype="bfloat16", device_map="auto")
Code, recipe, and eval harness: https://github.com/srivarenya01/python-slm
- Downloads last month
- 17
Model tree for srivarenya/python-slm-v4
Base model
Qwen/Qwen2.5-1.5B