Instructions for using srivarenya/MoM-python-slm with libraries and local apps.

Libraries

How to use srivarenya/MoM-python-slm with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="srivarenya/MoM-python-slm")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("srivarenya/MoM-python-slm")
model = AutoModelForCausalLM.from_pretrained("srivarenya/MoM-python-slm")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template to token IDs and move them to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use srivarenya/MoM-python-slm with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "srivarenya/MoM-python-slm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srivarenya/MoM-python-slm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
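
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on localhost:8000 and returns the usual OpenAI-style response shape. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "srivarenya/MoM-python-slm"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (without sending) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000", timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
#   print(chat("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should target a server listening on port 30000 instead.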

Use Docker

docker model run hf.co/srivarenya/MoM-python-slm

SGLang

How to use srivarenya/MoM-python-slm with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "srivarenya/MoM-python-slm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srivarenya/MoM-python-slm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "srivarenya/MoM-python-slm" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srivarenya/MoM-python-slm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use srivarenya/MoM-python-slm with Docker Model Runner:
```
docker model run hf.co/srivarenya/MoM-python-slm
```

MoM-Python-SLM (1.5B)

The Python code-generation node of a Mixture-of-Models (MoM) mesh: a set of small, specialized Qwen2.5-Coder SLMs with a shared tokenizer, coordinated by a lightweight router. The mesh aims to beat frontier generalists on coding through depth of specialization rather than parameter count.

This node is a single-turn code generator (not an agent): given a Python task (optionally with an upstream context packet), it returns reasoning followed by code. It shares the Qwen2.5-Coder tokenizer with the other generative nodes, which is what makes logit-space fusion across the mesh valid.
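
To illustrate why the shared tokenizer matters: when every node uses the same vocabulary, index i in one node's logit vector refers to the same token as index i in another's, so the vectors can be combined element-wise. The sketch below shows one possible fusion rule, a router-weighted average of per-node log-probabilities. The card does not say which rule the mesh uses, so the rule, the node names, and the weights here are all assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def fuse_logits(node_logits, router_weights):
    """Router-weighted average of per-node log-probabilities.

    node_logits: dict of node name -> next-token logits over the SAME vocabulary.
    router_weights: dict of node name -> non-negative weight (normalized here).
    The fusion rule is an assumption, not the mesh's documented method.
    """
    sizes = {len(logits) for logits in node_logits.values()}
    if len(sizes) != 1:
        raise ValueError("all nodes must share one vocabulary")
    total = sum(router_weights[name] for name in node_logits)
    fused = [0.0] * sizes.pop()
    for name, logits in node_logits.items():
        weight = router_weights[name] / total
        for i, log_prob in enumerate(log_softmax(logits)):
            fused[i] += weight * log_prob
    return fused


def greedy_token(fused):
    """Index of the highest-scoring token in the fused distribution."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical three-token vocabulary and two hypothetical nodes.
logits = {"python": [2.0, 0.5, -1.0], "other": [0.0, 1.0, 0.0]}
token = greedy_token(fuse_logits(logits, {"python": 0.8, "other": 0.2}))
```

With different tokenizers the vectors would have different lengths or different token-to-index mappings, and the element-wise sum would be meaningless.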

Base: Qwen/Qwen2.5-Coder-1.5B-Instruct
Method: DoRA with rank r=64 (≈4.6% of parameters trainable), SFT in two phases (Phase A: 1 epoch, Phase B: 2 epochs), then merged.
Data: 476K instances (decontaminated vs HumanEval/MBPP, 0 overlap) built from the complete CPython docs + Flask/Requests source, issues/PRs, CVEs, and execution-verified synthetic problems.
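
As a rough consistency check on the ≈4.6% figure, the sketch below counts DoRA parameters at r=64. It assumes the adapter targets all seven linear projections in every decoder layer, and it uses the Qwen2.5-1.5B dimensions as they appear in the public model config (hidden size 1536, MLP size 8960, 28 layers, 2 key/value heads of dimension 128). The card does not list the target modules or say how the percentage was computed, so treat both as assumptions.

```python
# Architecture dimensions (assumed from the public Qwen2.5-1.5B config).
HIDDEN = 1536
INTERMEDIATE = 8960
LAYERS = 28
KV_DIM = 2 * 128             # 2 key/value heads of dimension 128
BASE_PARAMS = 1_543_714_304  # about 1.54B parameters in the base model
RANK = 64

# (in_features, out_features) of each linear projection in one decoder layer.
# ASSUMPTION: every one of these is adapted.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def dora_params(in_features, out_features, rank):
    """Low-rank factors A (rank x in) and B (out x rank), plus one magnitude per output."""
    return rank * (in_features + out_features) + out_features


def trainable_count(rank=RANK):
    per_layer = sum(dora_params(i, o, rank) for i, o in PROJECTIONS.values())
    return per_layer * LAYERS


def trainable_fraction(rank=RANK):
    trainable = trainable_count(rank)
    return trainable / (BASE_PARAMS + trainable)


print(f"trainable: {trainable_count():,} ({trainable_fraction():.1%})")
```

Under these assumptions the count comes to roughly 74.5M trainable parameters, which is in line with the ≈4.6% stated above.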

Benchmarks (greedy pass@1)

| Suite | Metric | Base | This model |
| --- | --- | --- | --- |
| HumanEval | pass@1 (%) | 68.9 | 70.7 |
| MBPP | pass@1 (%) | 66.7 | 69.6 |
| Domain (held-out) | `spec_to_code` exec | 0.632 | 0.714 (+8.2 pts) |
| Domain (held-out) | `api_signature` param-recall | 0.217 | 0.299 (+8.2 pts) |
| Domain (held-out) | `problem_solving` exec | 0.700 | 0.713 (parity) |

The largest gains are in library/API capability (writing correct code from a spec, recalling API signatures), a dimension that HumanEval and MBPP, both close to saturation, cannot measure. The repo's self-contained domain-eval notebook reproduces these results.
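
The deltas quoted above can be recomputed from the reported scores. The sketch below assumes the HumanEval/MBPP numbers are percentages and the domain scores are fractions in [0, 1], which is how they read in the table. `pass_at_1` shows the greedy pass@1 definition (one deterministic sample per problem); the function names are illustrative and not taken from the repo's eval harness.

```python
def pass_at_1(passed):
    """Greedy pass@1: share of problems whose single greedy sample passes its tests."""
    return sum(passed) / len(passed)


def delta_points(base, tuned, scale=1.0):
    """Improvement in percentage points; use scale=100 for scores in [0, 1]."""
    return round((tuned - base) * scale, 1)


# (base, this model, scale) as reported in the benchmark table.
REPORTED = {
    "HumanEval": (68.9, 70.7, 1.0),
    "MBPP": (66.7, 69.6, 1.0),
    "spec_to_code": (0.632, 0.714, 100.0),
    "api_signature": (0.217, 0.299, 100.0),
    "problem_solving": (0.700, 0.713, 100.0),
}

for name, (base, tuned, scale) in REPORTED.items():
    print(f"{name}: {delta_points(base, tuned, scale):+.1f} pts")
```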

Recipe findings (load-bearing)

Low DoRA rank wins: r=64 specializes without forgetting; r=256 regressed catastrophically (HumanEval 60.4, below the base model's 68.9).
Moderate reasoning wins: the ~25%-reasoning recipe (this model) beat a 98%-reasoning sibling, whose HumanEval collapsed to 47 (always-reason prose fights the signature-completion format).

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the merged model in bfloat16, placed on the available device(s)
tok = AutoTokenizer.from_pretrained("srivarenya/MoM-python-slm")
model = AutoModelForCausalLM.from_pretrained(
    "srivarenya/MoM-python-slm", dtype="bfloat16", device_map="auto")

Prompt the model with the system prompt used in training, followed by a Python task; it returns reasoning, then code.
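
A minimal sketch of that flow, continuing from the `tok` and `model` objects loaded above. The training system prompt is not reproduced in this card, so `SYSTEM_PROMPT` is a placeholder to be replaced with the real one. How an upstream context packet is joined to the task, and the expectation that code arrives in a fenced block, are also assumptions; the helper names are illustrative.

```python
import re

FENCE = "`" * 3  # built this way to avoid a literal code fence inside this example
SYSTEM_PROMPT = "<training system prompt>"  # PLACEHOLDER: not given in this card


def build_messages(task, system_prompt=SYSTEM_PROMPT, context=None):
    """Chat messages for one single-turn request (context joining is an assumption)."""
    user = task if context is None else f"{context}\n\n{task}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user},
    ]


def split_reasoning_and_code(reply):
    """Split a reply into (reasoning, code), taking the last fenced block as the code.

    ASSUMPTION: the model wraps its code in a fenced block.
    """
    pattern = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)
    matches = list(pattern.finditer(reply))
    if not matches:
        return reply.strip(), ""
    last = matches[-1]
    return reply[: last.start()].strip(), last.group(1).strip()


def generate(task, model, tok, max_new_tokens=512):
    """Greedy single-turn generation; returns only the newly generated text."""
    inputs = tok.apply_chat_template(
        build_messages(task),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)


# Usage:
#   reply = generate("Write a function that reverses a string.", model, tok)
#   reasoning, code = split_reasoning_and_code(reply)
```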

Next step in the pipeline: GRPO/RLVR against an execution-grounded reward to push past the instruct-tuning ceiling. Code, training recipe, and eval harnesses: project repository.
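
To make "execution-grounded reward" concrete, here is a minimal sketch of a binary reward that runs a candidate solution against assert-style tests in a subprocess, together with the group-relative advantage normalization commonly used in GRPO. It is an illustration only: the project's actual reward design is not described in this card, the function names are hypothetical, and the subprocess is not a sandbox, so untrusted code should only be run in an isolated environment.

```python
import os
import statistics
import subprocess
import sys
import tempfile


def execution_reward(candidate_code, test_code, timeout=10.0):
    """Return 1.0 if candidate + tests run to completion with exit code 0, else 0.0.

    WARNING: this executes arbitrary code without sandboxing.
    """
    program = candidate_code + "\n\n" + test_code + "\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "candidate.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(program)
        try:
            result = subprocess.run(
                [sys.executable, path],
                capture_output=True,
                timeout=timeout,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # treat hangs and infinite loops as failures
    return 1.0 if result.returncode == 0 else 0.0


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Two hypothetical completions for the same task, scored against the same tests.
tests = "assert add(2, 3) == 5"
rewards = [
    execution_reward("def add(a, b):\n    return a + b", tests),
    execution_reward("def add(a, b):\n    return a - b", tests),
]
advantages = group_advantages(rewards)
```

A binary reward keeps the signal tied to verified behavior; partial credit per passing test would be a natural variation.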

Safetensors

Model size: 2B params
Tensor type: F32, BF16

Model tree for srivarenya/MoM-python-slm

Base model: Qwen/Qwen2.5-1.5B
Finetuned: Qwen/Qwen2.5-Coder-1.5B
Finetuned: Qwen/Qwen2.5-Coder-1.5B-Instruct
Finetuned: this model